Current status: Small data

  • Reto Hubmann

A bunch of Liipers (me, Andreas Amsler, Stefan Oderbolz, Rachel Knowler and Gerhard Andrey) attended different days of this year's Swiss Data Week. It could just as well have been called “Swiss Big Data Week”, because that was the main topic.

We were a bit surprised to find ourselves among “suits” (we were dressed casually). The talks reflected that: There was a lot of marketing. Companies seized the opportunity to showcase their soft- or hardware solutions.

Many speakers introduced us to Big Data. It was not much fun to hear the same thing over and over again, but at least it was hammered into our heads so thoroughly that we began to think about what it really means for everybody out there.

Big Data: The big picture

Big Data on one hand is a magic word to conjure up opportunities, publicity and Big Business. Frankly, we've had enough of the marketing “blah”.

On the other hand, Big Data represents a shift in the way companies deal with data. For many years data has been piling up inside companies, but it was contained in silos. The idea that value could be added by correlating different data sources did not take hold until the big online players (think Google, Amazon etc.) started to gain insights, and make profits, by combining internal and external data.

It is true that there are opportunities for many companies, perhaps especially for the bigger ones. “Big Data” consultants and solution providers play on the fear many companies might have of falling behind the competition if they don't jump onto the “Big Data” train.

But the concept behind Big Data (not the technology) is equally useful for any data. It doesn't have to be petabytes that you juggle around. Making better use of the available data is interesting for every company.

Why did many companies not become aware of this potential earlier? It has a lot to do with the traditional organizational structure. Collected data is usually tied to departments (IT, Finance, HR etc.), and these departments only rarely work together because each serves its own purpose.

So the mission for the future is to break down these barriers and get the departments to share their data to help the business as a whole. What would that look like?

Small data

Say you not only stored your e-mails but also correlated them with your business. Suppose your customer CompanyA is very important. You could run a sentiment analysis of the e-mails and find out that the correspondence with CompanyA is mostly tense and unproductive, while the communication with CompanyB shows a positive atmosphere.

You could find out that you invest a lot more time in e-mails with CompanyA, but the return on investment is lower.

Gaining that sort of insight could help you to decide where you should invest in the future.

Or you could even detect sentiment changes in the e-mails from CompanyA while you are working on a project with them. So before a matter escalates you could already see a red light blinking on your dashboard and you can intervene before it gets ugly.
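The “red light” can start out as a very simple rule. Here is a minimal sketch in Python, assuming some sentiment tool has already turned each e-mail from a customer into a score between -1 (negative) and 1 (positive). The function name `sentiment_alert`, the window size, the threshold and the sample scores are all made up for illustration.

```python
from statistics import mean

def sentiment_alert(scores, window=3, drop=0.3):
    """Return True if the mean of the last `window` scores is at least
    `drop` below the mean of all earlier scores."""
    if len(scores) <= window:
        return False  # not enough history to compare against
    baseline = mean(scores[:-window])   # everything before the recent window
    recent = mean(scores[-window:])     # the most recent e-mails
    return baseline - recent >= drop

# Invented scores for CompanyA, oldest first.
scores_company_a = [0.4, 0.5, 0.3, 0.4, 0.1, -0.2, -0.3]
print(sentiment_alert(scores_company_a))
```

Comparing a recent window against the earlier baseline is only one possible rule; the thresholds would need tuning against real correspondence.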

Or your e-mail traffic analysis could indicate that you have not contacted CompanyC for two weeks and remind you that it is time to get in touch.
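The contact reminder needs very little machinery. The following sketch assumes you can export a list of (customer, date) pairs from your mail system. The function name `stale_contacts`, the export format, the 14-day limit and the sample data are hypothetical.

```python
from datetime import date, timedelta

def stale_contacts(messages, today, max_gap_days=14):
    """Return customers whose most recent message is at least
    `max_gap_days` old.

    messages: iterable of (customer, date) pairs (assumed export format).
    """
    last_contact = {}
    for customer, sent_on in messages:
        # Keep only the most recent date per customer.
        if customer not in last_contact or sent_on > last_contact[customer]:
            last_contact[customer] = sent_on
    limit = timedelta(days=max_gap_days)
    return sorted(c for c, d in last_contact.items() if today - d >= limit)

# Made-up sample data.
messages = [
    ("CompanyA", date(2024, 3, 1)),
    ("CompanyA", date(2024, 3, 18)),
    ("CompanyB", date(2024, 3, 15)),
    ("CompanyC", date(2024, 3, 4)),
]
print(stale_contacts(messages, today=date(2024, 3, 20)))  # ['CompanyC']
```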

The term “small data” has been coined to describe this kind of analysis:

“Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks.”

Small data also means:

“Know more, know earlier.”

There are almost no limits in that field. It is often not clear from the beginning what could be gained from the existing data. You have to ask new questions and look at your business in a different way than you might have done before.

Privacy

Because almost everything that happens in a company today is tracked by some computer system, you only need to know how to tap into these data streams to find out what is really going on, instead of relying on what people may want you to believe:

How long do the employees really work?
What does the social network inside the company look like?
Do some people write a lot of negative emails?
Which employees are likely to resign in the next 3 months?
Who is really helping your company and who is just pretending?
Are your employees criticizing the company or management more than last year?
What would be a good way to make sure you will get promoted?
In what way could a certain person be discredited so that people do not follow their advice?

Do these questions make you uncomfortable? They should. All of them could potentially be answered by combining and analyzing different data (e-mail, chat, coffee machines, copiers, fax, lunch cards, finance, cameras, HR, events, social media etc.).

You see the “problem” with this approach. This is where privacy concerns and ethics come in. First we may want to know what Swiss privacy law says. For our use case, probably the best source of information is the guide “Leitfaden über Internet- und E-Mailüberwachung” (roughly “Guide to Internet and e-mail surveillance”; not available in English), written by the Federal Data Protection and Information Commissioner (FDPIC).

From this document I quote section 7.2 (translated from German):

“As mentioned before, raw data (log files) have a direct reference to a person. For a person-based evaluation, direct references to individuals have to be prevented by assigning pseudonyms. For example, such an evaluation could answer the question: Are there employees in some department that send more than 100 emails per week? The employees that meet this criterion are to be listed with pseudonyms. This sort of evaluation can systematically be made without having to have any concrete suspicion of abuse.”

In the document this is called pseudonymous evaluation. The law permits answering all of these questions in this way and using the results as one pleases. It does not permit using real names in any such report unless there is a concrete suspicion of abuse.
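To make the quoted example concrete, here is a rough sketch of such an evaluation in Python. It is our own illustration, not a procedure taken from the FDPIC document. It assumes the mail log has already been reduced to one (employee, week) pair per sent e-mail. The function name `pseudonymous_heavy_senders`, the log format and the `P-` pseudonym scheme are hypothetical.

```python
import secrets
from collections import Counter

def pseudonymous_heavy_senders(sent_log, threshold=100):
    """List senders with more than `threshold` e-mails in a week,
    identified by pseudonym only.

    sent_log: iterable of (employee, week) pairs, one per sent e-mail.
    Returns (report, key). The report contains no real names; the key
    maps pseudonyms back to names and would be stored separately.
    """
    counts = Counter(sent_log)  # (employee, week) -> number of e-mails
    pseudonyms = {}
    report = []
    for (employee, week), n in sorted(counts.items()):
        if n > threshold:
            if employee not in pseudonyms:
                # Random token, so the pseudonym cannot be derived from the name.
                pseudonyms[employee] = "P-" + secrets.token_hex(4)
            report.append((pseudonyms[employee], week, n))
    key = {p: e for e, p in pseudonyms.items()}
    return report, key

# Invented log: one heavy sender, one normal sender.
sent_log = [("alice", "2024-W11")] * 120 + [("bob", "2024-W11")] * 40
report, key = pseudonymous_heavy_senders(sent_log)
for row in report:
    print(row)
```

Keeping the key apart from the report is the design point here: the report can circulate, while the key is only consulted if a concrete suspicion of abuse arises.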

To stay competitive, companies will eventually do this kind of analysis more and more. If you see companies doing it with real names, you should not tolerate it but take the necessary steps to stop the practice. Personally, I am not in favor of doing it with pseudonyms either.

Getting ready for action

Now, if you think that Small data could help your company get better at what it does, we suggest you take the following steps:

1. Data inventory

The first step is to get an overview of your data. List the available data sources, possibly with short data samples that help with the assessment in the next step.

2. Data assessment

The next step is to assess the available data sources and identify correlations that could provide additional value for your business. You might be able to ask questions that you have never considered before because the data to answer them was not in sight.

Do not limit your assessment to internal data only, but also think about social media or other “public” data sources that might be related to your business.

3. Analytics

Once you have made up your mind about what you can gain from which datasets and how they need to be correlated, the data needs to be aggregated, prepared and analyzed. Depending on the field you are working in, you may need to look into Big Data technologies, involve data scientists (for cases with advanced analytics) or just aggregate and unify the data.

A possible milestone is a dashboard or visualisation that presents the insights from the analysis in an easily understandable way, preferably updated in real-time.
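In the simplest case, “aggregate and unify” means joining two sources on a shared key. As a minimal sketch, assume hours spent on e-mail per customer and revenue per customer are already available as two dictionaries. The function name `revenue_per_email_hour` and all numbers are invented.

```python
def revenue_per_email_hour(email_hours, revenue):
    """Join two per-customer sources and compute revenue per hour of e-mail.

    Only customers present in both sources, with more than zero hours,
    appear in the result.
    """
    result = {}
    for customer in email_hours.keys() & revenue.keys():
        hours = email_hours[customer]
        if hours > 0:
            result[customer] = revenue[customer] / hours
    return result

# Invented figures from two different "silos".
email_hours = {"CompanyA": 40.0, "CompanyB": 10.0, "CompanyC": 5.0}
revenue = {"CompanyA": 20000.0, "CompanyB": 15000.0}

ratios = revenue_per_email_hour(email_hours, revenue)
for customer, value in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{customer}: {value:.0f} per hour")
```

A table like this, sorted by the ratio, is already a first candidate for the dashboard mentioned above.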

Summing it up

Big Data is not for everyone, but the underlying concept that made it a hot topic can be very useful for any company that wants to be prepared for the future. We're looking forward to exploring the topic both within our own company and with our clients.

