06.04.2016

Turbocharge your customer data with Hadoop!

A holistic view of your customer behavior is something that all companies strive for. The 360-degree customer view, although sometimes considered unattainable, is something that can be achieved with the right platform.

Whether you run a webshop or an old-fashioned brick-and-mortar business, it’s paramount that you can make effective use of all the data you collect from your customers.
This includes a plethora of different sources, ranging from generic row-and-column databases to more complex, unstructured social media data — basically, everything we have come to call ‘Big Data’.

One data source that sits in the middle of this spectrum in terms of complexity is customer satisfaction and survey data. The challenge with this type of data, however, is that data quality is often a serious problem. Survey data is produced by third-party companies, and in many cases it is compiled manually from the results. This produces a large volume of data files, which can be a problem for traditional batch-processing BI systems or databases.

With the power of Hadoop, we can simplify this a whole lot!

In order to make sense of the survey data, we need a place to save and process it. In a traditional BI system, we would have to set up a job to pull these files from a file server. We also want to be able to give our partners (who collect the data) the ability to log into an interface, and upload the survey results data instead of emailing it.

Here’s a high-level picture of the architecture. In this example, we’re concentrating on the text and survey data, but in a real-life scenario we would also be interested in social media and clickstream data in order to fully understand how our customers are behaving.

[Figure: Hadoop Customer 360 architecture]

Here’s where Hadoop’s file system (HDFS) comes into its own. As HDFS is a schemaless distributed file system, we don’t have to worry about definitions when loading data. It works much like a normal computer’s file system, where files are stored in a folder structure.
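To make the "no definitions when loading" idea concrete, here is a minimal sketch in Python. It uses a local temporary folder as a stand-in for HDFS, and the column names and file names are hypothetical, since the article does not specify any. The point is only that the file is stored untouched and the column names are applied when it is read.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path

# Hypothetical column names for a survey file (an assumption for illustration).
SURVEY_COLUMNS = ["customer_id", "survey_date", "score"]


def store_raw(folder: Path, name: str, content: str) -> Path:
    """Store a file as-is: no schema is checked or needed at write time."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_text(content, encoding="utf-8")
    return path


def read_with_schema(path: Path, columns: list[str]) -> list[dict[str, str]]:
    """Apply column names only when the file is read."""
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, fieldnames=columns))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # The folder layout is a local analogy for an HDFS directory tree.
        landing = Path(root) / "surveys" / "partner_a"
        raw = store_raw(landing, "2016-04.csv", "1001,2016-04-01,4\n1002,2016-04-02,5\n")
        for row in read_with_schema(raw, SURVEY_COLUMNS):
            print(row)
```

If the survey layout changes later, only the column list used at read time needs to change; the stored files stay as they are.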

Introducing the Hadoop User Experience!

Leveraging the Hadoop User Experience, or HUE for short, we are able to give our partners access to our Hadoop cluster in the form of an intuitive user interface. These partners will be able to log into HUE and upload their data, without the need for scripting. This eliminates the need to involve IT, and allows the end users to upload multiple files (even compressed files) into the system, speeding up the acquisition of data.

After a partner has loaded the data into HUE, we can map the data to form a view, which we can then use later. Mapping the data essentially means defining what the files contain and building external views that we can connect to from a third-party application such as Tableau.
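The article does not say which engine holds these mappings. One common way to do it on a Hadoop cluster is a Hive external table that points at the upload folder, and the sketch below assumes that approach. It is a small Python helper that only builds the DDL text; the table name, column names and HDFS path are hypothetical.

```python
from __future__ import annotations

import re

# Conservative pattern for table and column names used in this sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def external_table_ddl(
    table: str,
    columns: dict[str, str],
    hdfs_folder: str,
    delimiter: str = ",",
) -> str:
    """Build a Hive CREATE EXTERNAL TABLE statement over a folder of text files."""
    for name in [table, *columns]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    column_lines = ",\n".join(
        f"  {name} {hive_type}" for name, hive_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        f"FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_folder}'"
    )


if __name__ == "__main__":
    # Hypothetical survey layout and folder, for illustration only.
    print(
        external_table_ddl(
            "survey_results",
            {"customer_id": "BIGINT", "survey_date": "STRING", "score": "INT"},
            "/data/surveys/partner_a",
        )
    )
```

Because the table is external, dropping it removes only the definition and leaves the uploaded files in place, which suits data that partners own and keep adding to.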

[Figure: Hadoop, HUE and data visualization]

As new data is introduced into the system, the sources (views) are updated on-the-fly, as we are reading folders instead of individual files. Very clever indeed.
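The effect of reading folders rather than individual files can be illustrated with a small Python sketch. It is a local-filesystem analogy, not HDFS or HUE code, and the file names are made up. The reader is defined once against a folder, so a newly uploaded file (plain or gzip-compressed) shows up on the next read without any change to the definition.

```python
from __future__ import annotations

import csv
import gzip
import tempfile
from pathlib import Path


def read_folder(folder: Path) -> list[list[str]]:
    """Return the rows of every file in the folder, in file-name order."""
    rows: list[list[str]] = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        # Compressed uploads are opened transparently.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", newline="", encoding="utf-8") as handle:
            rows.extend(csv.reader(handle))
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        folder = Path(root)
        (folder / "week1.csv").write_text("1001,4\n1002,5\n", encoding="utf-8")
        print(len(read_folder(folder)), "rows after the first upload")

        # A partner uploads a second, compressed file into the same folder.
        with gzip.open(folder / "week2.csv.gz", "wt", encoding="utf-8") as handle:
            handle.write("1003,3\n")
        print(len(read_folder(folder)), "rows after the second upload")
```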

Setting up a Big Data ecosystem like Hadoop with HUE doesn’t have to be difficult. And with cloud platforms like Microsoft Azure it’s even easier than before!

If you want to hear more about HDP Hadoop, Modern Data Architecture and Azure Marketplace, register for Bilot’s breakfast seminar at Microsoft House Espoo on 26.4: Hadoop ja Azure Marketplace – digitalisaation tekijät

See you there!

Blog writer

Karri Linnoinen

Bilot Alumni