Analyzing StackExchange data with Azure Data Lake - Introduction

A few weeks ago I wrote about what Microsoft announced for Azure Data Lake and what you can expect from it.

As of Wednesday, 28 October, Azure Data Lake Store & Analytics are in public preview, so you can try them out yourself. You won't have to worry about managing any clusters, which lets you focus on your business logic!

To celebrate this, I'm writing a series that will take you through the process of storing the data in Data Lake Store, processing it with Data Lake Analytics and visualizing the gained knowledge in Power BI.

Visualization of the gained knowledge

I will break up the series into four major parts:

  1. Storing the data in Azure Data Lake Store or Azure Storage
  2. Aggregating the data with Azure Data Lake Analytics
  3. Analyzing the data with Azure Data Lake Analytics
  4. Visualizing the data with Power BI

Throughout this series we will use the open data published by StackExchange.
This lets us work with real-world data and see the difficulties it can cause.

In my next post I'll walk you through the steps to upload the data and show how to do this cost-efficiently.

Thanks for reading,

Tom Kerkhove.