THE SQL Server Blog Spot on the Web


Jorg Klein

Microsoft Data Platform MVP from the Netherlands

Relational Data Lake

You can read this blog post at this link: https://jorgklein.com/2014/12/02/relational-data-lake

This blog has moved to www.jorgklein.com. There will be no further posts on sqlblog.com. Please update your feeds accordingly.

You can follow me on twitter: http://twitter.com/jorg__klein

Published Thursday, December 18, 2014 4:44 PM by jorg


Comments

 

TonyG said:

Hi Jorg,

Very interesting and useful post.  Just some questions on architecture...

Do you see Microsoft's APS appliance positioning itself to become the data lake? Also, for a traditional on-prem "Microsoft shop" environment just starting to get more exposure to unstructured data, are there ways to easily set up, develop, and maintain a data lake without purchasing APS?

Any suggestions on architecture would be appreciated.

Thanks,

Tony

December 22, 2014 4:01 AM
 

Robert Bakker said:

Dear Tony,

The Analytics Platform System (hardware, software and services) is very complete in the sense that it can host loads of structured and unstructured (Hadoop) data. The biggest advantage is its performance.

It has some functional limitations with respect to entering and retrieving data, so its characteristics resemble those of a traditional data warehouse (not real-time).

I would suggest not building the data lake on APS alone.

As Jorg points out, the solution may consist of more than one underlying platform.

And yes, if you want, you can stay with Microsoft technology. HDInsight is available in Azure, so it is easily 'implemented' and suitable for hybrid (cloud / on-premises) usage.

Robert

December 22, 2014 11:17 AM
 

TonyG said:

Hi Robert,

Thanks for your comments. I think I'll start with HDInsight, as you suggested, since it is a cheap and easy way for me to start bringing data in.

In your experience, are there certain non-T-SQL languages that are required to load and retrieve data from HDInsight?

Thanks,

Tony

December 22, 2014 4:52 PM
 

Jimmy said:

Hey Jorg,

The state of BI in our organization really consists of what you call a Data Lake. We are currently investigating what our next steps should be.

What do you consider to be off-the-shelf solutions for defining and exposing master data? Something like Boomi or SAS Master Data Management? Or is it as simple as exposing data through OData?

I would love to read more about how master data works with the Data Lake, if you have any additional material.

Thank you

Jimmy

December 22, 2014 4:54 PM
 

jorg said:

@Tony, using HDInsight gives you the possibility to issue Hive and Pig queries. The big advantage of these queries is that they are SQL-like and therefore easy to understand; you don’t have to write any Java code.

A nice introduction to HDInsight can be found here:

http://msdn.microsoft.com/en-us/magazine/dn385705.aspx

The following tutorial is a good starting point for learning Hive with HDInsight:

http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-hive/?fb=nl-nl

@Jimmy, I would recommend going through a software package selection process to pick the off-the-shelf solution that best meets the requirements of your organization. As I wrote, the master data solution and data store/hub isn’t part of the BI solutions or the Data Lake; it’s only used as a data source for these solutions. Master data management and the implementation of master data solutions can be a very difficult and time-consuming process, but offering master data structures so they can be combined with data from the Data Lake is a must-have. You could therefore start with just gathering, storing and offering master data without going into the master data management process. That could come at a later point in time.

December 23, 2014 3:59 AM
 

TonyG said:

Thanks for the links Jorg!

December 23, 2014 5:49 AM
 

Joel Mamedov said:

Is this blog still active?

July 8, 2016 1:35 PM
 

jorg said:

Hi Joel Mamedov, yes it is!

September 8, 2016 5:08 AM

