THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web

Andy Leonard

Andy Leonard is a Data Philosopher at Enterprise Data & Analytics, an SSIS Trainer, Consultant, and developer; a Business Intelligence Markup Language (Biml) developer and BimlHero; SQL Server database and data warehouse developer, community mentor, engineer, and farmer. He is a co-author of SQL Server Integration Services Design Patterns, and author of Managing Geeks - A Journey of Leading by Doing, and the Stairway to Integration Services.

Data Integration is the Foundation

Unless you live under a rock, you’ve seen the buzz about Data Lakes, Big Data, Data Mining, Cloud-tech, and Machine Learning. I watch and read reports from two perspectives: as a consultant and as an engineer.

As a Consultant

If you watch CNBC, you won’t hear discussions about ETL Incremental Load or Slowly Changing Dimensions Design Patterns. You will hear the commentators using words like “cloud” and “big data,” though. That means people who watch and respect the people on CNBC are going to hire consultants who are knowledgeable about cloud technology and Big Data.

As an Engineer

I started working with computers in 1975. Since that time, I believe I’ve witnessed about one major paradigm shift per decade. I believe I am now witnessing two at the same time: 1) a revolution in Machine Learning and all the things it touches (which includes Big Data and Data Lakes); and 2) the Cloud. These two are combining in some very interesting ways. Data Lakes and Big Data appliances and systems are the sources for many systems; Machine Learning and Data Mining solutions are but a couple of their consumers. At the same time, much of this technology and storage is either migrating to the Cloud, or is being built there (and in some cases, only there). But all of this awesome technology depends on something…

Data

In order for Machine Learning or Data Mining to work, there has to be data in the Data Lake or in the Big Data appliance or system. Without data, the Data Lake is dry. Without data, there’s no “Big” in Big Data. How do these solutions acquire data?

It Depends

Some of these new systems have access to data locally. But many of them – most, if I may be so bold – require data to be rounded up from myriad sources. Hence my claim that data integration is the foundation for these new solutions.

What is Data Integration and Why is it Important?

Data integration is the collection of data from myriad, disparate sources into a single repository (or a minimal number of repositories). It’s “shipping” the data from where it is to someplace “nearer.” Why is this important? Internet connection speeds are awesome these days. I have – literally – 20,000 times more bandwidth than when I first connected to the internet. But modern internet connection speeds are hundreds to millions of times slower than networks running inside data centers. Computing power – measured in cycles per second or FLOPS – is certainly required to perform today’s magic with Machine Learning. But if the servers must wait hours (or longer) for data instead of milliseconds, the magic happens in slow motion. In slow motion, magic doesn’t look awesome at all.

Trust me, speed matters.
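
To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption chosen for illustration: the 1 TB dataset, the 100 Mbit/s internet link, the 10 Gbit/s data center network, and the helper name transfer_seconds are not measurements from any real system. It also ignores latency and protocol overhead, so real transfers will be slower.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_sec: float) -> float:
    """Idealized time to move size_bytes over a link (no latency, no overhead)."""
    return size_bytes * 8 / bandwidth_bits_per_sec


# Hypothetical figures, picked only to illustrate the gap.
DATASET_BYTES = 1e12      # 1 TB of source data
WAN_BPS = 100e6           # 100 Mbit/s internet connection
DATACENTER_BPS = 10e9     # 10 Gbit/s network inside a data center

wan = transfer_seconds(DATASET_BYTES, WAN_BPS)
local = transfer_seconds(DATASET_BYTES, DATACENTER_BPS)

print(f"Over the internet link: {wan / 3600:.1f} hours")
print(f"Inside the data center: {local / 60:.1f} minutes")
print(f"Slowdown factor: {wan / local:.0f}x")
```

With these assumed figures the same data takes most of a day to arrive over the internet link and a matter of minutes once it already sits in the data center. That gap is the case for integrating data "nearer" before the compute needs it.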

Data integration is the foundation on which most of these systems depend. Some important questions to consider:

  • Are you getting the most out of your enterprise data integration?
  • Could your enterprise benefit from faster access to data – perhaps even near real-time business intelligence?
  • How can you improve your enterprise data integration solutions?

:{>

Learn more:

Enterprise Data & Analytics
Stairway to Integration Services
IESSIS1: Immersion Event on Learning SQL Server Integration Services
EnterpriseDNA Training

Published Friday, January 29, 2016 9:42 AM by andyleonard

Comments

 

RichB said:

Good point, well made.

Trick is how one gets that across to the PHB without sounding reactionary or defeatist. (Yeah, I'm reading Stalingrad, finally... so many parallels to data management)

January 31, 2016 7:47 PM
 

andyleonard said:

Good point, RichB,

  Which Stalingrad are you reading?

:{>

January 31, 2016 8:04 PM
 

RichB said:

Antony Beevor's rather large one. I'd read Harrison Salisbury's 900 Days, about the siege of Leningrad, some years ago, and they both really make one think.

Puts a little perspective into your day.

February 1, 2016 7:43 PM
