THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Eric Johnson

My Thoughts on SQL Server and Database Related Technologies.

SSIS 2008: Data Profiling Task

SQL Server 2008 Integration Services shipped with the wonderful new Data Profiling Task. This task, much as the name implies, will profile the data in a given table and return a ton of useful information. The task gathers statistics for candidate keys, column length, NULL data ratio, data patterns, and much more. All of the results are written out to XML. The task makes more sense when you look at it, so let’s jump in. The configuration of the task is pretty simple, you select the types of profiles that you want to run and then set the specific options for that profile. In the figure below, I have selected Column Length Distribution for all columns (*) in the HumanResources.Employee table. This profile will gather length statistics on the columns such as Minimum and Maximum Length and provide a distribution of all of the lengths found.  On the general page of the task, you also need to provide a destination, which can be a file connection or a variable.

image

Once you set up the task with all the data profiles you want to run, you simply execute the SSIS package and let it work it’s magic. If you wrote to a variable, you’re on you own to do something with the data within the SSIS package. On the other hand, if your wrote to a file, the SQL Server 2008 Client Tools come complete with the Data Profile Viewer which can be found at in your program menu under Microsoft SQL Server 2008 | Integration Services. This is a pretty simple application that allows you to browse the XML file created by the Data Profiling Task. The figure below shows the results for the Column Length Distribution profile I ran earlier. As you can see, there is quite a bit of information returned, and this is just one of the profiles. If you take some time and dig into all the profile types, I think you will be very impressed with the amount of information this task provides. If you’re the creative type, you can run this task and write the data to a variable which can be further queried by your SSIS package. With a little work, you could have a great tool at your disposal the next time your presented with a new database that you nothing about.

image

Published Friday, September 11, 2009 4:30 PM by ejohnson2010

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

John Paul Cook said:

When reviewing a client's database, I suspected there might be indexes on low cardinality (i.e., low

November 20, 2009 3:30 PM
 

mbalajiwale said:

Is there any way to create user defined business rules apart from the inbuilts in SSIS?

November 1, 2011 5:56 AM
 

Sherri Wright said:

SQL server services are useful in providing good performance business sites, and by sharing this one you could really encourage readers like me to prefer this kid of web tool.

January 24, 2012 12:37 AM

Leave a Comment

(required) 
(required) 
Submit

About ejohnson2010

Eric (Microsoft SQL MVP) is the co-founder of Consortio Services, and the primary Database Technologies Consultant. His background in Information Technology is diverse, ranging from operating systems and hardware to specialized applications and development. He has even done his fair share of work on networks. Since IT is really just a way to support business processes, he has also acquired his MBA. All in all, he has 10 years of experience with IT, a great amount of which has been working with Microsoft SQL Server. Eric has managed and designed databases of all shapes and sizes. He has delivered numerous SQL Server training classes and webcasts as well as presentations at national technology conferences. Most recently, he presented at TechMentor on SQL Server 2005 Replication, Reporting Services, and Integration Services. In addition, he is active in the local SQL Server Community, serving as the President of the Colorado Springs SQL Server Users Group.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement