SQL Server 2008 Integration Services shipped with the wonderful new Data Profiling Task. This task, much as the name implies, will profile the data in a given table and return a ton of useful information. The task gathers statistics for candidate keys, column length, NULL data ratio, data patterns, and much more. All of the results are written out to XML. The task makes more sense when you look at it, so let’s jump in. The configuration of the task is pretty simple, you select the types of profiles that you want to run and then set the specific options for that profile. In the figure below, I have selected Column Length Distribution for all columns (*) in the HumanResources.Employee table. This profile will gather length statistics on the columns such as Minimum and Maximum Length and provide a distribution of all of the lengths found. On the general page of the task, you also need to provide a destination, which can be a file connection or a variable.
Once you set up the task with all the data profiles you want to run, you simply execute the SSIS package and let it work it’s magic. If you wrote to a variable, you’re on you own to do something with the data within the SSIS package. On the other hand, if your wrote to a file, the SQL Server 2008 Client Tools come complete with the Data Profile Viewer which can be found at in your program menu under Microsoft SQL Server 2008 | Integration Services. This is a pretty simple application that allows you to browse the XML file created by the Data Profiling Task. The figure below shows the results for the Column Length Distribution profile I ran earlier. As you can see, there is quite a bit of information returned, and this is just one of the profiles. If you take some time and dig into all the profile types, I think you will be very impressed with the amount of information this task provides. If you’re the creative type, you can run this task and write the data to a variable which can be further queried by your SSIS package. With a little work, you could have a great tool at your disposal the next time your presented with a new database that you nothing about.