Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

SSIS Logging in Denali

In my blog post SSIS Server, Catalogs, Environments & Environment Variables in SSIS in Denali from November 2010 I alluded to forthcoming enhancements in the logging infrastructure when I said:

One final note on logging. The fairly limited information that gets captured in the current CTP is not the same as what will be in the product upon release – there is a lot more to come.

I wasn't kidding. Those logging enhancements have hit the street in CTP3 and in my opinion they are worth waiting for. SSIS logging has long been an interest of mine, primarily because the logging support in SSIS has, up to now, been woeful; that's what led me to author what has proved to be one of my more popular blog posts, Custom Logging Using Event Handlers, way back in June 2005. Most serious SSIS shops I come across these days have implemented their own custom logging solution and I'm happy to be able to tell them that with Denali they are no longer forced to do that. In this blog post I'll walk you through all of the crucial objects in the SSIS Catalog related to logging and by the end of it you'll hopefully have a good handle on the new capabilities. I assume at this point that you understand what is meant by the terms catalog, projects, parameters, executions, executables, environments, environment references, environment variables and shared connection managers in the context of SSIS; if not then I have plenty of past material that you can go and read to fill yourself in at http://sqlblog.com/blogs/jamie_thomson/archive/tags/denali/ssis/default.aspx.

To demonstrate some of the new abilities I have put together an SSIS project that contains two packages, Master.dtsx & ExportDataForCountry.dtsx:

SSIS Solution Explorer

Note also that we have a Shared Connection Manager.

The intention is to access the [AdventureWorksDW2008] database and find the 3 countries with the highest number of sales for a given year. Then, for each of those 3 countries in the given year, extract the sales data and (a) sort and output it to a file and (b) aggregate it and insert it into a table. Here are my project parameters (note the descriptions):

SSIS Project Parameters pane

Here is Master.dtsx:

SSIS Package Pane in Visual Studio 2010

We use an Execute SQL Task to get the 3 top performing countries for the given year (supplied in the "Year" project parameter) and call ExportDataForCountry.dtsx for each one of them. We also ensure that the folder (given in the FolderPath project parameter) into which we are going to output the csv files exists.

Note also how I was able to take this screenshot of the whole package. This was made easier thanks to the move to Visual Studio 2010 because each open package is a pane in its own right and can be dragged out of the Visual Studio shell, then onto a second screen if you so wish. A really nice new feature.

ExportDataForCountry.dtsx consists of just one task – a dataflow:

SSIS Dataflow in Visual Studio 2010

It also has a package parameter (again, note the description):

SSIS Package Parameters in Visual Studio 2010

In the rest of this blog post I am going to use the logging information created by executing Master.dtsx to demonstrate the new logging capabilities in Denali. I have used verbose logging (the highest logging level) and environment variables; here is the resultant code to execute Master.dtsx:

Declare @execution_id bigint
EXEC [SSISDB].[catalog].[create_execution] 
      @package_name    =  N'Master.dtsx', 
      @execution_id    =  @execution_id OUTPUT, 
      @folder_name    =  N'CTP3', 
      @project_name    =  N'20110712 Logging Demo', 
      @use32bitruntime  =  False, 
      @reference_id    =  3
Select @execution_id
DECLARE @var0 smallint = 3
EXEC [SSISDB].[catalog].[set_execution_parameter_value] 
      @execution_id,  
      @object_type    =  50, 
      @parameter_name    =  N'LOGGING_LEVEL', 
      @parameter_value  =  @var0
DECLARE @var1 bit = 0
EXEC [SSISDB].[catalog].[set_execution_parameter_value] 
      @execution_id,  
      @object_type    =  50, 
      @parameter_name    =  N'DUMP_ON_ERROR', 
      @parameter_value  =  @var1
EXEC [SSISDB].[catalog].[start_execution] 
      @execution_id
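
If you are wondering where the @reference_id value in that script comes from, it identifies the environment reference that the execution should use. Here is a sketch of how you might look it up (assuming the [catalog].[environment_references], [catalog].[projects] and [catalog].[folders] views are present in your CTP3 catalog):

-- A sketch (not part of the generated script above): list the environment references
-- available to the project so you know which @reference_id to pass to [catalog].[create_execution]
SELECT  er.[reference_id],
        er.[environment_name],
        er.[reference_type],      -- 'R' = relative reference, 'A' = absolute reference
        f.[name] AS [folder_name],
        p.[name] AS [project_name]
FROM    [SSISDB].[catalog].[environment_references] er
INNER JOIN [SSISDB].[catalog].[projects] p ON er.[project_id] = p.[project_id]
INNER JOIN [SSISDB].[catalog].[folders]  f ON p.[folder_id]   = f.[folder_id]
WHERE   p.[name] = N'20110712 Logging Demo'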

So then, let's dive into what actually gets logged! If you want to follow along at home then you can download the project file (i.e. .ispac file) from https://skydrive.live.com/#!/?cid=550f681dad532637&sc=documents&uc=1&id=550F681DAD532637%2115204 (you will need to have an instance of [AdventureWorksDW2008] hanging around somewhere).

You might want to go and get yourself a drink before you continue – this one is a biggie!!


[catalog].[executions]

You are going to become very very familiar indeed with [catalog].[executions]. It is a view that provides a record of all package executions on the server and, most importantly, it contains [execution_id] – the identifier for each execution and the field to which all other objects herein will be related. There are a number of other fields in here, some of which are interesting and some of which are not; I suspect that you are going to be most interested in:

  • start_time
  • end_time
  • status

[start_time] & [end_time] are self-explanatory. [status] represents the current status of an execution and its possible values are:

  • created (1)
  • running (2)
  • canceled (3)
  • failed (4)
  • pending (5)
  • ended unexpectedly (6)
  • succeeded (7)
  • stopping (8)
  • completed (9)

Putting those together then we get:

SELECT execution_id,status,
CASE  WHEN [status] = 1 THEN 'created'
    WHEN [status] = 2 THEN 'running'
    WHEN [status] = 3 THEN 'canceled'
    WHEN [status] = 4 THEN 'failed'
    WHEN [status] = 5 THEN 'pending'
    WHEN [status] = 6 THEN 'ended unexpectedly'
    WHEN [status] = 7 THEN 'succeeded'
    WHEN [status] = 8 THEN 'stopping'
    WHEN [status] = 9 THEN 'completed'
END AS [status_text]
,DATEDIFF(ss,start_time,end_time)DurationInSeconds
FROM  catalog.executions e
catalog.executions sample

We get some interesting operating system information in the way of total/available physical memory, total/available page file, number of available CPUs and the O/S process ID:

SELECT  execution_id,
    e.total_physical_memory_kb,
    e.available_physical_memory_kb,
    e.total_page_file_kb,
    e.available_page_file_kb,
    e.process_id
FROM  catalog.executions e
WHERE  e.[execution_id] = 10
catalog.executions available resources sample

Finally, we also get information about the environment that was used by the execution (if one was used at all) such as the environment name and information pertaining to whether it was an absolute or relative reference (don't worry too much about the difference between an absolute and a relative reference for now – it's not all that important):

SELECT  e.[environment_name],e.reference_id,e.reference_type
FROM  catalog.executions e
WHERE  e.[execution_id] = 10

catalog.executions environment information sample

There are other columns in [catalog].[executions] but they’re either self-explanatory or not worth talking about here.

[catalog].[executions] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[execution_parameter_values]

This view is self-explanatory – it tells us what parameter values were supplied to the package when it executed; however, there is probably more information in here than you might first imagine. If you look at the code at the top of this blog post that executed our package you will notice that there are two calls to stored procedure [catalog].[set_execution_parameter_value] (for LOGGING_LEVEL and DUMP_ON_ERROR), thus you may assume that there would be two rows in [catalog].[execution_parameter_values]. Not so; let's take a look:

SELECT  epv.[parameter_name],epv.[parameter_value],epv.[value_set],epv.[runtime_override]
FROM  [catalog].[execution_parameter_values] epv
WHERE  [execution_id] = 10

catalog.execution_parameter_values  sample

Fourteen rows – let's break them down:

  • We know that FolderPath & Year are our project parameters yet we didn't define a value for them when we executed. The reason for their appearance in [catalog].[execution_parameter_values] is that (as we saw in [catalog].[executions]) this execution was executed against an environment called “2003 localhost”. The project was pre-configured to get a value for “Year” from the referenced environment (hence [value_set]=1) whereas “FolderPath” was pre-configured to use the server default ([value_set]=0), which is why we had no need to supply values for either of them using [catalog].[set_execution_parameter_value].
  • DUMP_ON_ERROR & LOGGING_LEVEL are system parameters that we supplied values for using [catalog].[set_execution_parameter_value] (hence [runtime_override]=1)
  • CALLER_INFO, DUMP_EVENT_CODE, DUMP_ON_EVENT & SYNCHRONIZED are further system parameters that we could have supplied values for.
  • The six parameters whose names begin with “CM.AdventureWorksDW2008.” are actually properties of the project's Shared Connection Manager and by default they get logged in [catalog].[execution_parameter_values]. Notice that CM.AdventureWorksDW2008.ServerName has [value_set]=1; that is because our environment reference specified a value for that property too.
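
If you want to see at a glance where each value came from, the [value_set] and [runtime_override] flags can be combined in a query. Here is a sketch (the labels are just my interpretation of the flags as described above):

SELECT  epv.[parameter_name],
        epv.[parameter_value],
        CASE WHEN epv.[runtime_override] = 1 THEN 'Supplied at execution time'
             WHEN epv.[value_set] = 1        THEN 'Set via the environment reference'
             ELSE 'Server/project default'
        END AS [value_source]
FROM    [catalog].[execution_parameter_values] epv
WHERE   epv.[execution_id] = 10
ORDER BY epv.[parameter_name]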

[catalog].[execution_parameter_values] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[executables]

[catalog].[executables] contains a record for every single executable that gets executed in an execution. What’s an executable? It is every task, container or package in both the package that you execute and any child packages that also get executed using the Execute Package Task. Here’s what we get:

SELECT  executable_id,executable_name,package_name,package_path
FROM  [catalog].executables
catalog.executables sample

We can see that all our tasks and our For Each Loop (“FEL Loop over top performing regions”) appear in here along with the containing package. Interestingly we also get package_path which gives the location of each executable within its package.

It is worth noting that we only get a record in [catalog].[executables] if an executable actually gets executed. If that doesn’t happen (because of an earlier error or conditional control-flow) then there won’t be a record in here.

Also note that executables “ExportDataForCountry” and “DFT Export Sales for named country” have an [executable_id] that is unique across the whole execution, not just across [package_name]. In other words SSIS doesn’t care which physical .dtsx file an executable is in, it treats them all as executables in a single execution; I happen to think that is a massive step forward in Denali and you’ll see even more benefits of that in later views.
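
A quick way to see that for yourself is to count the executables per physical package within the single execution; a sketch (using the same execution_id as the earlier queries):

-- One execution, multiple physical packages, one set of executables
SELECT  e.[package_name],
        COUNT(*) AS [executable_count]
FROM    [catalog].[executables] e
WHERE   e.[execution_id] = 10
GROUP BY e.[package_name]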

[catalog].[executables] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[executable_statistics]

This is where the new logging infrastructure in Denali really starts to show benefit. For every executable in [catalog].[executables] we get information about it being executed such as whether it succeeded or not and how long it took. You might ask “Why not just put that information in [catalog].[executables]?” and actually the answer is simple when you think about it; an executable can, if it exists inside a For Loop or For Each Loop container, be executed more than once during a single execution. SSIS will capture information about each instance of an executable being executed and moreover will differentiate between each one of them. Let's take a look:

SELECT  executable_id,execution_path,execution_duration,execution_result
FROM  [catalog].[executable_statistics]

catalog.executable_statistics sample

We don’t just get the name of the executable, we get its position in the container hierarchy (aka call stack). Best of all that call stack extends across different packages (remember that “Master” and “ExportDataForCountry” are the names of our two packages). We can also differentiate between different instances of an executable via the [1], [2], [3], ... suffix on our For Loop and For Each Loop iterations.

Think back to the [msdb]..[sysssislog] table prior to Denali, which is just a long flat list of intermingled records in which it is difficult to find the information you are after, and I think you will realise the benefit of having [execution_path]. This contextual information is something I have been asking for for a long time and I am delighted that it is now available. In fact, I'll go on record as saying that I think [execution_path] is the best new feature in SSIS in Denali, I really do. Note that [execution_path] is different to [catalog].[executables].[package_path] that we saw earlier because [package_path] only tells us the location of an executable in a package; it does not provide information in the context of an execution and therefore does not extend across packages like [execution_path] does.
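
Because [execution_path] is hierarchical you can also filter on it directly; for example, here is a sketch of how you might pull out everything that executed under the second iteration of the For Each Loop (the exact path string is an assumption on my part – lift it from your own [catalog].[executable_statistics] rows):

SELECT  es.[execution_path],
        es.[execution_duration],
        es.[execution_result]
FROM    [catalog].[executable_statistics] es
WHERE   es.[execution_path] LIKE '\Master\FEL Loop over top performing regions[[]2]%'  -- [[] escapes the literal [ in LIKE
ORDER BY es.[execution_path]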

We get an execution duration for every single executable (we also get start time and end time which I haven’t shown on this screenshot). Lastly we also get [execution_result] which tells us whether an executable succeeded or not – the value in [execution_result] comes from the DTSExecResult enumeration.
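
In the same spirit as the [status] decode we did against [catalog].[executions], here is a sketch that translates [execution_result] into text using the DTSExecResult values (0 = Success, 1 = Failure, 2 = Completion, 3 = Cancelled):

SELECT  es.[execution_path],
        es.[execution_duration],
        CASE es.[execution_result]
          WHEN 0 THEN 'Success'
          WHEN 1 THEN 'Failure'
          WHEN 2 THEN 'Completion'
          WHEN 3 THEN 'Cancelled'
        END AS [execution_result_text]
FROM    [catalog].[executable_statistics] es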

There is a lot of information wrapped up in just these three columns and I make heavy use of it in SSIS Reporting Pack, a suite of reports that I am putting together which provide value on top of the logged information. Here is a screenshot of a new report that leans heavily on [catalog].[executable_statistics]:

SSIS Reporting Pack Execution Report sample

More to come on SSIS Reporting Pack in a later blog post.

[catalog].[executable_statistics] is logged to when LOGGING_LEVEL is set to Verbose, Performance, Basic or None.

[catalog].[event_messages]

[catalog].[event_messages] is the real meat of the new logging infrastructure and is where I suspect folks investigating SSIS issues will spend most of their time. It performs a very similar role to [msdb]..[sysssislog] however the information herein is much richer, as we shall see. It is also the table for which your choice of LOGGING_LEVEL can have the biggest effect.

The first point to make about [catalog].[event_messages] is that it does not contain [execution_id]; instead it has [operation_id], which actually is an [execution_id]. The reason for the different name is that various types of operation can occur on the SSIS Server, only one of which is package execution. Each operation logs into [catalog].[event_messages] and hence we have [operation_id] instead of [execution_id]. In this blog post the only operation type that I am covering is package execution.
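
That means tying messages back to everything else we have looked at so far is just a join of [operation_id] onto [execution_id]; a sketch:

SELECT  e.[execution_id],
        e.[package_name],
        em.[message_time],
        em.[event_name],
        em.[message]
FROM    [catalog].[executions] e
INNER JOIN [catalog].[event_messages] em
        ON em.[operation_id] = e.[execution_id]
WHERE   e.[execution_id] = 10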

The data in [msdb]..[sysssislog] was based around the notion of events; all tasks were capable of raising those events and we had the choice of capturing and logging them. From that perspective things are no different in Denali. We get all of the columns that we used to get in [msdb]..[sysssislog], the important ones being:

  • time of message
  • message
  • message source
  • event name
  • package name

SELECT  cast(message_time as datetime)message_time,message,package_name,event_name,message_source_name
FROM  [catalog].[event_messages]

catalog.event_messages sample

In addition we also get the aforementioned [package_path] and [execution_path] which, as I have already said, are great additions:

SELECT  [message_source_name],[package_path],[execution_path]
FROM  [catalog].[event_messages]
catalog.event_messages execution_path sample

Another new field is [subcomponent_name] which, as far as I can discern, contains the name of the dataflow component or the internal pipeline that is throwing the event. This is really useful information; previously we only got the name of the dataflow.

SELECT  subcomponent_name,message
FROM  [catalog].[event_messages]
catalog.event_messages subcomponent_name sample

Notice some of the information that we get here regarding rows cached and bytes of memory used; this is all really useful information to have access to.

Two other new columns are [message_type] and [message_source_type] which allow you to further drill down on the information that you are after. They appear in the view as numeric values and whilst there is no documentation publicly available yet on what these values mean I can fill you in on them right here:

First [message_type]

message_type   Description
10             OnPreValidate
20             OnPostValidate
30             OnPreExecute
40             OnPostExecute
60             OnProgress
70             OnInformation
90             Diagnostic
110            OnWarning
140            DiagnosticEx
200            Custom events
400            OnPipeline*

and [message_source_type]

message_source_type   Description
10                    Messages logged by the entry APIs (e.g. T-SQL, CLR Stored procedures)
20                    Messages logged by the external process used to run package (ISServerExec)
30                    Messages logged by the package-level objects
40                    Messages logged by tasks in the control flow
50                    Messages logged by containers (For, ForEach, Sequence) in the control flow
60                    Messages logged by the Data Flow Task
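
If you would rather see those as readable text than as numbers then a couple of CASE expressions based on the two lists above will do the trick; a sketch (any values not in the lists will simply come back as NULL):

SELECT  em.[message],
        CASE em.[message_type]
          WHEN 10  THEN 'OnPreValidate'
          WHEN 20  THEN 'OnPostValidate'
          WHEN 30  THEN 'OnPreExecute'
          WHEN 40  THEN 'OnPostExecute'
          WHEN 60  THEN 'OnProgress'
          WHEN 70  THEN 'OnInformation'
          WHEN 90  THEN 'Diagnostic'
          WHEN 110 THEN 'OnWarning'
          WHEN 140 THEN 'DiagnosticEx'
          WHEN 200 THEN 'Custom event'
          WHEN 400 THEN 'OnPipeline*'
        END AS [message_type_text],
        CASE em.[message_source_type]
          WHEN 10 THEN 'Entry API (e.g. T-SQL, CLR stored procedures)'
          WHEN 20 THEN 'External process used to run the package (ISServerExec)'
          WHEN 30 THEN 'Package-level object'
          WHEN 40 THEN 'Task in the control flow'
          WHEN 50 THEN 'Container (For, ForEach, Sequence) in the control flow'
          WHEN 60 THEN 'Data Flow Task'
        END AS [message_source_type_text]
FROM    [catalog].[event_messages] em
WHERE   em.[operation_id] = 10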

As I alluded to above, your choice of LOGGING_LEVEL greatly influences the amount of information that gets logged. I ran this package against each of the four logging levels and here are the resulting row counts in [catalog].[event_messages]:

LOGGING_LEVEL   Rows   Time to Execute (seconds)
None            0      4
Basic           128    3
Performance     1      10
Verbose         2726   12
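
Counts like those are easy to reproduce yourself; a sketch that counts the messages per execution (each of the four executions above has its own [operation_id]):

SELECT  em.[operation_id],
        COUNT(*) AS [event_message_count]
FROM    [catalog].[event_messages] em
GROUP BY em.[operation_id]
ORDER BY em.[operation_id]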

Given that this execution only contains 5 task executions and one For Each Loop, 2726 records for LOGGING_LEVEL=Verbose strikes me as extravagant, and indeed this is reflected in the much higher execution time. These numbers are not representative of your environment and of course you should do your own testing accordingly, but it does seem as though LOGGING_LEVEL=Verbose can have a detrimental effect on execution duration. If you take a look at the messages that get logged from Verbose logging it is clear that this is intended to be used for diagnostic purposes; messages such as the following aren't the sort that you want to be wading through on a regular basis:

catalog.event_messages diagnostics sample

Interestingly, [message_source_name] for all these messages is 'AdventureWorksDW2008', which is the name of our Shared Connection Manager, so it seems as though LOGGING_LEVEL=Verbose may be a shorthand way of telling SSIS to log all information pertaining to resources external to the package.
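
A simple way to check where the bulk of a Verbose log comes from is to count the Diagnostic messages (message_type 90 in the list above) per [message_source_name]; a sketch:

SELECT  em.[message_source_name],
        COUNT(*) AS [diagnostic_message_count]
FROM    [catalog].[event_messages] em
WHERE   em.[operation_id] = 10
AND     em.[message_type] = 90   -- Diagnostic, per the message_type list above
GROUP BY em.[message_source_name]
ORDER BY COUNT(*) DESC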

I have noticed some interesting new messages when LOGGING_LEVEL=Verbose too that I don’t remember ever seeing before:

catalog.event_messages samples

Lots of useful information in there; I particularly like that we get a message for each executable telling us its execution duration. Yes, this information is in [catalog].[executable_statistics] as we have already discussed, but it's useful to have that information in our main log table too.

As you can see [catalog].[event_messages] is the workhorse of the new logging infrastructure so it's worth spending time getting to know it. As with [catalog].[executable_statistics], I make heavy use of this table in SSIS Reporting Pack, and the existence of [execution_path] makes it easy to browse to the messages that you want to see:

SSIS Reporting Pack Execution Report sample

We can highlight errors and warnings much more clearly too:

SSIS Reporting Pack Execution Report sample

[catalog].[event_messages] is logged to when LOGGING_LEVEL is set to Verbose, Performance or Basic. For LOGGING_LEVEL=Performance only OnError and OnWarning events get logged. For LOGGING_LEVEL=Basic the OnError, OnInformation, OnPreExecute, OnPostExecute, OnPreValidate, OnPostValidate and OnWarning events are logged; however, as I demonstrated above, fewer OnInformation events are logged than for LOGGING_LEVEL=Verbose.

[catalog].[event_message_context]

Have you ever wanted to know what the properties of an executing package were after the event? That is exactly what [catalog].[event_message_context] is provided for. For a very small subset of the messages in [catalog].[event_messages] it logs properties of the package and its connection managers at the point-in-time that the message is logged. The properties are logged as name:value pairs. This is all better illustrated with a screenshot:

SELECT  em.[message],emc.[package_path],emc.[property_name],emc.[property_value]
FROM  [catalog].[event_message_context] emc
inner join [catalog].[event_messages] em on emc.[event_message_id] = em.[event_message_id]

catalog.event_message_context sample

[catalog].[event_message_context] is logged to at the start of a package execution, at which point it tells you all the properties of the package. This is useful (I guess) but more useful than that is the logging of properties when an OnError event occurs; more accurately, [catalog].[event_message_context] will record all the properties of a failed task. To demonstrate, I shut down the server hosting the [AdventureWorksDW2008] database prior to executing our package and here is a subset of what gets logged in [catalog].[event_message_context]:

SELECT  emc.[context_source_name],emc.[property_name],emc.[property_value]
FROM  [catalog].[event_message_context] emc 
INNER JOIN [catalog].[event_messages] em 
  on emc.event_message_id = em.event_message_id 
WHERE  em.operation_id = 19 
AND    em.event_name = 'OnError'
AND    [property_name] IN ('Connection','StartTime','StopTime','SqlStatementSource')

image

As you can see we get the actual SQL statement that was being executed, which is invaluable information if the SQL statement is being built dynamically using an expression or a Script Task. Prior to Denali we simply did not have access to this sort of information and this is a great example of how logging has been massively enhanced. Moreover we also get StartTime and StopTime so it is easy to see, at a glance, how long it took the task to fail. I find it interesting to note that StartTime and StopTime are execution-time-only properties, but they still get logged. I wonder what other execution-time-only properties exist? If you find any do let me know!

One final thing to note is that when an error occurs [catalog].[event_message_context] will contain information about every object in the call stack so if, for example, an Execute SQL Task that is inside a For Each loop fails you will get information about the For Each loop too. Very cool!

[catalog].[event_message_context] is logged to when LOGGING_LEVEL is set to Verbose, Performance or Basic.

[catalog].[execution_component_phases]

If you have ever wanted to know which parts of a dataflow are running slowly then [catalog].[execution_component_phases] is for you. It records how long each data flow component spends in each phase of execution, those phases are:

  • Validate
  • PrepareForExecute
  • PreExecute
  • ProcessInput
  • PrimeOutput
  • PostExecute
  • Cleanup

The most obvious use of this view is simply to discover where the bottlenecks are in your dataflows by discovering which components take the longest time to complete. Before we look at the information in [catalog].[execution_component_phases] let’s remind ourselves of the dataflow that we’re going to examine here:

There are nine components, let’s look at how long each of them is “alive” for during the execution of this dataflow:

SELECT  q2.[subcomponent_name],q2.[phase_time_milliseconds]
FROM  (
    SELECT  q.[subcomponent_name],DATEDIFF(ms,q.[min_start_time],q.[max_end_time])[phase_time_milliseconds]
    FROM  (
        SELECT  [subcomponent_name],MIN([start_time])[min_start_time],MAX([end_time])[max_end_time]
        FROM  [catalog].execution_component_phases
        where  [execution_path] = @execution_path
        group  by [subcomponent_name]
        )q
    )q2
ORDER BY q2.[phase_time_milliseconds] DESC

catalog.execution-component_phases subcomponent_name summary sample

Here we see that LKP Get Customer details has spent the most time executing. To be honest, though, this query does not give us the best indication of where the bottlenecks are because it includes all phases; the Validate phase will begin at roughly the same time for all components, and the Cleanup phase for each will end at roughly the same time as well. A better indicator is to look at the ProcessInput and PrimeOutput phases because this is generally where most of the work is done:

SELECT  q2.[subcomponent_name],q2.[phase_time_milliseconds]--,q2.[min_start_time],q2.[max_end_time]
FROM  (
    SELECT  q.[subcomponent_name],q.[min_start_time],q.[max_end_time],DATEDIFF(ms,q.[min_start_time],q.[max_end_time])[phase_time_milliseconds]
    FROM  (
        SELECT  [subcomponent_name],MIN([start_time])[min_start_time],MAX([end_time])[max_end_time]
        FROM  [catalog].execution_component_phases
        where  [execution_id] = @verbose
        and    [execution_path] = @execution_path
        and    phase in ('ProcessInput','PrimeOutput')
        group  by [subcomponent_name]
        )q
    )q2
ORDER BY q2.[phase_time_milliseconds] DESC

catalog.execution_component_phases subcomponent_name work summary sample

Here we get a much better indicator of where time is spent and where the bottlenecks are in our dataflow. Unsurprisingly the destination component that inserts data into the database takes the longest time to complete its work; however, look at the next three: the Sort component, the Aggregate component and the OLE DB Source component. These three components are all asynchronous, so the information that we are seeing here supports the commonly held belief that asynchronous components are generally slower.

Also notice that Multicast is the shortest to execute. Again not surprising given that Multicast components don’t actually do any work.

We have two Derived Column components (“DER Concat Full Name” & “DER Calculate TotalPrice”) yet one of them takes a lot less time than the other. Again this makes sense if we look at our dataflow; “DER Concat Full Name” occurs after the Aggregate component thus it will almost certainly be operating on fewer rows. Fewer rows means less work to do, which means less time to execute.

As an aside, [catalog].[execution_component_phases] makes it easy to determine exactly which components are asynchronous because they are the ones that have a PrimeOutput phase:

SELECT  q2.[subcomponent_name],q2.[phase_time_milliseconds]--,q2.[min_start_time],q2.[max_end_time]
FROM  (
    SELECT  q.[subcomponent_name],q.[min_start_time],q.[max_end_time],DATEDIFF(ms,q.[min_start_time],q.[max_end_time])[phase_time_milliseconds]
    FROM  (
        SELECT  [subcomponent_name],MIN([start_time])[min_start_time],MAX([end_time])[max_end_time]
        FROM  [catalog].execution_component_phases
        where  [execution_id] = @verbose
        and    [execution_path] = @execution_path
        and    phase in ('PrimeOutput')
        group  by [subcomponent_name]
        )q
    )q2
ORDER BY q2.[phase_time_milliseconds] DESC

catalog.execution_component_phases primeoutput asynchronous sample

OK, so we have seen that the asynchronous components and the destination component that inserts into a database generally take the longest in the “work” phases; however, there are other insights to be had here. If we take a look at the PreExecute phase:

SELECT  q2.[subcomponent_name],q2.[phase_time_milliseconds]--,q2.[min_start_time],q2.[max_end_time]
FROM  (
    SELECT  q.[subcomponent_name],q.[min_start_time],q.[max_end_time],DATEDIFF(ms,q.[min_start_time],q.[max_end_time])[phase_time_milliseconds]
    FROM  (
        SELECT  [subcomponent_name],MIN([start_time])[min_start_time],MAX([end_time])[max_end_time]
        FROM  [catalog].execution_component_phases
        where  [execution_id] = @verbose
        and    [execution_path] = @execution_path
        and    phase in ('PreExecute')
        group  by [subcomponent_name]
        )q
    )q2
ORDER BY q2.[phase_time_milliseconds] DESC

catalog.execution_component_phases subcomponent_name preexecute summary sample

we notice that our lookup component (“LKP Get customer details”) appears much higher in the list than it did previously. This is not surprising given that this is a fully cached lookup component and the PreExecute phase is when that cache gets populated.
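
You can also slice the same data the other way and total up how long each component spends in each individual phase; here is a sketch (using the same @verbose and @execution_path variables as the queries above):

SELECT  ecp.[subcomponent_name],
        ecp.[phase],
        SUM(DATEDIFF(ms, ecp.[start_time], ecp.[end_time])) AS [total_phase_time_ms],
        COUNT(*) AS [phase_occurrences]
FROM    [catalog].[execution_component_phases] ecp
WHERE   ecp.[execution_id]   = @verbose
AND     ecp.[execution_path] = @execution_path
GROUP BY ecp.[subcomponent_name], ecp.[phase]
ORDER BY [total_phase_time_ms] DESC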

Clearly there are many many insights to be gleaned from [catalog].[execution_component_phases] and it should prove to be an invaluable tool when performance tuning dataflows.

[catalog].[execution_component_phases] is logged to when LOGGING_LEVEL is set to Verbose and Performance.

[catalog].[execution_data_statistics]

Where [catalog].[execution_component_phases] tells us how long each component executed for, [catalog].[execution_data_statistics] tells us how much data they processed. I’m sure that if you have ever executed packages in BIDS and made use of the real-time rowcounts that appear in the GUI:

SSIS dataflow denali rowcounts

then at some point you will have wished that you had the same outside of BIDS too; well, that is exactly what [catalog].[execution_data_statistics] provides. Every time a dataflow component passes a buffer on to the next component you will get a record in [catalog].[execution_data_statistics]. Specifically you will get the following information:

  • execution_path of the containing dataflow (we’ve covered this quite a bit already)
  • dataflow path id (a meaningful unique identifier for each data path in the dataflow)
  • source component (the component passing the buffer on)
  • destination component (the component receiving the buffer)
  • time that the buffer was passed on
  • number of rows in the buffer

Here’s what that looks like for our dataflow:

SELECT  eds.[dataflow_path_id_string],eds.[rows_sent],eds.[created_time]
FROM  [catalog].[execution_data_statistics] eds
WHERE  eds.[execution_path] = @execution_path
ORDER  BY [dataflow_path_id_string]

catalog.execution_data_statistics raw data

Of course once you have the raw data you can party on it as much as you want. Remember that [catalog].[execution_data_statistics] has a record for each buffer, not for each datapath, and hence to get a total per datapath you're going to need to SUM:

SELECT  eds.[dataflow_path_id_string],SUM(eds.[rows_sent])[total_rows_sent],MIN(eds.[created_time])[first_buffer_sent],MAX(eds.[created_time])[last_buffer_sent]
FROM  [catalog].[execution_data_statistics] eds
WHERE  eds.[execution_path] = @execution_path
GROUP  BY [dataflow_path_id_string]

catalog.execution_data_statistics rows summary
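
Taking that one step further, combining [rows_sent] with the spread of [created_time] values gives a rough rows-per-second figure for each datapath; a sketch (rough because the time the first buffer spent being filled is not accounted for):

SELECT  eds.[dataflow_path_id_string],
        SUM(eds.[rows_sent]) AS [total_rows_sent],
        DATEDIFF(ss, MIN(eds.[created_time]), MAX(eds.[created_time])) AS [duration_seconds],
        SUM(eds.[rows_sent]) * 1.0
          / NULLIF(DATEDIFF(ss, MIN(eds.[created_time]), MAX(eds.[created_time])), 0) AS [approx_rows_per_second]  -- NULLIF guards against a single-buffer datapath
FROM    [catalog].[execution_data_statistics] eds
WHERE   eds.[execution_path] = @execution_path
GROUP BY eds.[dataflow_path_id_string]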

[catalog].[execution_data_statistics], along with [catalog].[execution_component_phases], seems set to be an integral part of your dataflow performance tuning armoury.

[catalog].[execution_data_statistics] is logged to when LOGGING_LEVEL is set to Verbose.


That concludes this run through of the new logging capabilities in SQL Server Denali CTP3. I personally am very happy indeed with these enhancements and am looking forward to building new solutions on this infrastructure as soon as possible. How about you? Are these new capabilities what you have been waiting for or are they superfluous to your needs? Let me know in the comments.

@Jamiet

Published Saturday, July 16, 2011 5:16 PM by jamiet


Comments

 

SSIS Junkie said:

Just a short post here to let you know that the SQL Server Integration Services (SSIS) product team recently

August 2, 2011 2:47 PM
 

SSIS Junkie said:

Yesterday evening I had the pleasure of presenting at the London SQL Server User Group meeting. It was

August 19, 2011 6:19 AM
 

Julie Smith said:

Wow Jamie.  Extremely informative and helpful post.  Thanks!  Logging for components--yay!

August 31, 2011 11:46 PM
 

SSIS Junkie said:

In December 2010 I announced in a blog post Introducing SSIS Reporting Pack for SQL Server code-named

September 4, 2011 9:57 AM
 

SSIS Junkie said:

SQL Server codename Denali was presented to the world yesterday and with it a promise that there's a

September 7, 2011 10:43 AM
 

Samuel Vanga said:

Hi Jamie,

Will it be bad practice to create new tables in the SSISDB database? I'm doing some metadata driven execution (For Loop container executes SPs retrieved from a table).

Wondering if this table can be created in SSISDB maybe under a new schema. Please let me know your opinion.

Thanks,

Samuel.

June 8, 2012 12:52 PM
 

jamiet said:

Hi Samuel,

Your question of "*can* I do this" is easy to answer - yes you can. The question of "*should* I do this" is a little harder. I mean, SSISDB is a user database so you can treat it like any other user database - I don't have any good guidance on whether it's a good idea or not though I'm afraid.

JT

June 11, 2012 4:03 AM
 

SimS said:

Hi Jamie,

This is really useful, and my client is keen on seeing if we can use the Reporting Pack (customised).

Thanks a lot

Sim

August 1, 2012 5:52 AM
 

Moham Mawla said:

Nice article

I think there's a typo at the end of section [catalog].[execution_component_phases]

"

[catalog].[event_message_context] is logged to when LOGGING_LEVEL is set to Verbose and Performance.

"

The table name just should be [execution_component_phases]

Cheers

September 3, 2012 2:49 PM
 

jamiet said:

Thank you Moham, fantastic spot. I have now corrected it.

September 3, 2012 5:52 PM
 

Warren said:

Fantastic article - thank you!

October 7, 2014 4:27 PM
