
The Rambling DBA: Jonathan Kehayias

The random ramblings and rantings of a frazzled SQL Server DBA

  • An XEvent a Day (25 of 31) – The Twelve Days of Christmas

    Correction: I mistakenly stated that Tim Mitchell had blogged the 12 days of SQL Christmas, and was informed that Tim Ford is the one who wrote that blog post.  My sincerest apologies to Tim and Tim for the mix-up.  The post has been corrected below.

    In the spirit of today’s holiday, a couple of people have been posting SQL-related renditions of holiday songs.  Tim Ford posted his 12 days of SQL Christmas, and Paul Randal and Kimberly Tripp went as far as to record themselves singing SQL carols in their blog post Our Christmas Gift To You: Paul and Kimberly Singing!  For today’s post on Extended Events I give you the 12 days of Christmas, Extended Events style (all of these are based on facts about Extended Events in SQL Server).

    On the first day of Christmas Extended Events gave to me: a chance to write a useful GUI.
    On the second day of Christmas Extended Events gave to me: two sqlserver packages.
    On the third day of Christmas Extended Events gave to me: three new DDL Commands.
    On the fourth day of Christmas Extended Events gave to me: four ETW channels.
    On the fifth day of Christmas Extended Events gave to me: five in-memory targets.
    On the sixth day of Christmas Extended Events gave to me: six keyword maps.
    On the seventh day of Christmas Extended Events gave to me: seven session options.
    On the eighth day of Christmas Extended Events gave to me: eight file io events.
    On the ninth day of Christmas Extended Events gave to me: nine database_log events.
    On the tenth day of Christmas Extended Events gave to me: ten new DMV’s.
    On the eleventh day of Christmas Extended Events gave to me: eleven Operational events.
    On the twelfth day of Christmas Extended Events gave to me: twelve sqlos Actions.

  • An XEvent a Day (24 of 31) – What is the package0.callstack Action?

    One of the Actions inside of Extended Events is package0.callstack, and the only description provided by sys.dm_xe_objects for the object is “16-frame call stack”.  If you look back at The system_health Session blog post, you’ll notice that the package0.callstack Action has been added to a number of the Events that the PSS team thought were of significance to include in the Event Session.  We can trigger an event that will be logged by our system_health Event Session by raising an error of severity >= 20 with the RAISERROR functionality in TSQL.

    -- Generate a Severity 20 Error to trigger system_health
    -- sqlserver.error_reported Event
    RAISERROR('This is an Error!', 20, 1) WITH LOG


    After raising the error, we can query the system_health Event Session for the callstacks that have been collected by adding a filter into our XQuery for the action node with the @name attribute = “callstack”:

    SELECT n.query('.') AS callstack
    FROM
    (
        SELECT CAST(target_data as xml)
        FROM sys.dm_xe_sessions AS s 
        INNER JOIN sys.dm_xe_session_targets AS t
            ON s.address = t.event_session_address
        WHERE s.name = 'system_health'
          AND t.target_name = 'ring_buffer'
    ) AS src (target_data)
    CROSS APPLY target_data.nodes('RingBufferTarget/event/action[@name="callstack"]') as q(n)

    This will only return the action nodes for the callstack and the XML fragment will be similar to the following:

        <action name="callstack" package="package0">
          <type name="callstack" package="package0" />
          <value>0x0000000001CD4F55
    0x000000000113A310
    0x0000000000BEA7D0
    0x0000000001A3A0CC
    0x0000000002FA3EAE
    0x0000000000BC9616
    0x0000000000BCABBB
    0x0000000000BCA4D9
    0x0000000000BCD10B
    0x0000000000BC7C9B
    0x0000000000B6163B
    0x0000000000B612FA
    0x0000000000B60E35
    0x00000000010E0E50
    0x00000000010E09A0
    0x00000000010F9AB0</value>
          <text />
        </action>
    

    So what is it about this information that would make it important enough to collect?  The callstack provides the most recent 16 frames inside of the sqlservr process.  If you create a dump file of the sqlservr process using sqldumper.exe, you can open the .mdmp file in WinDbg, load the public symbols for SQL Server, and then walk the stack with the ln <stack address> command.  For example, the above callstack resolves in WinDbg as:

    (00000000`01cd4f10)   sqlservr!GenericEvent::CallNextAction+0x45   |  (00000000`01cd5000)   sqlservr!AutoSpinlockHolder<170,1,1>::~AutoSpinlockHolder<170,1,1>
    (00000000`00b78be0)   sqlservr!_chkstk+0xf276c   |  (00000000`00b78c30)   sqlservr!IsWorktableRowset
    (00000000`00bea640)   sqlservr!ErrorReportedAutoPublish::Publish+0x190   |  (00000000`00bea820)   sqlservr!CErrorReportingManager::CwchCallFormatMessage
    (00000000`00b78be0)   sqlservr!_chkstk+0x1bd96d   |  (00000000`00b78c30)   sqlservr!IsWorktableRowset
    (00000000`02fa3800)   sqlservr!CXStmtError::XretExecute+0x6ae   |  (00000000`02fa44b0)   sqlservr!CStmtDbcc::XretExecute
    (00000000`00bc8f80)   sqlservr!CMsqlExecContext::ExecuteStmts<1,1>+0x55a   |  (00000000`00bc9e30)   sqlservr!CSessionTaskProxy::AddRef
    (00000000`00bca630)   sqlservr!CMsqlExecContext::FExecute+0x58b   |  (00000000`00bcad60)   sqlservr!CExecParamTblHelperForExecParamTable::`vftable'
    (00000000`00bca1c0)   sqlservr!CSQLSource::Execute+0x319   |  (00000000`00bca630)   sqlservr!CMsqlExecContext::FExecute
    (00000000`00bcd1a0)   sqlservr!process_request+0x370   |  (00000000`00bcd6c0)   sqlservr!CAutoSetupCXCtxtS::~CAutoSetupCXCtxtS
    (00000000`00bc7990)   sqlservr!process_commands+0x2b2   |  (00000000`00bc7c10)   sqlservr!CConnection::PNetConn
    (00000000`00b61520)   sqlservr!SOS_Task::Param::Execute+0x11b   |  (00000000`00b616f0)   sqlservr!Worker::Reset
    (00000000`00b61230)   sqlservr!SOS_Scheduler::RunTask+0xca   |  (00000000`00b61520)   sqlservr!SOS_Task::Param::Execute
    (00000000`00b60da0)   sqlservr!SOS_Scheduler::ProcessTasks+0x95   |  (00000000`00b61090)   sqlservr!WorkDispatcher::DequeueTask
    (00000000`010e0d40)   sqlservr!SchedulerManager::WorkerEntryPoint+0x110   |  (00000000`010e0ea0)   sqlservr!SOSQueueCounted<Worker,0>::Dequeue
    (00000000`010e0940)   sqlservr!SystemThread::RunWorker+0x60   |  (00000000`010e0a10)   sqlservr!SchedulerManager::AcquireWorker
    (00000000`010f9980)   sqlservr!SystemThreadDispatcher::ProcessWorker+0x12c   |  (00000000`010f9b00)   sqlservr!SEList<SystemThread,112>::Head


    This information isn’t really of much use unless you have access to, or understand, the SQL Server source code.  In the event that you have an issue, the PSS team can create a memory dump of the process, collect the output from the ring_buffer target, and walk the stack to see what led to the Event firing.
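
    If you do need to generate that dump, the Windows process ID to pass to sqldumper.exe can be pulled straight from T-SQL:

    -- The Windows process ID of this instance's sqlservr.exe process,
    -- which sqldumper.exe takes as its first argument
    SELECT SERVERPROPERTY('ProcessID') AS sqlservr_process_id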

    It is possible to materialize the stack without having to actually perform a memory dump and without using WinDbg.  In the SQLCAT team blog post Resolving DTC Related Waits and Tuning Scalability of DTC, Trace Flag 3656 is documented as materializing the callstack if the sqlservr.pdb symbols file exists in the same directory as sqlservr.exe. 

    NOTE: There is a reason that this functionality is not turned on by default in SQL Server.  It is not recommended that you enable this Trace Flag on a production server unless directed to do so by PSS as a part of a support case.  This Trace Flag can impact performance and should not be used lightly.

    In SQL Server 2008, the symbols file is not included in the product by default.  To get the symbols file, you can use WinDbg and a memory dump.  For steps on how to do this, see http://blogs.msdn.com/b/askjay/archive/2009/12/29/basic-debugging-concepts-and-setup.aspx.  Once you open the memory dump file for the first time, the symbols are downloaded from the public symbol server and placed in the .sympath specified; in the case of the blog post mentioned, that is C:\symbols\public\sq\sqlservr.pdb\1E7168D2F78B4FBA911F507689D7DE902.  After copying the pdb to the Binn folder for the SQL instance, by default C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn, we can turn on the trace flag and requery our Event Session.

    --Trace flag 3656 enables the call stacks to be resolved.  This requires that the 
    --sqlservr.pdb file reside in the same directory as sqlservr.exe
    DBCC TRACEON (3656, -1)  
    GO
    SELECT n.query('.') AS callstack
    FROM
    (
        SELECT CAST(target_data as xml)
        FROM sys.dm_xe_sessions AS s 
        INNER JOIN sys.dm_xe_session_targets AS t
            ON s.address = t.event_session_address
        WHERE s.name = 'system_health'
          AND t.target_name = 'ring_buffer'
    ) AS src (target_data)
    CROSS APPLY target_data.nodes('RingBufferTarget/event/action[@name="callstack"]') as q(n)

    The output of our callstack action is now:

    <action name="callstack" package="package0">
      <type name="callstack" package="package0" />
      <value>GenericEvent::CallNextAction+45 [ @ 0+0x0
    _chkstk+f276c [ @ 0+0x0
    ErrorReportedAutoPublish::Publish+190 [ @ 0+0x0
    _chkstk+1bd96d [ @ 0+0x0
    CXStmtError::XretExecute+6ae [ @ 0+0x0
    CMsqlExecContext::ExecuteStmts&lt;1,1&gt;+55a [ @ 0+0x0
    CMsqlExecContext::FExecute+58b [ @ 0+0x0
    CSQLSource::Execute+319 [ @ 0+0x0
    process_request+370 [ @ 0+0x0
    process_commands+2b2 [ @ 0+0x0
    SOS_Task::Param::Execute+11b [ @ 0+0x0
    SOS_Scheduler::RunTask+ca [ @ 0+0x0
    SOS_Scheduler::ProcessTasks+95 [ @ 0+0x0
    SchedulerManager::WorkerEntryPoint+110 [ @ 0+0x0
    SystemThread::RunWorker+60 [ @ 0+0x0
    SystemThreadDispatcher::ProcessWorker+12c [ @ 0+0x0</value>
      <text />
    </action>

    Note that these match up to the stack output from WinDbg.  If you are interested in trying to figure out the internal call stack of SQL Server, the package0.callstack Action can certainly be useful, but it is not something that you will get much use out of in general troubleshooting with Extended Events. 

     

  • An XEvent a Day (23 of 31) – How it Works – Multiple Transaction Log Files

    While working on yesterday’s blog post The Future – fn_dblog() No More? Tracking Transaction Log Activity in Denali, I did a quick Google search for a specific blog post by Paul Randal to use as a reference, and in the results another blog post titled Investigating Multiple Transaction Log Files in SQL Server caught my eye, so I opened it in a new tab in IE and went about finishing the blog post.  It probably wouldn’t have gotten my attention if it hadn’t been on the SQLServerPedia site.  When I was finished I went back and read through the post, and I found that some of the information presented in it was incorrect, so I attempted to post a comment.  Not surprisingly, the blog had moderation controls turned on (I have that turned on here if you aren’t a SQLBlog member, so I don’t have a problem with that necessarily), and the comment didn’t show up on the site.

    Interestingly enough, yesterday SQL Server Central had an editorial by Tim Mitchell titled Calling Out Bad Advice that discussed the problem of bad information on the internet and how to go about calling people out for publishing bad advice.  Let’s face it, people are human, at least I am, and mistakes happen from time to time, either through our own misunderstandings of our personal experiences and what we perceived from the information we had, or by sheer accident in some cases.  This afternoon I got an email back from the blog post’s author and we traded a few emails about the post; in the end the author made changes to the original post, which have already been syndicated to SQLServerPedia, so to see the original you have to do something like look at the Google cached copy.  The author also posted a follow-up blog post today on the subject.

    So why this blog post?  Well even with the corrections, some of the conclusions are still wrong.

    [Image: the numbered conclusions from the original blog post]

    I am not trying to knock this guy for what he saw or perceived from the information he collected, but points 2, 3 and 4 are still incorrect.  What’s great is that we can prove this by using Extended Events in SQL Server 2008, and that is the real purpose behind this blog post.  To set things up, we first need to create a database that roughly matches the available information shown in the pictures of the original blog post.  The database will have a single data file, which I am sizing initially at 128MB with a fixed autogrowth value of 64MB.  The database will have four log files that are initially sized at 1MB each; the first log file will have a fixed autogrowth value of 32MB, and the last three transaction log files have fixed growth values of just 1MB.  Don’t comment on this configuration; I understand completely that there is no reason to create multiple log files on the same disk array (half the purpose behind this post is to show that there is no benefit to having multiple log files like this, which was also the intended purpose behind the original blog post), and I wouldn’t do this in production, but it works perfectly for the tests that we are about to run.  Once the database is created, we’ll switch to that database and dump out the transaction log VLF information using DBCC LOGINFO.

    -- Create our Test database with
    --        1 data file sized at 128MB with 64MB autogrowth
    --        1 log file sized at 1MB with 32MB autogrowth
    --        3 log files sized at 1MB with 1MB autogrowth
    CREATE DATABASE [Test] ON  
    PRIMARY 
        (    
            NAME = N'Test', 
            FILENAME = N'D:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test.mdf', 
            SIZE = 131072KB, 
            FILEGROWTH = 65536KB
        )
    LOG ON 
        (    
            NAME = N'Test_log', 
            FILENAME = N'L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log.ldf', 
            SIZE = 1024KB, 
            MAXSIZE = 131072KB, 
            FILEGROWTH = 32768KB
        ), 
        (    
            NAME = N'Test_log2',     
            FILENAME = N'L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log2.ldf', 
            SIZE = 1024KB, 
            MAXSIZE = 131072KB, 
            FILEGROWTH = 1024KB 
        ), 
        ( 
            NAME = N'Test_log3', 
            FILENAME = N'L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log3.ldf', 
            SIZE = 1024KB, 
            MAXSIZE = 131072KB, 
            FILEGROWTH = 1024KB 
        ), 
        ( 
            NAME = N'Test_log4', 
            FILENAME = N'L:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\DATA\Test_log4.ldf', 
            SIZE = 1024KB, 
            MAXSIZE = 131072KB, 
            FILEGROWTH = 1024KB 
        )
    GO
    -- Switch to our Test database
    USE [Test]
    GO
    -- Dump the VLF Usage information
    DBCC LOGINFO
    GO

    [Image: DBCC LOGINFO output, with each log file's VLFs highlighted in a different color]

    Each log file’s VLFs have been highlighted in a different color above to point out the separation of the four different files.  Note that the active VLF is in the first log file, FileId=2, as shown by Status=2.  With our test database created, we can now set out to create the Extended Events Event Session that will allow us to prove the following points:

      1. The transaction log is written to sequentially, starting with the first VLF in FileId=2.  When the last VLF in FileId=2 is full, the log begins writing log records to the first VLF of FileId=3; when the last VLF in FileId=3 is full, it moves on to the first VLF of FileId=4; and when the last VLF in FileId=4 is full, it moves on to the first VLF of FileId=5.  When the last VLF in FileId=5 is full, the log circles back to the first VLF of FileId=2, which will still be active because we are going to work within a single explicit transaction for the duration of the test.  Since that file is full it has to be grown, and because it has a growth factor of 32MB it grows by 32MB and begins writing log records into the first VLF of the newly allocated space.
      2. The writes to the log files do not happen at the same time; they occur sequentially as the engine writes log records into each file, filling the VLFs, and has to move to the next file, or circle back to the beginning of the log when it reaches the end of the last log file.
      3. Whatever results were seen in the original post by opening the log file with the Apex tool were incorrect and misleading to the original poster, since log records were actually written to all of the files during the operation.  I have a couple of theories as to what could have happened to make the Apex tool show no log records, which I will discuss later in this post.

    What Events would we want to capture to look at what is happening in our transaction log files when running the same workload as the original post?  Since we are going to be executing a number of statements, the sqlserver.sql_statement_starting and sqlserver.sql_statement_completed Events seem like a good starting point, and since we want to know what statement was executing, we’ll add the sql_text Action to these Events.  Since we are dealing with the transaction log files, the sqlserver.databases_log_file_size_changed, sqlserver.databases_log_file_used_size_changed, sqlserver.databases_log_flush_wait, sqlserver.databases_log_flush, sqlserver.databases_log_growth, and sqlserver.databases_log_truncation Events should be included too, so we can track what’s going on with our log specifically, and to ensure that these Events only fire for our test database, we’ll dynamically build in a Predicate on the sqlserver.database_id Predicate source using the output of DB_ID() inside the testing database. 

    Since the log is a file, we will also want to collect the file-operation-related Events such as sqlserver.flush_file_buffers, sqlserver.file_read, sqlserver.file_written, sqlserver.file_read_completed, and sqlserver.file_write_completed, and we’ll dynamically set a database_id Predicate on these Events as well.  If you recall back to Friday of last week, I talked about a number of trace flags that provide further information about Backup, Restore, and file operations in my blog post A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 1).  One of those was Trace Flag 3004, which writes file zeroing information to the trace print output whenever a zeroing operation occurs.  I previously used this trace flag in my blog post Does the tempdb Log file get Zero Initialized at Startup?  Since the log files grew in the original tests, we can turn this trace flag on to track the file growths and use the sqlserver.trace_print Event to capture the file operation messages; to keep this Event focused on our tests only, we’ll dynamically set a Predicate for the current session_id using the sqlserver.session_id Predicate Source and the output of @@SPID.  Finally, since this is all happening inside of an explicit transaction, we’ll also capture the sqlserver.database_transaction_begin and sqlserver.database_transaction_end Events for the current database_id.

    We have quite a large list of Events associated with this Event Session, and to ensure that we can perform analysis over all of the Event data from our tests, we’ll use the package0.asynchronous_file_target to hold our Event information.  We’ll also increase our buffer memory from the default 4MB to 8MB and set the Event Session up to ALLOW_SINGLE_EVENT_LOSS, which does exactly what it sounds like it does, and to correlate cause and effect we’ll also turn TRACK_CAUSALITY to ON for the session.

    -- Create our Event Session dynamically
    DECLARE @sqlcmd nvarchar(2000) = '
    CREATE EVENT SESSION TransactionLogUsage
    ON SERVER
    --ADD EVENT sqlserver.sql_statement_starting
    --( ACTION(sqlserver.sql_text)
    --  WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    --ADD EVENT sqlserver.sql_statement_completed
    --( ACTION(sqlserver.sql_text)
    --  WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_file_size_changed
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_file_used_size_changed
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_flush_wait
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_flush
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_growth
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.databases_log_truncation
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.flush_file_buffers
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.file_read
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.file_written
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.file_read_completed
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.file_write_completed
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.trace_print
    (   WHERE (sqlserver.session_id = '+ cast(@@SPID as varchar(4))+')),
    ADD EVENT sqlserver.database_transaction_begin
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
    ADD EVENT sqlserver.database_transaction_end
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+'))
    ADD TARGET package0.asynchronous_file_target(
         SET filename=''C:\SQLBlog\TransactionLogUsage.xel'',
             metadatafile=''C:\SQLBlog\TransactionLogUsage.xem'')
    WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, TRACK_CAUSALITY = ON )'
    EXEC (@sqlcmd)
    GO

    If you notice above, I have commented out the sqlserver.sql_statement_starting and sqlserver.sql_statement_completed Events in the Event Session.  It turns out that these two Events are not needed in the Event Session to prove the points being made in this blog post.  Including them makes the time to process the 240K+ Events run in the 10-15 minute range on my 16-core test server, so it’s not likely something that you are going to do on a laptop VM, but they were included in my initial Event Session, and I wanted to show the thought process I followed to get from A to B and ultimately C.

    With our Event Session created, we can finish setting up our environment to run the actual tests.  To do this we’ll create a table named LogTable with two columns that are, as best as I can tell from the limited information provided about the test table, the same as the table used in the original post.  We’ll then CHECKPOINT the database to cause log truncation to occur (you did create the database in SIMPLE recovery, right?), turn on Trace Flag 3004 for our session, and then start the Event Session so that it collects the data from our Events during our test.

    -- Create our Test Table
    CREATE TABLE LogTable (RowID decimal(10,4), Data char(1024))
    GO
    -- Checkpoint the database to truncate and clear the log.
    CHECKPOINT
    GO
    -- Turn on Trace Flag 3004 so we can see file zeroing ops.
    DBCC TRACEON(3004)
    GO
    -- Start the Event Session
    ALTER EVENT SESSION TransactionLogUsage
    ON SERVER
    STATE=START
    GO

    With the Event Session started and all our setup work completed we can now run the test script that was used in the original post to generate our test workload.  When the tests complete, we’ll dump out our VLF information again with DBCC LOGINFO, then ROLLBACK the transaction, switch to master and DROP our test database and the Extended Events Session from the server since they are no longer needed.

    -- Run our tests
    SET NOCOUNT ON
    
    DECLARE @cnt decimal(10,4)=0
    DECLARE @rows int=0
    BEGIN TRAN
    WHILE 1=1
    BEGIN
        INSERT INTO LogTable VALUES (ROUND((RAND()* 1000000),0), SPACE(1024))
        
        SELECT @rows+=1
            
        SELECT @cnt = (size * 1.0 * 8.0)/1024.0 
        FROM  Test.sys.database_files
        WHERE data_space_id = 0
        AND [FILE_ID]=5
        
        IF @cnt>1.0
                BREAK
    END
    
    SELECT @rows;
    GO
    -- Pull Log VLF usage again
    DBCC LOGINFO
    GO
    -- Rollback our transaction
    ROLLBACK
    GO
    USE master
    GO
    -- Kill any connection to Test database
    ALTER DATABASE [Test] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    GO
    -- Drop the Test database
    DROP DATABASE [Test]
    GO
    -- Drop the Event Session
    DROP EVENT SESSION TransactionLogUsage
    ON SERVER
    GO

    [Image: DBCC LOGINFO output after the test workload, with each log file's VLFs highlighted in a different color]

    There is a reason that I dump out the VLF information before performing the ROLLBACK of the transaction.  As long as the transaction remains active, the VLFs containing the active transaction cannot be truncated and cleared, so in order to see the allocated VLFs, we need the transaction to still be active.  Once again, I have highlighted each of the individual log files separately, and from the DBCC LOGINFO output we can look at the FileId and Status columns and see that our transaction log wrote information into all four of the files, filling them, and then wrapped back to the first file, which had to be grown; each of the subsequent log files was also grown by the database engine.  However, if you look at the CreateLSN information for the growth portion of each log file, you will notice that each file has its own CreateLSN value for the second set of VLFs, meaning that they were grown separately and at different times.  Still not convinced by DBCC LOGINFO?  Well, we have the data to validate this and prove it unequivocally, but before we can look at the data, we need to retrieve it from the asynchronous_file_target files and shred the XML using XQuery.

    -- Create our Analysis Database
    CREATE DATABASE TLogUsageTestResults
    GO
    -- Switch to our Analysis Database
    USE [TLogUsageTestResults]
    GO
    -- Create intermediate temp table for raw event data
    CREATE TABLE RawEventData
    (Rowid int identity primary key, event_data xml)
    
    -- Create final results table for parsed event data
    CREATE TABLE TestResults
    ([Rowid] int primary key, [event_name] varchar(50), [package_name] varchar(50),
     [timestamp] datetime2, [count] bigint, [increment] bigint, [database_id] int, 
     [mode] nvarchar(4000), [file_handle] nvarchar(4000), [offset] bigint, 
     [file_id] int, [file_group_id] int, [path] nvarchar(4000), [duration] bigint, 
     [io_data] nvarchar(4000), [succeeded] nvarchar(4000), [sql_text] nvarchar(4000), 
     [trace_message] nvarchar(4000), [source_database_id] int, [object_id] int, 
     [object_type] int, [cpu] int, [reads] bigint, [writes] bigint, 
     [state] nvarchar(4000), [offset_end] int, [nest_level] int, 
     [activity_id] uniqueidentifier, [event_sequence] int )
    
    -- Read the file data into intermediate temp table
    INSERT INTO RawEventData (event_data)
    SELECT
        CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\TransactionLogUsage*.xel', 
                                         'C:\SQLBlog\TransactionLogUsage*.xem', 
                                         null, null)
    
    -- Query the Event data from the Target.
    INSERT INTO TestResults
    ([Rowid], [event_name], [timestamp], [database_id], [count], [increment], 
     [mode], [file_handle], [offset], [file_id], [file_group_id], [path], 
     [duration], [io_data], [succeeded], [sql_text], [trace_message], [source_database_id], 
     [object_id], [object_type], [cpu], [reads], [writes], [state], [offset_end], 
     [nest_level], [activity_id], [event_sequence])
    
    -- Fetch the Event Data from the Event Session Target
    SELECT 
        RowID,
        event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
                 event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
        event_data.value('(event/data[@name="count"]/value)[1]', 'bigint') AS [count],
        event_data.value('(event/data[@name="increment"]/value)[1]', 'bigint') AS [increment],
        event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(4000)') AS [mode],
        event_data.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(4000)') AS [file_handle],
        event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint') AS [offset],
        event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
        event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int') AS [file_group_id],
        event_data.value('(event/data[@name="path"]/value)[1]', 'nvarchar(4000)') AS [path],
        event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS [duration],
        event_data.value('(event/data[@name="io_data"]/value)[1]', 'nvarchar(4000)') AS [io_data],
        event_data.value('(event/data[@name="succeeded"]/value)[1]', 'nvarchar(4000)') AS [succeeded],
        event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS [sql_text],
        event_data.value('(event/data[@name="message"]/value)[1]', 'nvarchar(4000)') AS [trace_message],
        event_data.value('(event/data[@name="source_database_id"]/value)[1]', 'int') AS [source_database_id],
        event_data.value('(event/data[@name="object_id"]/value)[1]', 'int') AS [object_id],
        event_data.value('(event/data[@name="object_type"]/value)[1]', 'int') AS [object_type],
        event_data.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
        event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint') AS [reads],
        event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint') AS [writes],
        event_data.value('(event/data[@name="state"]/text)[1]', 'nvarchar(4000)') AS [state],
        event_data.value('(event/data[@name="offset_end"]/value)[1]', 'int') AS [offset_end],
        event_data.value('(event/data[@name="nest_level"]/value)[1]', 'int') AS [nest_level],
        CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
        CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
    FROM RawEventData
    ORDER BY Rowid
    GO
    
    -- Return our results
    SELECT * 
    FROM TestResults
    WHERE event_name NOT IN ('sql_statement_starting' , 'sql_statement_completed')
    ORDER BY RowID

    If you scroll through the results you can see the writes occurring sequentially through each of the log files, and while FileId=4 is being written to, the Database Engine begins the growth of FileId=2 by 32MB.  If we change our query to focus only on the file_write_completed, databases_log_growth, and trace_print Events, we can see this a little more easily.

    -- Return our results
    SELECT 
        Rowid, 
        event_name, 
        [timestamp], 
        [count], 
        database_id, 
        mode, 
        offset, 
        file_id, 
        duration, 
        trace_message 
    FROM TestResults
    WHERE event_name  IN ('file_write_completed' , 'databases_log_growth', 'trace_print')
    ORDER BY RowID

    [Image: Log Rollover from Log1 to Log2]   [Image: Log Rollover from Log2 to Log3]

     

    [Image: Autogrow of Log1]   [Image: Log Rollover from Log3 to Log4]

     

    [Image: Log Rollover from Log4 back to Log1]   [Image: Autogrow of Log2]

     

    [Image: Autogrow of Log3]   [Image: Autogrow of Log4]


    We can see that the first log file, FileId=2, grows before the log rollover from Log3, FileId=4, to Log4, FileId=5, occurs, making space available in the first log file for the rollover when FileId=5 becomes full.  Log records are written to all four of the log files before the log wraps back around to the first log file, debunking points two and four of the conclusion.  The timestamps of the events show that the additional log files are written to serially and not at the same time, debunking point three of the conclusion.  The reason that only a fraction of the log records is written to the three additional log files is the difference in the autogrowth settings between the first log file at 32MB and the three additional log files at 1MB; if the first log file were set to grow at 1MB as well, the majority of the log records would not be in the first log file.
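
    If you would rather verify the VLF distribution numerically than read the DBCC LOGINFO grids, the output can be captured into a table.  Here is a minimal sketch, assuming the SQL Server 2008 column layout (later versions add a RecoveryUnitId column):

    -- Capture DBCC LOGINFO into a temp table (SQL Server 2008 column layout)
    CREATE TABLE #loginfo
    (FileId int, FileSize bigint, StartOffset bigint, FSeqNo int,
     [Status] int, Parity tinyint, CreateLSN numeric(25,0))
    
    INSERT INTO #loginfo
    EXEC ('DBCC LOGINFO')
    
    -- Count total and active (Status = 2) VLFs in each physical log file
    SELECT FileId,
        COUNT(*) AS vlf_count,
        SUM(CASE WHEN [Status] = 2 THEN 1 ELSE 0 END) AS active_vlf_count
    FROM #loginfo
    GROUP BY FileId
    ORDER BY FileId
    
    DROP TABLE #loginfo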

    It’s been well documented that there is no performance benefit to having multiple log files in a database, and Paul Randal’s blog post, Importance of proper transaction log size management, was linked in the original blog post that ultimately triggered this post.  The behavior demonstrated in this post isn’t a mystery; it’s documented in Books Online (Transaction Log Physical Architecture), but sometimes empirical evidence like this helps solidify the facts. 

  • An XEvent a Day (22 of 31) – The Future – fn_dblog() No More? Tracking Transaction Log Activity in Denali

    I bet that made you look, didn’t it?  Worry not, fn_dblog() still exists in SQL Server Denali, and I plan on using it to validate the information returned by a new Event in SQL Server Denali CTP1, sqlserver.transaction_log, which brings with it the ability to correlate specific transaction log entries to the operations that actually caused them to occur.

    There is no greater source of information about the transaction log in SQL Server than Paul Randal’s blog category Transaction Log.  It is also listed as the referenced pre-reading material for the Microsoft SQL Server MCM program Logging, Recovery, Log Maintenance topic.  In a number of his blog posts, Paul shows how to look at the transaction log by using an undocumented system function, fn_dblog().  Note that I said it is undocumented, meaning its use is not supported by Microsoft, its functionality is subject to change at any point in time without notification, and its use is at your own risk.  Is it safe to use?  That’s a topic that is up for debate, but at the end of the day, if you were to have a problem associated with its use, you are on your own because it’s undocumented.
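
    For reference, fn_dblog() takes a start and end LSN as its two parameters, and passing NULL for both returns every record in the active portion of the log:

    -- Dump the active log; both parameters are LSNs (NULL = no bound)
    SELECT [Current LSN], [Operation], [Context], [Transaction ID]
    FROM fn_dblog(NULL, NULL)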

    Why does any of this matter?  It matters because there is a lot of information that we can learn about the internal operations of SQL Server from the log operations that occur as the result of changes in our database.  Some examples of this would be:

    A SQL Server DBA myth a day: (19/30) TRUNCATE TABLE is non-logged
    Benchmarking: 1-TB table population (part 2: optimizing log block IO size and how log IO works)
    Lock logging and fast recovery
    How do checkpoints work and what gets logged
    Finding out who dropped a table using the transaction log

    Admittedly this isn’t necessarily information that you would start a conversation with at a party, unless of course you are surrounded by other ubergeek SQL Server internals junkies, and it’s not really the kind of information that I use day to day in my work as a Database Administrator.  Prior to the introduction of Extended Events, some information about how SQL Server operated was only available inside of the transaction log records, and I am sure that there are still some items that you can only see inside of the log records.  Microsoft obviously recognized a demand to look at the log operations generated by SQL Server in a supported fashion and added this functionality to Extended Events.

    The sqlserver.transaction_log Event returns 10 data elements in its payload, which can be found in the sys.dm_xe_object_columns DMV.

    SELECT 
        oc.name, 
        oc.type_name, 
        oc.description
    FROM sys.dm_xe_packages AS p
    INNER JOIN sys.dm_xe_objects AS o
        ON p.guid = o.package_guid
    INNER JOIN sys.dm_xe_object_columns AS oc
        ON oc.object_name = o.name
            AND oc.object_package_guid = o.package_guid
    WHERE o.name = 'transaction_log'
      AND oc.column_type = 'data'

    [Image: query results showing the 10 data columns in the transaction_log Event payload]

    The operation and context elements have corresponding Maps in sys.dm_xe_map_values that provide the different Log Operations and Contexts that are available through Extended Events.  There are currently 70 Log Operations and 29 Contexts for those operations available in SQL Server Denali CTP1. 

    SELECT 
        name,
        map_key,
        map_value
    FROM sys.dm_xe_map_values
    WHERE name in ('log_op', 'log_context')
    ORDER BY name, map_key

    To show how this Event can be used, we’ll first create a database named TransactionLogDemo, and then switch our connection to that database.  We’ll then create a table that will be used to generate some transaction log events.  We’ll create our Event Session to capture the sqlserver.sql_statement_starting, sqlserver.sql_statement_completed, and sqlserver.transaction_log Events, and we’ll add a Predicate to each Event so it only fires for the TransactionLogDemo database.  To add the Predicate dynamically, we’ll use dynamic SQL to create our Event Session, since inline parameters cannot be used in the CREATE EVENT SESSION DDL. 

    CREATE DATABASE TransactionLogDemo
    GO
    USE TransactionLogDemo
    GO
    CREATE TABLE CreateLogRecords
    (RowID int identity primary key,
     RowData nvarchar(120))
    GO
    DECLARE @sqlcmd nvarchar(2000) = '
    CREATE EVENT SESSION TransactionLogDemo
    ON SERVER
    ADD EVENT sqlserver.page_reference_tracker,
    ADD EVENT sqlserver.sql_statement_starting
    ( ACTION(sqlserver.sql_text)
      WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')
    ),
    ADD EVENT sqlserver.sql_statement_completed
    ( ACTION(sqlserver.sql_text)
      WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')
    ),
    ADD EVENT sqlserver.transaction_log
    ( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+'))
    ADD TARGET package0.asynchronous_file_target(
         SET filename=''C:\SQLBlog\TransactionLogDemoDenali.xel'',
             metadatafile=''C:\SQLBlog\TransactionLogDemoDenali.xem'')
    WITH (MAX_MEMORY = 4MB, EVENT_RETENTION_MODE = NO_EVENT_LOSS, TRACK_CAUSALITY = ON )'
    EXEC (@sqlcmd)
    GO
    CHECKPOINT
    GO
    -- Start the Event Session
    ALTER EVENT SESSION TransactionLogDemo
    ON SERVER
    STATE=START
    GO

    Once the Event Session is created, we’ll call CHECKPOINT on the database so that the log can truncate (you did create the database in SIMPLE recovery, right?) and clear, allowing our later call to fn_dblog() to return only the log records specific to the operations that occur after the CHECKPOINT.  We’ll start our Event Session, insert 20 rows into the CreateLogRecords table, immediately delete all of the rows from the table, and then stop our Event Session to end the collection of Events.

    INSERT INTO CreateLogRecords (RowData)
    SELECT REPLICATE('abc123', 20)
    FROM master.dbo.spt_values a
    WHERE a.type = 'P'
      AND a.number < 20
    GO
    DELETE CreateLogRecords
    GO
    -- Disable the Event Session
    ALTER EVENT SESSION TransactionLogDemo
    ON SERVER
    STATE=STOP
    GO

    Once this is done, we can now query the package0.asynchronous_file_target to get our Event data from Extended Events, and then at the same time query fn_dblog() to get the log records from the Transaction Log as well so that we can validate what we’ve collected in our Event Session.

    -- Fetch the Event Data from the Event Session Target
    SELECT 
        XEvent.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                XEvent.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        COALESCE(XEvent.value('(event/data[@name="database_id"]/value)[1]', 'int'), 
                 XEvent.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
        XEvent.value('(event/data[@name="index_id"]/value)[1]', 'int') AS [index_id],
        XEvent.value('(event/data[@name="object_id"]/value)[1]', 'int') AS [object_id],
        XEvent.value('(event/data[@name="transaction_id"]/value)[1]', 'int') AS [transaction_id],
        XEvent.value('(event/data[@name="log_record_size"]/value)[1]', 'int') AS [log_record_size],
        XEvent.value('(event/data[@name="operation"]/text)[1]', 'varchar(50)') AS [operation],
        XEvent.value('(event/data[@name="context"]/text)[1]', 'varchar(50)') AS [context],
        XEvent.value('(event/data[@name="transaction_start_time"]/value)[1]', 'datetime2') AS [transaction_start_time],
        XEvent.value('(event/data[@name="replication_command"]/value)[1]', 'int') AS [replication_command],
        XEvent.value('(event/action[@name="sql_text"]/value)[1]', 'varchar(1000)') AS [sql_text],
        CAST(SUBSTRING(XEvent.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
        CAST(SUBSTRING(XEvent.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
    FROM (
        SELECT CAST(event_data AS XML) AS XEvent
        FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\TransactionLogDemoDenali*.xel', 'C:\SQLBlog\TransactionLogDemoDenali*.xem', null, null)) as src
    ORDER BY 
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                XEvent.value('(event/@timestamp)[1]', 'datetime2')),
        CAST(SUBSTRING(XEvent.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier),
        CAST(SUBSTRING(XEvent.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int)
    GO
    
    -- Fetch Log records from log for comparison
    SELECT 
        [Xact ID] as transaction_id, 
        [Log Record Fixed Length] as log_record_size, 
        [Operation] as operation, 
        SUBSTRING([context], 5, LEN(context)) as context, 
        [Begin Time]
    FROM fn_dblog(null, null) 
    GO

    [Image: the Event Session output and the fn_dblog() output shown side by side]

    When we look at the output of the above two queries, first you’ll note that there are four log records that don’t have associated records in our Extended Events Session.  These are the log records generated by the CHECKPOINT operation and the ALTER EVENT SESSION command, which occurred before the Event Session was actually collecting data.  The first two LOP_BEGIN_XACT records in the Event Session correspond to the transaction_id of the log records returned in rows 5 and 6 of the fn_dblog() output, but notice that the Event Session is missing the transaction_start_time for the log operations, something I believe to be a bug in Denali CTP1 for which I’ve submitted a Connect item (Denali - Transaction_Log Extended Event Returning Incorrect Data). 

    At a quick glance it appears that all of our log records are in the same order, but if we look more closely, there is a LOP_MODIFY_ROW operation that is missing from our Event Session but exists inside of the fn_dblog() output.

    [Image: the LOP_MODIFY_ROW log record present in the fn_dblog() output but missing from the Event Session output]

    If you scroll down further, you’ll also see that there are two missing log records for the delete as well: LOP_MODIFY_HEAP and LOP_SET_BITS with a context of PFS.  However, the Extended Events Session captures the lock logging for the ALTER EVENT SESSION command that stopped the Event collection, whereas the output from fn_dblog() does not.
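
    If you want to isolate those rows on the fn_dblog() side while the database still exists, a simple filter on the Operation column works:

    -- Show only the log records that did not surface in the Event Session;
    -- run in the TransactionLogDemo database before the cleanup below
    SELECT [Current LSN], [Operation], [Context], [Transaction ID]
    FROM fn_dblog(NULL, NULL)
    WHERE [Operation] IN ('LOP_MODIFY_HEAP', 'LOP_SET_BITS')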

    To cleanup from this Event Session, the following code can be run.

    USE master
    GO
    DROP DATABASE TransactionLogDemo
    GO
    DROP EVENT SESSION TransactionLogDemo
    ON SERVER
    GO
    EXECUTE sp_configure 'show advanced options', 1
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'xp_cmdshell', 1
    GO
    RECONFIGURE
    GO
    EXEC xp_cmdshell 'DEL C:\SQLBlog\TransactionLogDemoDenali*'
    GO
    EXECUTE sp_configure 'xp_cmdshell', 0
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'show advanced options', 0
    GO
    RECONFIGURE
    GO
  • An XEvent a Day (21 of 31) – The Future – Tracking Blocking in Denali

    One of my favorite features added in SQL Server 2005 is the Blocked Process Report trace event, which collects an XML report whenever a process is blocked inside of the database engine longer than a user-configurable threshold.  I wrote an article about this feature on SQL Server Central two years ago titled Using the Blocked Process Report in SQL Server 2005/2008.  One aspect of this feature is that it requires that you either have a SQL Trace running that captures the event, or that you configure Event Notifications on the server to capture the event information in a Service Broker Queue, neither of which is overly difficult to accomplish.  In SQL Server Denali CTP1, there is a new Extended Event, sqlserver.blocked_process_report, that makes this even easier to use.  We can now create an Event Session that exists on our SQL Server, waiting to be started to capture blocked process information as needed.  However, we still have to set the ‘blocked process threshold’ sp_configure option to the threshold at which blocked process report information is generated by the Database Engine, firing the Event in our Event Session.

    CREATE EVENT SESSION MonitorBlocking
    ON SERVER
    ADD EVENT sqlserver.blocked_process_report
    ADD TARGET package0.ring_buffer(SET MAX_MEMORY=2048)
    WITH (MAX_DISPATCH_LATENCY = 5 SECONDS)
    GO
    ALTER EVENT SESSION MonitorBlocking
    ON SERVER
    STATE=START
    GO
    EXECUTE sp_configure 'show advanced options', 1
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'blocked process threshold', 15
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'show advanced options', 0
    GO
    RECONFIGURE
    GO

    To test this Event Session, we can open two New Query Windows in SSMS and connect them to our database engine.  Then in one window run the following code:

    USE [tempdb]
    GO
    CREATE TABLE t1 (RowID int identity primary key)
    GO
    BEGIN TRANSACTION
    INSERT INTO t1 DEFAULT VALUES
    WAITFOR DELAY '00:00:30'
    COMMIT

    Then in the other window run the following code:

    USE [tempdb]
    GO
    SELECT * FROM t1

    The first query will block the execution of the second query until it completes, generating our blocked process report in the ring_buffer Target for our Event Session.  To get the information from the ring_buffer target, we can run a quick XQuery to parse the Target (at this point in the series, you should be an XQuery pro):

    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        ISNULL(n.value('(event/data[@name="database_id"]/value)[1]', 'int'),
                n.value('(event/action[@name="database_id"]/value)[1]', 'int')) as [database_id],
        n.value('(event/data[@name="database_name"]/value)[1]', 'nvarchar(128)') as [database_name],
        n.value('(event/data[@name="object_id"]/value)[1]', 'int') as [object_id],
        n.value('(event/data[@name="index_id"]/value)[1]', 'int') as [index_id],
        CAST(n.value('(event/data[@name="duration"]/value)[1]', 'bigint')/1000000.0 AS decimal(6,2)) as [duration_seconds],
        n.value('(event/data[@name="lock_mode"]/text)[1]', 'nvarchar(10)') as [file_handle],
        n.value('(event/data[@name="transaction_id"]/value)[1]', 'bigint') as [transaction_id],
        n.value('(event/data[@name="resource_owner_type"]/text)[1]', 'nvarchar(10)') as [resource_owner_type],
        CAST(n.value('(event/data[@name="blocked_process"]/value)[1]', 'nvarchar(max)') as XML) as [blocked_process_report]
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s    
            JOIN sys.dm_xe_session_targets AS t
                ON s.address = t.event_session_address
            WHERE s.name = 'MonitorBlocking'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    GO

    [Image: query results showing the parsed blocked_process_report Event data]

    The output of the Event in Extended Events contains some additional information beyond the blocked process report XML, including the database_id, object_id, index_id, duration, lock_mode, transaction_id, and resource_owner_type for the blocking resource.  Clicking on the blocked process report XML will open it in Management Studio as an XML document, allowing detailed analysis of the blocking to be performed.

    <blocked-process-report>
      <blocked-process>
        <process id="process2eb8bda8" taskpriority="0" logused="0" waitresource="KEY: 2:2666130980878942208 (61a06abd401c)" 
                 waittime="25480" ownerId="12748" transactionname="SELECT" lasttranstarted="2010-12-21T18:19:03.263" 
                 XDES="0x2dfb9c10" lockMode="S" schedulerid="1" kpid="2484" status="suspended" spid="60" sbid="0" ecid="0" 
                 priority="0" trancount="0" lastbatchstarted="2010-12-21T18:19:03.263" 
                 lastbatchcompleted="2010-12-21T18:19:03.263" clientapp="Microsoft SQL Server Management Studio - Query" 
                 hostname="WIN-5B9V8JPLP3H" hostpid="2708" loginname="WIN-5B9V8JPLP3H\Administrator" 
                 isolationlevel="read committed (2)" xactid="12748" currentdb="2" lockTimeout="4294967295" 
                 clientoption1="671090784" clientoption2="390200">
          <executionStack>
            <frame line="1" sqlhandle="0x02000000d9de7b2f4f3a78e40f100bc02a84efbb9f01a84d" />
          </executionStack>
          <inputbuf>
    SELECT * FROM t1   </inputbuf>
        </process>
      </blocked-process>
      <blocking-process>
        <process status="suspended" waittime="27430" spid="57" sbid="0" ecid="0" priority="0" trancount="1" 
                 lastbatchstarted="2010-12-21T18:19:01.437" lastbatchcompleted="2010-12-21T18:13:25.637" 
                 clientapp="Microsoft SQL Server Management Studio - Query" hostname="WIN-5B9V8JPLP3H" 
                 hostpid="2708" loginname="WIN-5B9V8JPLP3H\Administrator" isolationlevel="read committed (2)" 
                 xactid="12733" currentdb="2" lockTimeout="4294967295" clientoption1="671090784" clientoption2="390200">
          <executionStack>
            <frame line="3" stmtstart="100" stmtend="150" sqlhandle="0x020000005a74b3030117e049389a93b2ce5bfb48e272f938" />
          </executionStack>
          <inputbuf>
    BEGIN TRANSACTION
    INSERT INTO t1 DEFAULT VALUES
    WAITFOR DELAY '00:00:30'
    COMMIT   </inputbuf>
        </process>
      </blocking-process>
    </blocked-process-report>

    The output of the blocked process report in Extended Events is the same as the blocked process report from SQL Trace and Event Notifications.  This is only a new mechanism for collecting this information, and when Denali reaches RTM, this will be an Event Session that I install on my Denali-based servers as a part of my configuration scripts, having it ready to activate when necessary.
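
    As a sketch of what that configuration-script check might look like (sys.server_event_sessions holds the Event Session definitions), the CREATE can be wrapped in an existence test:

    -- Only create the Event Session if it does not already exist
    IF NOT EXISTS (SELECT 1 FROM sys.server_event_sessions
                   WHERE name = 'MonitorBlocking')
    BEGIN
        EXEC ('CREATE EVENT SESSION MonitorBlocking
               ON SERVER
               ADD EVENT sqlserver.blocked_process_report
               ADD TARGET package0.ring_buffer(SET MAX_MEMORY=2048)
               WITH (MAX_DISPATCH_LATENCY = 5 SECONDS)')
    END
    GO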

    Make sure that you turn off the generation of blocked process reports by changing the ‘blocked process threshold’ sp_configure option back to 0 (zero) whenever you are not actively looking for blocked processes.
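
    Mirroring the setup code above, the reset looks like this:

    EXECUTE sp_configure 'show advanced options', 1
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'blocked process threshold', 0
    GO
    RECONFIGURE
    GO
    EXECUTE sp_configure 'show advanced options', 0
    GO
    RECONFIGURE
    GO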

  • An XEvent a Day (20 of 31) – Mapping Extended Events to SQL Trace

    One of the biggest problems that I had with getting into Extended Events was mapping the Events available in Extended Events to the Events that I knew from SQL Trace.  With so many Events to choose from in Extended Events, and a different organization of the Events, it is really easy to get lost when trying to find things.  Add to this the fact that Event names don’t match up to Trace Event names in SQL Server 2008 and 2008 R2, and not all of the Events from Trace are implemented in SQL Server 2008 and 2008 R2, and it gets really confusing really fast.  For my presentation this year at SQLBits 7, I sat down with Excel and mapped out the Events that exist in Extended Events to their corresponding SQL Trace Event, and two of the slides in my deck were tables of these mappings. 

    TraceCategory       | TraceEvent                  | PackageName | XEEventName
    --------------------|-----------------------------|-------------|----------------------------------
    Deprecation         | Deprecation Announcement    | sqlserver   | deprecation_announcement
    Deprecation         | Deprecation Final Support   | sqlserver   | deprecation_final_support
    Errors and Warnings | ErrorLog                    | sqlserver   | errorlog_written
    Errors and Warnings | EventLog                    | sqlserver   | error_reported
    Errors and Warnings | Exception                   | sqlos       | exception_ring_buffer_recorded
    Errors and Warnings | User Error Message          | sqlserver   | error_reported
    Full text           | FT:Crawl Aborted            | sqlserver   | error_reported
    Locks               | Deadlock graph              | sqlserver   | xml_deadlock_report
    Locks               | Lock:Acquired               | sqlserver   | lock_acquired
    Locks               | Lock:Deadlock               | sqlserver   | lock_deadlock
    Locks               | Lock:Released               | sqlserver   | lock_released
    Locks               | Lock:Timeout                | sqlserver   | locks_lock_timeouts
    Locks               | Lock:Timeout (timeout > 0)  | sqlserver   | locks_lock_timeout_greater_than_0
    Stored Procedures   | RPC Output Parameter        | sqlserver   | rpc_completed
    Stored Procedures   | RPC:Completed               | sqlserver   | rpc_completed
    Stored Procedures   | RPC:Starting                | sqlserver   | rpc_starting
    Stored Procedures   | SP:Completed                | sqlserver   | module_end
    Stored Procedures   | SP:Recompile                | sqlserver   | sp_statement_starting
    Stored Procedures   | SP:Starting                 | sqlserver   | module_start
    Stored Procedures   | SP:StmtCompleted            | sqlserver   | sp_statement_completed
    Stored Procedures   | SP:StmtStarting             | sqlserver   | sp_statement_starting
    TSQL                | SQL:StmtCompleted           | sqlserver   | sql_statement_completed
    TSQL                | SQL:StmtRecompile           | sqlserver   | sql_statement_starting
    TSQL                | SQL:StmtStarting            | sqlserver   | sql_statement_starting
    User configurable   | UserConfigurable            | sqlserver   | user_settable

    I have a script that creates a view for these in my administrative database, which is part of my SQL Server 2008 configuration script for my environment.  That script is attached to this blog post for use in your own environment.
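
    As a minimal sketch of what that view can look like (the view name is mine, and only three of the mappings from the table above are shown; the attached script contains the full list):

    CREATE VIEW dbo.TraceToXEventMap
    AS
    SELECT TraceCategory, TraceEvent, PackageName, XEEventName
    FROM (VALUES
        ('Locks',             'Lock:Acquired',     'sqlserver', 'lock_acquired'),
        ('Stored Procedures', 'RPC:Completed',     'sqlserver', 'rpc_completed'),
        ('TSQL',              'SQL:StmtCompleted', 'sqlserver', 'sql_statement_completed')
    ) AS map (TraceCategory, TraceEvent, PackageName, XEEventName)
    GO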

    In SQL Server Denali CTP1, a table has been added to the master database named dbo.trace_xe_event_map that provides a static mapping of each Trace Event to its corresponding Extended Events Event (this table should be in the sys schema to maintain uniformity in the product, and I filed a Connect feedback item to move it that needs votes to have this changed).  Mike Wachal blogged about this table and how it and its partner table dbo.trace_xe_action_map can be used to migrate from SQL Trace to Extended Events in his blog post Migrating from SQL Trace to Extended Events.  He also includes a really cool SQLCLR Stored Procedure that will perform the conversion for you automagically. 
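
    A quick way to browse the mapping with friendly Trace Event names is to join it to sys.trace_events (the column names below are as documented for the mapping table):

    -- Pair each Trace Event name with its Extended Events counterpart
    SELECT
        te.name AS trace_event_name,
        m.package_name,
        m.xe_event_name
    FROM master.dbo.trace_xe_event_map AS m
    INNER JOIN sys.trace_events AS te
        ON m.trace_event_id = te.trace_event_id
    ORDER BY te.name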

  • An XEvent a Day (19 of 31) – Using Customizable Fields

    Today’s post will be somewhat short, but we’ll look at Customizable Fields on Events in Extended Events and how they are used to collect additional information.  Customizable Fields generally represent information of potential interest that may be expensive to collect, and is therefore made available for collection if specified by the Event Session.  In SQL Server 2008 and 2008 R2, there are 50 Events that have customizable columns in their payload.  In SQL Server Denali CTP1, there are 124 Events that have customizable columns in their payload. The customizable columns and Events that have them can be found with the following query.

    SELECT 
        p.name AS package_name,
        o.name AS event_name,
        oc.name AS column_name,
        oc.column_type,
        oc.type_name,
        oc.description
    FROM sys.dm_xe_packages p
    JOIN sys.dm_xe_objects o
        ON p.guid = o.package_guid
    JOIN sys.dm_xe_object_columns oc
        ON o.name = oc.object_name 
            AND o.package_guid = oc.object_package_guid
    WHERE ((p.capabilities is null or p.capabilities & 1 = 0)
      AND (o.capabilities is null or o.capabilities & 1 = 0)
      AND (oc.capabilities is null or oc.capabilities & 1 = 0))
      AND o.object_type = 'event'
      AND oc.column_type = 'customizable'
    

    If we look at a specific Event containing a customizable column, in this case the sqlserver.file_read_completed Event, we will see that there is a customizable column as well as a data column for the data collected by the customizable column.

    SELECT 
        p.name AS package_name,
        o.name AS event_name,
        oc.column_id,
        oc.name AS column_name,
        oc.column_type,
        oc.type_name,
        oc.description
    FROM sys.dm_xe_packages p
    JOIN sys.dm_xe_objects o
        ON p.guid = o.package_guid
    JOIN sys.dm_xe_object_columns oc
        ON o.name = oc.object_name 
            AND o.package_guid = oc.object_package_guid
    WHERE ((p.capabilities is null or p.capabilities & 1 = 0)
      AND (o.capabilities is null or o.capabilities & 1 = 0)
      AND (oc.capabilities is null or oc.capabilities & 1 = 0))
      AND o.object_type = 'event'
      AND o.name = 'file_read_completed'
      AND oc.column_type <> 'readonly'
    ORDER BY oc.column_type, oc.column_id
    image

    In the red box are the customizable columns, and in the blue box are the data columns associated with them.  The data columns will exist in the Event data when the Event fires, but they will only contain a value if the customizable column is set to collect the information.

    CREATE EVENT SESSION CustomizableColumnDemo
    ON SERVER
    ADD EVENT sqlserver.file_read_completed
    (
        WHERE (database_id = 4)    
    )
    ADD TARGET package0.ring_buffer
    GO
    ALTER EVENT SESSION CustomizableColumnDemo
    ON SERVER
    STATE=START
    GO
    DBCC DROPCLEANBUFFERS
    GO
    SELECT TOP 10 * FROM msdb.dbo.backupset
    GO
    

    If we query the Target data for the above Event, we’ll see that the path and io_data columns are included in the Event XML, but there is no value in the XML nodes.

    SELECT CAST(target_data AS XML) as target_data
    FROM sys.dm_xe_sessions AS s    
    JOIN sys.dm_xe_session_targets AS t
        ON s.address = t.event_session_address
    WHERE s.name = 'CustomizableColumnDemo'
      AND t.target_name = 'ring_buffer'
    

    <event name="file_read_completed" package="sqlserver" id="83" version="1" timestamp="2010-12-20T03:14:20.393Z">
      <data name="mode">
        <type name="file_io_mode" package="sqlserver" />
        <value>0</value>
        <text>Contiguous</text>
      </data>
      <data name="duration">
        <type name="uint64" package="package0" />
        <value>0</value>
        <text />
      </data>
      <data name="file_handle">
        <type name="ulong_ptr" package="package0" />
        <value>0x0000000000000b38</value>
        <text />
      </data>
      <data name="offset">
        <type name="uint64" package="package0" />
        <value>14352384</value>
        <text />
      </data>
      <data name="database_id">
        <type name="uint16" package="package0" />
        <value>4</value>
        <text />
      </data>
      <data name="file_id">
        <type name="uint16" package="package0" />
        <value>1</value>
        <text />
      </data>
      <data name="file_group_id">
        <type name="uint16" package="package0" />
        <value>1</value>
        <text />
      </data>
      <data name="path">
        <type name="unicode_string" package="package0" />
        <value />
        <text />
      </data>
      <data name="io_data">
        <type name="binary_data" package="package0" />
        <value />
        <text />
      </data>
    </event>

    To have a customizable column collect its data, the SET option is used in the ADD EVENT section of the CREATE EVENT SESSION or ALTER EVENT SESSION DDL command to turn data collection on for the column.

    DROP EVENT SESSION CustomizableColumnDemo
    ON SERVER
    GO
    CREATE EVENT SESSION CustomizableColumnDemo
    ON SERVER
    ADD EVENT sqlserver.file_read_completed
    (    
        SET collect_path = 1
        WHERE(database_id = 4)
    )
    ADD TARGET package0.ring_buffer
    GO
    ALTER EVENT SESSION CustomizableColumnDemo
    ON SERVER
    STATE=START
    GO
    DBCC DROPCLEANBUFFERS
    GO
    SELECT TOP 10 * FROM msdb.dbo.backupset
    GO
    ALTER EVENT SESSION CustomizableColumnDemo
    ON SERVER
    DROP EVENT sqlserver.file_read_completed
    GO

    Notice that the SET option does not use parentheses; they are not allowed in the DDL definition.  By setting the collect_path customizable column to 1, the Event XML now contains the path to the data file that was read.

    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        ISNULL(n.value('(event/data[@name="database_id"]/value)[1]', 'int'),
                n.value('(event/action[@name="database_id"]/value)[1]', 'int')) as [database_id],
        n.value('(event/data[@name="mode"]/value)[1]', 'nvarchar(50)') as [mode],
        n.value('(event/data[@name="duration"]/value)[1]', 'bigint') as [duration],
        n.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(50)') as [file_handle],
        n.value('(event/data[@name="offset"]/value)[1]', 'int') as [offset],
        n.value('(event/data[@name="file_id"]/value)[1]', 'int') as [file_id],
        n.value('(event/data[@name="path"]/value)[1]', 'nvarchar(250)') as [path],
        n.value('(event/data[@name="id_data"]/value)[1]', 'nvarchar(max)') as [io_data]   
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s    
            JOIN sys.dm_xe_session_targets AS t
                ON s.address = t.event_session_address
            WHERE s.name = 'CustomizableColumnDemo'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    GO

    image

    The increase in the number of Events with customizable columns in Denali CTP1 is, in my opinion, a great step in the right direction for Extended Events.  The use of customizable columns to add data into the Event payload extends the flexibility of Extended Events by providing a mechanism to gather additional data that is specific to an Event, rather than globally available like Actions.

  • An XEvent a Day (18 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 2)

    In yesterday’s blog post A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 1), we looked at what happens when we Backup a database in SQL Server.  Today, we are going to use the information we captured to perform some analysis of the Backup information, in an attempt to find ways to decrease the time it takes to backup a database.  When I began reviewing the data from the Backup in yesterday’s post, I realized that I had made a mistake in the process and left Trace Flag 3213 off, which left the Backup Buffer Configuration information, which we’ll need, out of the results.  For this post I turned Trace Flag 3213 on and reran the Backup from yesterday, so the results may differ slightly, but the same Backup command was used for the tests.

    image
    The Backup Buffer Configuration information tells us how many Buffers were allocated and what Max Transfer Size is being used.  In the screenshot above, this information is outlined in the red box.  The default Buffer Count is determined by SQL Server when the BUFFERCOUNT option is not specified in the BACKUP DATABASE command.  The calculation used is:

    (NumberofBackupDevices * GetSuggestedIoDepth) + NumberofBackupDevices + (2*DatabaseDeviceCount)

    This is covered in detail in Amit Banerjee’s blog post, Incorrect BufferCount data transfer option can lead to OOM condition.  For the Backup in yesterday’s post, and the one that will be the baseline for today’s, the BUFFERCOUNT option was not specified; as shown above, the Backup used 7 Buffers and the default MaxTransferSize of 1MB.  If we look at the aggregated Event and wait_type information contained in the Event Session for the Backup, we can begin to see what types of Backup bottlenecks we might have in the system.

    SELECT  
        ISNULL(wait_type, event_name) as Operation, 
        SUM(duration) as duration, 
        SUM(signal_duration) as signal_duration, 
        COUNT(*) as occurrences
    FROM #TestResults
    WHERE (duration IS NOT NULL OR signal_duration IS NOT NULL)
      AND ISNULL(wait_type, event_name) IN 
            ('BACKUPBUFFER', 'BACKUPIO')
    GROUP BY ISNULL(wait_type, event_name)
    image

    Looking at this information, we have a large number of BACKUPBUFFER waits occurring during the backup of the database, and this may be a potential tuning opportunity to improve the performance of our database Backups.  To test this, we can change our Backup script to include a 4MB MAXTRANSFERSIZE and a BUFFERCOUNT of 16, as shown in the sketch below.  I also chose to change the filenames in the Extended Events Session to simplify working with the Target data for each test individually.  It is possible to specify the exact filename and metadatafile names when you read from the Target, but that requires more work than is needed, in my opinion.  For the sake of brevity I am not going to repeat all of the Extended Events information in this blog post, but instead show the outcome of running various configurations of BUFFERCOUNT against a test server.
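
    The modified Backup command for Test 2 looked something like this; the destination path follows yesterday’s post, the file name is illustrative, and MAXTRANSFERSIZE is specified in bytes:

    BACKUP DATABASE [Sample_Reporting] 
    TO DISK = N'B:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Backup\Sample_Reporting2.bak' 
    WITH NOFORMAT, 
        NOINIT, 
        SKIP, 
        NOREWIND, 
        NOUNLOAD, 
        COMPRESSION, 
        BUFFERCOUNT = 16,
        MAXTRANSFERSIZE = 4194304,  -- 4MB, specified in bytes
        STATS = 5
    GO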

    Test Number | Backup File Count | Buffer Count | Max Transfer Size (KB) | Backup Time (s) | BACKUPBUFFER (wait ms) | BACKUPIO (wait ms) | BACKUPBUFFER (wait count) | BACKUPIO (wait count)
    1           | 1                 | 7            | 1024                   | 122.5           | 159471                 | 62587              | 81616                     | 22815
    2           | 1                 | 16           | 4096                   | 105.2           | 90963                  | 69091              | 14513                     | 7982
    3           | 1                 | 32           | 4096                   | 99.5            | 75236                  | 88634              | 12298                     | 8679
    4           | 1                 | 128          | 4096                   | 95.9            | 70173                  | 63435              | 8292                      | 4679
    5           | 1                 | 256          | 4096                   | 95.9            | 50988                  | 48942              | 1538                      | 1135
    6           | 2                 | 128          | 4096                   | 96.0            | 152323                 | 63800              | 12416                     | 4925
    7           | 2                 | 256          | 4096                   | 96.4            | 109565                 | 46953              | 3067                      | 1195

    The same Event Session was used to gather the above metrics in seven repeated tests.  The test server is a Dell R710 dual quad-core system with 24GB RAM and Hyper-Threading enabled.  It has eight internal 15K RPM 146GB SAS drives configured as two RAID 1 drive pairs and a single four-disk RAID 5 array.  One of the RAID 1 drive pairs was dedicated to the OS and SQL binaries; the other was used for writing the backups, and the database data/log/tempdb files were placed on the RAID 5 array.  This isn’t an ideal configuration, but it’s what I had available at the moment, and it is similar to some of the configurations I’ve seen in the real world as well.  The backups were segregated to the dedicated RAID 1 disk for this test to avoid RAID 5 write penalties, and to maximize the backup write throughput by isolating it from any other operations.

    The above results can be interpreted a number of different ways.  As the BUFFERCOUNT increases, the backup time decreases, and so does the amount of time spent waiting on Backup Buffers.  However, there is a tradeoff being made in memory consumption; specifically, memory outside of the buffer pool, from the Virtual Address Space.  On 32 bit servers this can lead to Out of Memory exceptions, the topic of Amit’s blog post referenced previously.  Test Number 5, with 256 buffers and a 4MB transfer size, will use 1GB of memory (256 × 4MB = 1,024MB), as shown in the Backup Buffer Configuration information.

    image

    On the test server used for this testing, the bottleneck is the disks that the backup file is being written to, and further improvements in performance will require additional IO capacity.  The test database has 30GB of data in it, and with backup compression the backup is 7.8GB on disk.  For a full backup of this database to take just over a minute and a half is not bad, but it is going to local disks, and there is no safeguard for the data in the event of a catastrophic loss of the physical server unless the backup gets copied to another location on the network after the local backup completes.

  • An XEvent a Day (17 of 31) – A Look at Backup Internals and How to Track Backup and Restore Throughput (Part 1)

    Today’s post is a continuation of yesterday’s post How Many Checkpoints are Issued During a Full Backup? and the investigation of Database Engine internals with Extended Events.  In today’s post we’ll look at how Backups work inside of SQL Server and how to track the throughput of Backup and Restore operations.  This post is not going to cover Backups in SQL Server as a general topic; if that is what you are looking for, see Paul Randal’s TechNet article Understanding SQL Server Backups.

    Yesterday I mentioned that there is only one Event in Extended Events that has the word backup in its name: the sqlserver.databases_backup_restore_throughput Event.  At first glance this Event looks pretty dull.  It only returns three columns, database_id, count, and increment, and the metadata doesn’t really tell us what count and increment mean.

    -- Get the Event columns
    SELECT 
        object_name,
        name, 
        type_name,
        description
    FROM sys.dm_xe_object_columns
    WHERE object_name LIKE '%backup%'
      AND column_type = 'data'

    image

    I could step you through what I did to look at this Event and figure out the meaning of things, but that would make an already long post longer.  Essentially I created an Event Session with just this Event and used the sqlserver.session_id Predicate to only capture it for a specific session_id that I was going to run a FULL backup from.  The count column is the total number of bytes that have been written to the backup, and the increment column is the number of bytes that were written when the Event fired (we’ll see this more in a minute).  This was interesting to see, so I started thinking about what kind of information I would want to know about Backups related to throughput, and two items came to mind almost immediately: read operations from the database, and wait statistics related to the Backup occurring, both of which are available through Extended Events.  I also recalled that there are a few documented Trace Flags associated with Backup and Restore operations that output more verbose information through trace prints.  Trace Flag 3004 outputs what operations Backup and Restore are performing (How It Works: What is Restore/Backup Doing?).  Trace Flag 3213 outputs the Backup Buffer configuration information, as discussed in the SQLCAT team blog series Tuning the Performance of Backup Compression in SQL Server 2008 and Tuning Backup Compression Part 2.  Trace Flag 3014 outputs additional information about Backup and File operations (How It Works: How does SQL Server Backup and Restore select transfer sizes).  There happens to be a sqlserver.trace_print Event that can capture the trace output as part of our Event Session.
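
    The Trace Flags have to be enabled before the Event Session can capture their output; one way is to turn them on globally with DBCC TRACEON, and off again when testing is complete:

    -- Enable the Backup/Restore diagnostic output globally (-1)
    DBCC TRACEON(3004, 3213, 3014, -1)
    GO
    -- ...run the Backup tests...
    -- Disable the Trace Flags when finished
    DBCC TRACEOFF(3004, 3213, 3014, -1)
    GO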

    Using yesterday’s post as a foundation for the Event Session in today’s post, and the same Sample_Reporting database, let’s look at the Event Session that we’ll use to investigate Backups.

    -- Create the Event Session
    CREATE EVENT SESSION BackupMonitoring
    ON SERVER
    ADD EVENT sqlserver.sql_statement_starting
    (   ACTION (sqlserver.database_id, sqlserver.sql_text)
        WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.sql_statement_completed
    (   ACTION (sqlserver.database_id, sqlserver.sql_text)
        WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.databases_backup_restore_throughput
    (   WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlos.wait_info
    (   ACTION (sqlserver.database_id) 
        WHERE (sqlserver.session_id = 97  AND duration > 0)),
    ADD EVENT sqlos.wait_info_external
    (   ACTION (sqlserver.database_id) 
        WHERE (sqlserver.session_id = 97  AND duration > 0)),
    ADD EVENT sqlserver.trace_print
    (   WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.file_read
    (   WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.file_read_completed
    (   WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.physical_page_read
    (   WHERE (sqlserver.session_id = 97)),
    ADD EVENT sqlserver.databases_log_cache_read
    (   WHERE (database_id = 41)),
    ADD EVENT sqlserver.databases_log_cache_hit
    (   WHERE (database_id = 41)),
    ADD EVENT sqlserver.databases_log_flush
    (   WHERE (database_id = 41)),
    ADD EVENT sqlserver.checkpoint_begin
    (   WHERE (database_id = 41)),
    ADD EVENT sqlserver.checkpoint_end
    (   WHERE (database_id = 41))
    ADD TARGET package0.asynchronous_file_target(
         SET filename='C:\SQLBlog\BackupMonitoring1.xel',
             metadatafile = 'C:\SQLBlog\BackupMonitoring1.xem')
    GO
    -- Alter the Session to Start it
    ALTER EVENT SESSION BackupMonitoring
    ON SERVER
    STATE=START
    GO
    

     

    There is a lot of information being collected in this Event Session:

    • the sql_statement_starting and sql_statement_completed Events;
    • the databases_backup_restore_throughput Event;
    • the wait_info Event for SQLOS waits inside of SQL Server, and the wait_info_external Event for preemptive waits outside of SQL Server;
    • the trace_print Event to capture our Trace Flag outputs;
    • the file_read, file_read_completed, and physical_page_read Events to capture read operations from the session_id performing the Backup;
    • the databases_log_cache_read, databases_log_cache_hit, and databases_log_flush Events to track transaction log cache operations during the Backup;
    • the checkpoint_begin and checkpoint_end Events to track checkpoint occurrences during the backup and how they might impact throughput.

    If you notice, some of the Events are Predicated on the session_id while others are Predicated on the database_id, and this is very intentional in the definition of this Event Session.  Some Events do not fire in the context of a specific database_id, some do not fire in the context of a specific session_id, and some fire in the context of both.  Where the database_id is a practical Predicate for an Event, and it is carried in the Event’s base payload, it is the natural item to use.  Restricting Events to a specific database_id or session_id prevents capturing Events from other operations occurring on the SQL Server.

    With our Event Session defined and started, we can now run a Backup of the database and see what we capture.  I am going to show two different Backup configurations in this post, based on the information contained in the SQLCAT series on tuning Backup performance in SQL Server 2008.  The first one uses a default configuration for the BUFFERCOUNT and MAXTRANSFERSIZE Backup options, but also uses Backup Compression, since it is available, to minimize the backup file size and maximize the throughput of the backup operation.

    BACKUP DATABASE [Sample_Reporting] 
    TO  DISK = N'B:\SQLData\MSSQL10.MSSQLSERVER\MSSQL\Backup\Sample_Reporting1.bak' 
    WITH NOFORMAT, 
        NOINIT,  
        NAME = N'Sample_Reporting-Full Database Backup Number 1', 
        SKIP, 
        NOREWIND, 
        NOUNLOAD, 
        COMPRESSION, 
        STATS = 5
    GO

    On this server, the backups are written to a dedicated RAID 1 disk array using two 146GB 15K RPM SAS drives.  When the backup completes we can begin our analysis of the Events captured by our Event Session.  To make it possible to perform various types of analysis of the data contained inside of the asynchronous_file_target, I am going to read the raw XML Event data into a temporary table, and then shred the XML into a second temporary table, making it possible to query just the shredded data.

    -- Drop the temp tables if they exist from a previous run
    IF OBJECT_ID('tempdb..#EventData') IS NOT NULL
        DROP TABLE #EventData
    IF OBJECT_ID('tempdb..#TestResults') IS NOT NULL
        DROP TABLE #TestResults
    
    -- Create intermediate temp table for raw event data
    CREATE TABLE #EventData
    (Rowid int identity primary key, event_data xml)
    
    -- Create final results table for parsed event data
    CREATE TABLE #TestResults
    (Rowid int primary key, event_name varchar(50), package_name varchar(50),
     [timestamp] datetime2, database_id int, trace_print nvarchar(4000),
     [count] bigint, increment bigint, wait_type nvarchar(100), opcode nvarchar(10),
     duration bigint, max_duration bigint, total_duration bigint, signal_duration bigint,
     completed_count bigint, source_database_id int, [object_id] int, object_type int,
     [state] nvarchar(50), offset bigint, offset_end int, nest_level int, cpu int,
     reads bigint, writes bigint, mode nvarchar(50), file_id int, page_id int,
     file_group_id int, sql_text nvarchar(4000))
    
    -- Read the file data into intermediate temp table
    INSERT INTO #EventData (event_data)
    SELECT
        CAST(event_data AS XML) AS event_data
    FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\BackupMonitoring1*.xel', 'C:\SQLBlog\BackupMonitoring1*.xem', null, null)
    
    -- Query the Event data from the Target.
    INSERT INTO #TestResults
    (Rowid, event_name, package_name, [timestamp], database_id, trace_print,
     [count], increment, wait_type, opcode, duration, max_duration, total_duration, 
     signal_duration, completed_count, source_database_id, [object_id], object_type,
     [state], offset, offset_end, nest_level, cpu,  reads, writes, mode, file_id,
     page_id, file_group_id, sql_text)
     
    SELECT 
        RowID,
        event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        event_data.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'),
                    event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) as database_id,
        event_data.value('(event/data[@name="message"]/value)[1]', 'nvarchar(4000)') as trace_print,
        event_data.value('(event/data[@name="count"]/value)[1]', 'bigint')  AS [count],
        event_data.value('(event/data[@name="increment"]/value)[1]', 'bigint')  AS [increment],
        event_data.value('(event/data[@name="wait_type"]/text)[1]', 'nvarchar(100)') as wait_type,
        event_data.value('(event/data[@name="opcode"]/text)[1]', 'nvarchar(10)') as opcode,
        event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint')  AS duration,
        event_data.value('(event/data[@name="max_duration"]/value)[1]', 'bigint')  AS max_duration,
        event_data.value('(event/data[@name="total_duration"]/value)[1]', 'bigint')  AS total_duration,
        event_data.value('(event/data[@name="signal_duration"]/value)[1]', 'bigint')  AS signal_duration,
        event_data.value('(event/data[@name="completed_count"]/value)[1]', 'bigint')  AS completed_count,
        event_data.value('(event/data[@name="source_database_id"]/value)[1]', 'int')  AS source_database_id,
        event_data.value('(event/data[@name="object_id"]/value)[1]', 'int')  AS object_id,
        event_data.value('(event/data[@name="object_type"]/value)[1]', 'int')  AS object_type,
        event_data.value('(event/data[@name="state"]/text)[1]', 'nvarchar(50)') as state,
        event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint')  AS offset,
        event_data.value('(event/data[@name="offset_end"]/value)[1]', 'int')  AS offset_end,
        event_data.value('(event/data[@name="nest_level"]/value)[1]', 'int')  AS nest_level,    
        event_data.value('(event/data[@name="cpu"]/value)[1]', 'int')  AS cpu,    
        event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint')  AS reads,
        event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint')  AS writes,
        event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(50)') as mode,
        event_data.value('(event/data[@name="file_id"]/value)[1]', 'int')  AS file_id,
        event_data.value('(event/data[@name="page_id"]/value)[1]', 'int')  AS page_id,
        event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int')  AS file_group_id,        
        event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') as sql_text
    FROM #EventData
    ORDER BY Rowid
    
    -- Look at the Results.
    SELECT 
        Rowid,
        event_name,
        database_id, 
        trace_print, 
        [count], 
        increment, 
        wait_type, 
        duration, 
        signal_duration, 
        cpu, 
        reads, 
        writes, 
        mode, 
        file_id, 
        page_id, 
        file_group_id, 
        sql_text
    FROM #TestResults
    ORDER BY Rowid

    In the above query, I am extracting all of the data elements from the Event data, even though in the final query I am not using all of the data.  I did this to have a complete example of how to shred the XML, and because we are storing it in a temp table, we may find that we want to come back and look at specific data elements that were excluded in the initial look at the results.  From our results we can begin to understand how Backup Operations work inside of SQL Server.

    image

    Here we can see the statement starting to execute, and the first output from the trace_print Event showing that the backup of the database was starting, along with the external waits associated with performing file operations to create a Backup file for the database.

    image

    Here the newly created Backup file is opened, and 1K of writes occurs to the file before it becomes ready for the Backup.

     

    image

    Here we can see two operations being performed.  In the red outlined box, since we are doing a full Backup of the database, the differential bitmaps, pages that track which extents in a GAM interval have been modified since the last full or differential backup (Inside the Storage Engine: Anatomy of an extent), are cleared.  In the blue outlined box, we see the checkpoint triggered by the Backup operation begin, and in the four boxes highlighted in grey, we see two physical_page_reads occur for the database, one from the transaction log and one from the primary data file.  These are the pages that are written to when a Checkpoint occurs in the database.  The file_id 2, page_id 0 page is where the Log Sequence Number is written in the log file, and file_id 1, page_id 9 is the database boot page, where the LSN is also written at checkpoint (Checkpoints and the Active Portion of the Log).  We also see the wait_info Event for the PAGEIOLATCH_UP wait associated with updating this information.
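
    If you want to look at the boot page yourself, the well-known (but undocumented) DBCC PAGE command will dump it; Trace Flag 3604 routes the output to the client:

    DBCC TRACEON(3604)
    GO
    -- Dump the boot page (file_id 1, page_id 9) with full detail
    DBCC PAGE('Sample_Reporting', 1, 9, 3)
    GO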

    image

    Next the allocation bitmaps for the database are scanned and an estimate of the work is performed (red box) before writing 5K of metadata into the backup file. 

    image

    At this point the Backup process is ready to begin copying data to the Backup file.  Since this particular database only has one data file, only one reader is assigned to the backup (Optimising Backup & Restore Performance in SQL Server).  When the Backup starts on the file, an additional 1024 bytes (1K) of information is written to the Backup file, and the file read operations against the database data file(s) commence.

    image

    As the backup of the data file data begins, we see a change in the size of the increment being written to the Backup file, and now we have 1MB segments of data being written to the file.

    image
    We can also see that multiple 1MB segments are written within milliseconds of each other.  Now, I could spend a lot of time running through the entire set of Events showing the same thing, but with 187,698 Events for a 110 second Backup, that would take forever.  Instead I am going to skip over all the interim file reads and Backup file writes and get to the end of the data file section.
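
    If you want to do the same skimming yourself, one way is to simply filter the high-volume read Events out of #TestResults; a quick sketch:

    -- Exclude the read Events to focus on trace_print and throughput Events
    SELECT 
        Rowid, 
        event_name, 
        [timestamp],
        trace_print, 
        [count], 
        increment
    FROM #TestResults
    WHERE event_name NOT IN 
            ('file_read', 'file_read_completed', 'physical_page_read')
    ORDER BY Rowid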

    image

    Highlighted in black above, we see the completion of the first data file, followed by the trace_print Event in red, “Padding MSDA with 196608 bytes” (192K of space), and then the trace_print Event in blue showing the completion of all database files, which also marks the beginning of the transaction log portion of the backup.

    image

    Here we can see that the size of the log information being backed up, highlighted in black, is significantly different from the data file information, which is to be expected since log records are very different from data records in composition.  Then we see the “log files done” trace_print Event in red, the trailing configuration writes in blue, and the trace_print Event in orange marking the completion of the trailing configuration.

    image

    I don’t know what MBC done means, and I couldn’t find it online, but it completed here.  I think it might stand for MaxBufferCount, and the above line shows that all of the buffers have been written out for the backup.  (Don’t quote me on that; I am just taking a SWAG there!)

    image

    After the MBC is done, the backup history records are written into MSDB.

    image 

    And finally the backup completes.  So far all that we’ve done is look at all of the information we can get, and there is a lot of it, but unless we can do something actionable with it, there is no real point in gathering it.  I originally intended to cover this subject in a single post, but it’s become quite large, so I am splitting it into two posts, and in tomorrow’s post we’ll look at how we can use the information captured today to validate whether changes to our backup process have a positive or negative impact on backup times and throughput.

  • An XEvent a Day (16 of 31) – How Many Checkpoints are Issued During a Full Backup?

    This wasn’t my intended blog post for today, but last night a question came across #SQLHelp on Twitter from Varun (Twitter).

    #sqlhelp how many checkpoints are issued during a full backup?

    The question was answered by Robert Davis (Blog|Twitter) as:

    Just 1, at the very start. RT @1sql: #sqlhelp how many checkpoints are issued during a full backup?

    This seemed like a great thing to test out with Extended Events, so I ran through the available Events in SQL Server 2008.  The only Event related to Backup is the sqlserver.databases_backup_restore_throughput Event, which is a topic for another blog post, but that doesn’t matter, because we can still test this using the other Events available in Extended Events.  The sqlserver.sql_statement_starting, sqlserver.sql_statement_completed, sqlserver.checkpoint_begin and sqlserver.checkpoint_end Events can be used to test this with appropriate Predicate definitions.

    To test this I used a copy of two databases on a development server.  One is a source database and the second is a reporting database.  I also duplicated the ETL process that extracts data from the source database and transforms it into the reporting schema, so that I could test this under a workload that would be changing data and should cause checkpoints to occur inside of the reporting database.  Then I queried sys.databases (ok, I actually used DB_ID(‘Sample_Reporting’)) to get the database_id for the Sample_Reporting database to use in the Predicate for the sqlserver.checkpoint_begin and sqlserver.checkpoint_end Events. 
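
    Either form works for looking up the database_id used in the Predicates:

    SELECT database_id 
    FROM sys.databases 
    WHERE name = 'Sample_Reporting'
    -- or simply
    SELECT DB_ID('Sample_Reporting') AS database_id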

    image

    Then I opened a new Query Window in SSMS and used that connections session_id in the Predicate for the sqlserver.sql_statement_starting and sqlserver.sql_statement_completed Events in the Event Session.  The result was the following Session definition.

    -- Create the Event Session
    CREATE EVENT SESSION BackupCheckPoints
    ON SERVER
    ADD EVENT sqlserver.sql_statement_starting
    (    ACTION (sqlserver.database_id, sqlserver.sql_text)
        WHERE (sqlserver.session_id = 113)),
    ADD EVENT sqlserver.sql_statement_completed
    (    ACTION (sqlserver.database_id, sqlserver.sql_text)
        WHERE (sqlserver.session_id = 113)),
    ADD EVENT sqlserver.checkpoint_begin
    (    WHERE (database_id= 41)),
    ADD EVENT sqlserver.checkpoint_end
    (    WHERE (database_id = 41))
    ADD TARGET package0.ring_buffer
    GO
    -- Alter the Session to Start it
    ALTER EVENT SESSION BackupCheckPoints
    ON SERVER
    STATE=START
    GO

    With the Event Session started, I then started a FULL backup of the Sample_Reporting database, followed by starting the ETL processes.  When the FULL backup completed I dropped the Events from the Event Session so that no further Event collection occurred.

    -- Drop Events to halt Event collection
    ALTER EVENT SESSION BackupCheckPoints
    ON SERVER
    DROP EVENT sqlserver.sql_statement_starting,
    DROP EVENT sqlserver.sql_statement_completed,
    DROP EVENT sqlserver.checkpoint_begin,
    DROP EVENT sqlserver.checkpoint_end

    Now we can query the ring_buffer Target and see what has occurred during the FULL backup of the Sample_Reporting database.

    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        ISNULL(n.value('(event/data[@name="database_id"]/value)[1]', 'int'),
                n.value('(event/action[@name="database_id"]/value)[1]', 'int')) as [database_id],
        n.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') as [sql_text]
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s 
            JOIN sys.dm_xe_session_targets AS t 
                ON t.event_session_address = s.address
            WHERE s.name = 'BackupCheckPoints'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    GO

    image

    As you can see in the above screenshot, multiple checkpoints can occur during a FULL backup of a database in SQL Server 2008.  According to Paul Randal, “Checkpoints exist for two reasons—to batch up write I/Os to improve performance and to reduce the time required for crash recovery” (http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx).  Since we continue to make changes to the data inside of the system while the FULL backup occurs, there is a continued need for CHECKPOINTs to occur in the database.

  • An XEvent a Day (15 of 31) – Tracking Ghost Cleanup

    If you don’t know anything about Ghost Cleanup, I highly recommend that you go read Paul Randal’s blog posts Inside the Storage Engine: Ghost cleanup in depth, Ghost cleanup redux, and Turning off the ghost cleanup task for a performance gain.  To my knowledge, Paul’s posts are the only things that cover Ghost Cleanup in any depth online.

    In this post we’ll look at how you can use Extended Events to track the activity of Ghost Cleanup inside of your SQL Server.  To do this, we’ll first take a look at the ghost_cleanup Event and what it returns.

    -- Find the Event
    SELECT 
        p.name, 
        o.name, 
        o.description
    FROM sys.dm_xe_packages AS p
    JOIN sys.dm_xe_objects AS o
        ON p.guid = o.package_guid
    WHERE o.name = 'ghost_cleanup'
    
    -- Get the data columns for the Event
    SELECT 
        name, 
        TYPE_NAME
    FROM sys.dm_xe_object_columns
    WHERE OBJECT_NAME = 'ghost_cleanup'
      AND column_type = 'data'

    image

    The ghost_cleanup Event is in the sqlserver Package and returns the file_id and page_id that the ghost cleanup process is working on.  Since most SQL Servers have multiple databases, we probably want to track the database_id through an Action as well.  Since ghost cleanup is a background process, and we don’t know much about how it works or how many Events will be generated, we could start off with the synchronous_event_counter Target; but in this case I want to capture the Events themselves, and all of them, so we will go with the asynchronous_file_target.

    CREATE EVENT SESSION TrackGhostCleanup
    ON SERVER
    ADD EVENT sqlserver.ghost_cleanup
    ( ACTION(sqlserver.database_id))
    ADD TARGET package0.asynchronous_file_target(
         SET filename='C:\SQLBlog\TrackGhostCleanup.xel',
             metadatafile='C:\SQLBlog\TrackGhostCleanup.xem')
    WITH (MAX_MEMORY = 4MB, EVENT_RETENTION_MODE = NO_EVENT_LOSS )
    GO
    ALTER EVENT SESSION TrackGhostCleanup
    ON SERVER
    STATE=START

    This is a really basic Event Session: it captures one Event, sqlserver.ghost_cleanup, and collects the sqlserver.database_id Action when the Event fires.  The Event data is captured by the package0.asynchronous_file_target, and the Event Session is configured to not allow Event loss.  After starting the Event Session and allowing it to run, we can query the files for the captured Events and see how ghost cleanup is running on our instance.
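
    If you did want to gauge the Event volume first, a minimal counter-Target session might look like this (a sketch; the session name is hypothetical):

    CREATE EVENT SESSION TrackGhostCleanupVolume
    ON SERVER
    ADD EVENT sqlserver.ghost_cleanup
    ADD TARGET package0.synchronous_event_counter
    GO
    ALTER EVENT SESSION TrackGhostCleanupVolume
    ON SERVER
    STATE=START
    GO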

    -- Query the Event data from the Target.
    SELECT 
        event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        event_data.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        event_data.value('(event/@id)[1]', 'int') AS id,
        event_data.value('(event/@version)[1]', 'int') AS version,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        event_data.value('(event/action[@name="database_id"]/value)[1]', 'int') as database_id,
        event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') as file_id,
        event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') as page_id
    FROM 
    (SELECT
        CAST(event_data AS XML) AS event_data
     FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\TrackGhostCleanup*.xel', 'C:\SQLBlog\TrackGhostCleanup*.xem', null, null)
    ) as tab

    From around 15 minutes of runtime on one of my development servers, over 17.5K Events have fired, much more than I initially anticipated, and after nearly 30 minutes of runtime, I had just over 37K Events. 

    image

    Some interesting information can be found in the Events.  In SQL Server 2008, the Ghost Cleanup process runs every 10 seconds, just as Paul has documented in his blog posts, which was a change from every 5 seconds in SQL Server 2005.

    image

    The process in 2008 cleans up 200 pages at a time, something Paul hasn’t specifically blogged about for SQL Server 2008.  Before anyone debates this, Paul’s statement “It will check through or cleanup a limited number of pages each time it wakes up - I remember the limit is 10 pages - to ensure it doesn't swamp the system.” from Inside the Storage Engine: Ghost cleanup in depth is based on SQL Server 2005, which also ran ghost cleanup every 5 seconds instead of every 10.  We can look at the Event information over subsequent 10 second intervals and see that 200 pages are cleaned up each time ghost_cleanup runs; one way to aggregate this is shown below.
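
    A sketch of that aggregation, reusing the shredding pattern above and bucketing the Events into 10 second windows (the anchor date in the DATEADD/DATEDIFF math is arbitrary):

    SELECT 
        DATEADD(second, 
                (DATEDIFF(second, '20100101', [timestamp]) / 10) * 10, 
                '20100101') AS ten_second_window,
        database_id,
        COUNT(*) AS pages_cleaned
    FROM
    (   SELECT 
            DATEADD(hh, 
                    DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                    event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
            event_data.value('(event/action[@name="database_id"]/value)[1]', 'int') AS database_id
        FROM 
        (SELECT
            CAST(event_data AS XML) AS event_data
         FROM sys.fn_xe_file_target_read_file('C:\SQLBlog\TrackGhostCleanup*.xel', 'C:\SQLBlog\TrackGhostCleanup*.xem', null, null)
        ) AS src
    ) AS tab
    GROUP BY 
        DATEADD(second, 
                (DATEDIFF(second, '20100101', [timestamp]) / 10) * 10, 
                '20100101'),
        database_id
    ORDER BY ten_second_window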

    image

    image

    This has to be one of my favorite aspects of Extended Events.  You get to really learn about SQL Server Internals by just playing with SQL Server.  I have a couple more blog posts that show how you can learn about SQL Server Internals using Extended Events for this series, and if you are interested in a previous post on the subject check out TSQL Tuesday #11 – Physical IO’s Don’t Always Accumulate Wait Times.

    Until tomorrow….

  • An XEvent a Day (14 of 31) – A Closer Look at Predicates

    When working with SQL Trace, one of my biggest frustrations has been the limitations that exist in filtering.  Establishing the filter criteria with sp_trace_setfilter is a non-trivial task, and it falls short of delivering the complex filtering that is sometimes needed to simplify analysis.  Filtering of trace data is performed globally and applies to the entire trace, affecting all of the events being collected.  Extended Events introduces a much better filtering system using Predicates, which are applied at the individual Event level, allow for short-circuiting of evaluation, and provide the ability to create complex groups of independent criteria, ensuring only Events of interest are captured by the Event Session.

    In yesterday’s post, The system_health Session, I talked about the default system_health session that is running on every SQL Server 2008/2008R2 and Denali CTP1 instance out of the box.  The Predicate definition for the sqlos.wait_info event in the system_health session is a good example to follow for complex, short-circuiting Predicate definition in Extended Events.

    ADD EVENT sqlos.wait_info
    (
        ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text)
        WHERE 
        (duration > 15000 AND 
            (    
                (wait_type > 31    -- Waits for latches and important wait resources (not locks) 
                                -- that have exceeded 15 seconds. 
                    AND
                    (
                        (wait_type > 47 AND wait_type < 54)
                        OR wait_type < 38
                        OR (wait_type > 63 AND wait_type < 70)
                        OR (wait_type > 96 AND wait_type < 100)
                        OR (wait_type = 107)
                        OR (wait_type = 113)
                        OR (wait_type > 174 AND wait_type < 179)
                        OR (wait_type = 186)
                        OR (wait_type = 207)
                        OR (wait_type = 269)
                        OR (wait_type = 283)
                        OR (wait_type = 284)
                    )
                )
                OR 
                (duration > 30000        -- Waits for locks that have exceeded 30 secs.
                    AND wait_type < 22
                ) 
            )
        )
    ),

    Since Predicates perform short-circuit evaluation, where the criteria groups are evaluated in order and the first failure causes the Predicate evaluation to stop, preventing the Event from firing in the engine, the order of the criteria can directly impact the performance of an Event Session.  If we look at the definition for the sqlos.wait_info Event, the first Predicate criterion specifies that the duration of the wait has to be greater than 15 seconds.  Since the majority of waits in SQL Server occur with durations of less than 15 seconds, the Predicate evaluation short-circuits immediately and the Event does not fire.  If the wait exceeds the 15 second duration, the evaluation continues and checks that the wait_type matches one of the defined values.  How do we know what these values are? 

    When looking at an Event, all of the columns have a type_name associated with them that can be found in the sys.dm_xe_object_columns DMV, as previously discussed in this series.  If we take a look at the type_name for the wait_info Event’s wait_type column, we’ll see that it has a type of wait_types.

    SELECT 
        name, 
        type_name, 
        column_type
    FROM sys.dm_xe_object_columns
    WHERE object_name = 'wait_info'
      AND column_type <> 'readonly'
    image

    When a column has a non-standard type_name like this, it corresponds to a Map that has been loaded in Extended Events.  We can find a list of the wait_types that the Event will fire for by querying the sys.dm_xe_map_values DMV for the map_keys defined in the Event Session:

    SELECT map_key, map_value
    FROM sys.dm_xe_map_values
    WHERE name = 'wait_types'
      AND 
        (map_key > 31    -- Waits for latches and important wait resources (not locks) 
                        -- that have exceeded 15 seconds. 
            AND
            (
                (map_key > 47 AND map_key < 54)
                OR map_key < 38
                OR (map_key > 63 AND map_key < 70)
                OR (map_key > 96 AND map_key < 100)
                OR (map_key = 107)
                OR (map_key = 113)
                OR (map_key > 174 AND map_key < 179)
                OR (map_key = 186)
                OR (map_key = 207)
                OR (map_key = 269)
                OR (map_key = 283)
                OR (map_key = 284)
            )
        )
    

    The wait_types that correspond to the first complex grouping are:

    map_key map_value
    32      LATCH_NL
    33      LATCH_KP
    34      LATCH_SH
    35      LATCH_UP
    36      LATCH_EX
    37      LATCH_DT
    48      PAGELATCH_NL
    49      PAGELATCH_KP
    50      PAGELATCH_SH
    51      PAGELATCH_UP
    52      PAGELATCH_EX
    53      PAGELATCH_DT
    64      PAGEIOLATCH_NL
    65      PAGEIOLATCH_KP
    66      PAGEIOLATCH_SH
    67      PAGEIOLATCH_UP
    68      PAGEIOLATCH_EX
    69      PAGEIOLATCH_DT
    97      IO_COMPLETION
    98      ASYNC_IO_COMPLETION
    99      NETWORK_IO
    107     RESOURCE_SEMAPHORE
    113     SOS_WORKER
    175     FCB_REPLICA_WRITE
    176     FCB_REPLICA_READ
    177     HOLDER11
    178     WRITELOG
    186     CMEMTHREAD
    207     TRACEWRITE
    269     RESOURCE_SEMAPHORE_MUTEX
    283     RESOURCE_SEMAPHORE_QUERY_COMPILE
    284     RESOURCE_SEMAPHORE_SMALL_QUERY

    If you look at the way the Predicate is defined, it is much closer to how you’d write a WHERE clause with complex filtering criteria, allowing groups of specific criteria to be defined within sets of parentheses that are evaluated together, something that was impossible with SQL Trace.

    In addition to being able to define Predicates based on the columns returned by an Event, it is possible to define Predicates on the global state data available in the Extended Events Engine.  If you’ll recall, the global state predicates are available in the sys.dm_xe_objects DMV with an object_type of pred_source.

    SELECT 
        p.name AS package_name,
        o.name AS predicate_name,
        o.description
    FROM sys.dm_xe_packages AS p
    INNER JOIN sys.dm_xe_objects AS o
        ON p.guid = o.package_guid
    WHERE (p.capabilities IS NULL OR p.capabilities & 1 = 0)
      AND o.object_type = 'pred_source'
      image

    Two of the predicate sources are special: package0.counter and package0.partitioned_counter can be used to restrict the number of occurrences of an Event that are captured by an Event Session.  The following demonstration creates an Event Session that captures the first five occurrences of the sqlserver.sql_statement_completed Event, and then executes six statements in sequence.  When the target_data is queried, the last statement, SELECT @@SPID, is not included in the results.

    CREATE EVENT SESSION CounterPredicateDemo
    ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    ( ACTION (sqlserver.sql_text)
      WHERE (package0.counter <=5))
    ADD TARGET package0.ring_buffer
    WITH (MAX_DISPATCH_LATENCY = 1 SECONDS)
    GO
    ALTER EVENT SESSION CounterPredicateDemo
    ON SERVER
    STATE = START
    GO
    SELECT @@VERSION
    GO
    SELECT @@SERVERNAME
    GO
    SELECT @@SPID
    GO
    SELECT @@VERSION
    GO
    SELECT @@SERVERNAME
    GO
    SELECT @@SPID
    GO
    ALTER EVENT SESSION CounterPredicateDemo
    ON SERVER
    DROP EVENT sqlserver.sql_statement_completed
    GO
    -- Wait in a delay for Events to Buffer
    WAITFOR DELAY '00:00:05'
    GO
    
    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],  
        n.value('(event/data[@name="object_id"]/value)[1]', 'int') AS [object_id],
        n.value('(event/data[@name="object_type"]/value)[1]', 'nvarchar(128)') AS [object_type],
        n.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
        n.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
        n.value('(event/data[@name="reads"]/value)[1]', 'int') AS [reads],
        n.value('(event/data[@name="writes"]/value)[1]', 'int') AS [writes],
        n.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text]
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s 
            JOIN sys.dm_xe_session_targets AS t 
                ON t.event_session_address = s.address
            WHERE s.name = 'CounterPredicateDemo'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    GO
    -- Drop the Event Session
    DROP EVENT SESSION CounterPredicateDemo
    ON SERVER

    image

    The capabilities behind Predicate definition in Extended Events make it much more flexible and powerful for troubleshooting than SQL Trace.  They also make Extended Events much more performant than Trace, by preempting Event firing for Events that are not of interest.

  • An XEvent a Day (13 of 31) – The system_health Session

    Today’s post was originally planned for this coming weekend, but it seems I’ve caught whatever bug my kids had over the weekend, so I am swapping in a post that is shorter and easier to cover.  If you’ve been running some of the queries from the posts in this series, you have no doubt come across an Event Session running on your server with the name system_health.  In today’s post I’ll go over this session and provide links to references related to it.

    When Extended Events was introduced in SQL Server 2008, the Product Support Services group worked with the Extended Events developers to create the definition for an Event Session that could ship with SQL Server 2008, would start up automatically when SQL Server starts, and contained Events of interest for troubleshooting common problems seen by the PSS Engineers.  Bob Ward (Blog|Twitter) blogged about the details of the system_health session that shipped with SQL Server 2008 in his blog post Supporting SQL Server 2008: The system_health session.  The script for this Event Session is inside the utables.sql script file in the instance Install folder (for example, C:\Program Files\Microsoft SQL Server\<InstanceDesignator>\MSSQL\Install) and can be used to recreate the Event Session if you inadvertently change it.

    -- The predicates in this session have been carefully crafted to minimize impact of event collection
    -- Changing the predicate definition may impact system performance
    --
    CREATE EVENT SESSION system_health ON SERVER
    • The sql_text and session_id for any sessions that encounter an error that has a severity >=20.
    • The sql_text and session_id for any sessions that encounter a memory-related error. The errors include 17803, 701, 802, 8645, 8651, 8657 and 8902.
    ADD EVENT sqlserver.error_reported
    (
        ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text, sqlserver.tsql_stack)
        -- Get callstack, SPID, and query for all high severity errors ( above sev 20 )
        WHERE severity >= 20
        -- Get callstack, SPID, and query for OOM errors ( 17803 , 701 , 802 , 8645 , 8651 , 8657 , 8902 )
        OR (ERROR = 17803 OR ERROR = 701 OR ERROR = 802 OR ERROR = 8645 OR ERROR = 8651 OR ERROR = 8657 OR ERROR = 8902)
    ),
    • A record of any non-yielding scheduler problems. (These appear in the SQL Server error log as error 17883.)
    ADD EVENT sqlos.scheduler_monitor_non_yielding_ring_buffer_recorded,
    • Any deadlocks that are detected.
    ADD EVENT sqlserver.xml_deadlock_report,
    • The callstack, sql_text, and session_id for any sessions that have waited on latches (or other interesting resources) for > 15 seconds.
    ADD EVENT sqlos.wait_info
    (
        ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text)
        WHERE 
        (duration > 15000 AND 
            (    
                (wait_type > 31    -- Waits for latches and important wait resources (not locks ) that have exceeded 15 seconds. 
                    AND
                    (
                        (wait_type > 47 AND wait_type < 54)
                        OR wait_type < 38
                        OR (wait_type > 63 AND wait_type < 70)
                        OR (wait_type > 96 AND wait_type < 100)
                        OR (wait_type = 107)
                        OR (wait_type = 113)
                        OR (wait_type > 174 AND wait_type < 179)
                        OR (wait_type = 186)
                        OR (wait_type = 207)
                        OR (wait_type = 269)
                        OR (wait_type = 283)
                        OR (wait_type = 284)
                    )
                )
    • The callstack, sql_text, and session_id for any sessions that have waited on locks for > 30 seconds.
                OR 
                (duration > 30000        -- Waits for locks that have exceeded 30 secs.
                    AND wait_type < 22
                ) 
            )
        )
    ),
    • The callstack, sql_text, and session_id for any sessions that have waited for a long time for preemptive waits. The duration varies by wait type. A preemptive wait is where SQL Server is waiting for external API calls.
    ADD EVENT sqlos.wait_info_external
    (
        ACTION (package0.callstack, sqlserver.session_id, sqlserver.sql_text)
        WHERE 
        (duration > 5000 AND
            (   
                (    -- Login related preemptive waits that have exceeded 5 seconds.
                    (wait_type > 365 AND wait_type < 372)
                    OR    (wait_type > 372 AND wait_type < 377)
                    OR    (wait_type > 377 AND wait_type < 383)
                    OR    (wait_type > 420 AND wait_type < 424)
                    OR    (wait_type > 426 AND wait_type < 432)
                    OR    (wait_type > 432 AND wait_type < 435)
                )
                OR 
                (duration > 45000     -- Preemptive OS waits that have exceeded 45 seconds. 
                    AND 
                    (    
                        (wait_type > 382 AND wait_type < 386)
                        OR    (wait_type > 423 AND wait_type < 427)
                        OR    (wait_type > 434 AND wait_type < 437)
                        OR    (wait_type > 442 AND wait_type < 451)
                        OR    (wait_type > 451 AND wait_type < 473)
                        OR    (wait_type > 484 AND wait_type < 499)
                        OR wait_type = 365
                        OR wait_type = 372
                        OR wait_type = 377
                        OR wait_type = 387
                        OR wait_type = 432
                        OR wait_type = 502
                    )
                )
            )
        )
    )
    • Capture Event information using the ring_buffer target.
    ADD target package0.ring_buffer        -- Store events in the ring buffer target
        (SET max_memory = 4096)
    • Set the session to start automatically with SQL Server
    WITH (startup_state = ON)
    GO

    In SQL Server Denali CTP1, two new Events have been added to the system_health session specific to SQLCLR.

    • A SQLCLR memory allocation failed.
    ADD EVENT sqlclr.allocation_failure,
    • A SQLCLR virtual memory allocation failed.
    ADD EVENT sqlclr.virtual_alloc_failure,

    While the system_health session captures very useful information, it uses the ring_buffer Target to store the Event data.  In a scenario where the database engine fails completely, the information that may have been captured by the system_health session will be lost when the process terminates.  Also, since the Event Session uses the ring_buffer Target, you may not receive back all of the Event data contained in the Target, or the Events that you might have expected to exist.  Bob Ward talked about the limitation of the DMVs to return only 4MB of XML data, and how this impacts the in-memory Targets in Extended Events, in his blog post You may not see the data you expect in Extended Event Ring Buffer Targets…. 
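
    You can check how close the serialized Target output is to that limit by looking at its size in bytes:

    -- DATALENGTH of the serialized target_data, in bytes
    SELECT 
        s.name AS session_name,
        t.target_name,
        DATALENGTH(t.target_data) AS target_data_bytes
    FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON s.address = t.event_session_address
    WHERE s.name = 'system_health'
      AND t.target_name = 'ring_buffer'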

    One of my favorite aspects of the system_health session is that it includes deadlock tracing through Extended Events by default.  However, in order to make use of the deadlock graph captured by Extended Events, you have to be on CU6 for SQL Server 2008 SP1 (http://support.microsoft.com/kb/978629), or you can work around the bug as I showed in my article Retrieving Deadlock Graphs with SQL Server 2008 Extended Events, along with Michael Zilberstein’s update correcting a problem with the code in that article, Parsing Extended Events xml_deadlock_report.  The deadlock graph in Extended Events will not open graphically in SSMS the way a SQL Trace XML Deadlock Graph will, due to changes in its output to support multi-victim deadlocks, which I covered in my blog post Changes to the Deadlock Monitor for the Extended Events xml_deadlock_report and Multi-Victim Deadlocks.
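
    For reference, the basic shape of the query to pull the deadlock graphs out of the system_health session looks like the sketch below; the full version, including the workaround for the pre-CU6 XML bug, is in the article mentioned above.  On a build without the fix, the CAST to XML can fail against the malformed XML the bug produces.

    -- A sketch: retrieve the deadlock graphs captured by system_health.
    SELECT CAST(XEvent.value('(data/value)[1]', 'varchar(max)') AS XML) AS DeadlockGraph
    FROM
    (   SELECT CAST(target_data AS XML) AS TargetData
        FROM sys.dm_xe_sessions AS s
        INNER JOIN sys.dm_xe_session_targets AS t
            ON s.address = t.event_session_address
        WHERE s.name = 'system_health'
          AND t.target_name = 'ring_buffer'
    ) AS src
    CROSS APPLY TargetData.nodes('RingBufferTarget/event[@name="xml_deadlock_report"]') AS q(XEvent)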

  • An XEvent a Day (12 of 31) – Using the Extended Events SSMS Addin

    The lack of SSMS support for Extended Events, coupled with the fact that a number of the existing Events in SQL Trace were not implemented in SQL Server 2008, has no doubt been a key factor in its slow adoption rate.  Since the release of SQL Server Denali CTP1, I have already seen a number of blog posts that talk about the introduction of Extended Events in SQL Server, because there is now a stub for it inside of SSMS.  Don’t get excited yet; the functionality in CTP1 is very limited at this point, but the demonstration of what’s to come at the PASS keynote last month showed that more effort has been put into this than is currently available in CTP1, and we should see further improvements in coming releases.  To make working with Extended Events easier, two years ago I began work on a management UI that eventually turned into a SSMS Addin for Extended Events.  In this blog post I’ll go over the features of the SSMS Addin and how it can be used to work with Extended Events.

    Features

    The Extended Events Manager exists in two forms: a SSMS Addin and a standalone Winforms application.  The project started out as the Winforms application, which has been maintained as a development environment for functionality changes and new additions to the project, since debugging changes inside of the SSMS Addin proved more difficult than inside of the Winforms application.  This blog post will focus on the functionality of the Addin, which matches the functionality of the standalone application with only a few differences that relate to the standalone application not being a part of SSMS.  Currently the Addin only supports SQL Server 2008 and 2008R2, but changes are underway to allow it to support Denali as well.

    Automatic Update Notifications

    The Extended Events SSMS Addin will check for an Internet connection when SSMS starts up, and validate whether or not the latest build is being used.  If a new release is available, then a message box similar to the one below will be displayed.

    image

    Clicking the “Yes” button on the dialog box launches Internet Explorer and opens the CodePlex project site for the application, where you can download the newest release.  The Automatic Update notifications can be turned off by unchecking the box on the popup, or by turning off the feature in the Option Manager.

    Using the Extended Events Manager

    Using the Extended Events Manager is very simple, and a majority of the functionality was implemented to match the way functionality is already exposed inside of SSMS.  The Addin is available from the View menu and opens the Event Session Explorer window, which can be docked like the Object Explorer inside of SSMS.  The Event Session Explorer will show the Extended Events Session information for the currently selected server in the Object Explorer if the server is a SQL Server 2008/2008R2 instance, and it automatically changes its server connection when the selected server in the Object Explorer changes. 

    Event Session Explorer Window

    The Event Session Explorer displays a list of all the Event Sessions that exist inside of the current SQL Server Session Definition Catalogs.  Each Event Session node will show a list of Events and Targets configured in the Event Session. 

    image

    A majority of the functionality provided by the Extended Events Manager is available through context menus in the Event Session Explorer.  There are three different context menus available: the Server Level Menu, the Session Level Menu, and the Target Level Menu.

    Server Level Menu

    image

    • New Event Session

      The New Event Session Menu Item will open up the Event Session Editor with a blank Session form.

    • Refresh

      The Refresh Menu Item does exactly what its name implies and refreshes the Session Metadata held by the Tree View from the SQL Server.

    • View XEvent MetaData

      The View XEvent MetaData Menu Item loads the available Extended Events MetaData from the Extended Events Engine on the Server and displays it in the XEvent Metadata Viewer. The data is categorized by Package with SubNodes for each of the object types supported by the Extended Events Engine on the currently selected Server.

    • Addin Options

      The Addin Options Menu Item opens up the Option Manager for the Addin allowing configuration of the Options available.

    Session Level Menu

    image

    • Edit Event Session

      The Edit Event Session Menu Item will open up the Event Session Editor with the currently selected Event Session’s metadata bound to the form.

    • Drop Event Session

      The Drop Event Session Menu Item will immediately execute a DROP EVENT SESSION command against the database server for the currently selected Event Session.

    • Script Session As

      The Script Session As Menu Item provides scripting functionality similar to that provided by Management Studio for the currently selected Event Session. Currently a script can be created to a file or to the system Clipboard. A future release will also provide the ability to script to a new SQL Server Management Studio window.

    • Start/Stop Event Session

      The Start Event Session and Stop Event Session Menu Items are active based on the current status of the Event Session on the Server. If the Session is started, then the start option is disabled and the stop option is enabled. Both issue the appropriate ALTER EVENT SESSION command immediately against the SQL Server for the selected Event Session.

    Target Level Menu

    image

  • View Target Data

    The View Target Data Menu Item will open up the Target Data View and display the information contained within the currently selected target.

  • Query Target Data with TSQL

    The Query Target Data with TSQL Menu Item will make use of Adam Machanic’s Code Generator to generate a TSQL command that can be used to query the target data from the server in a SSMS Query Window.

  • Metadata Viewer

    The Metadata Viewer allows browsing of the Extended Events Metadata contained inside of the Metadata DMVs in SQL Server (a query sketch against those DMVs follows the list below).

    image

    1. Tabbed Document Support

      The Metadata Viewer provides standard Tabbed document functionality inside of SSMS.

    2. MetaData Tree View

      Contains the Extended Events Metadata categorized by Package, with sub nodes for each of the object types supported by the Extended Events Engine. The individual objects can be seen by expanding the sub nodes.

    3. Additional Information Box

      Will contain any additional information contained in the Event Session Metadata for the selected node.
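
    For those who prefer the raw metadata that the viewer is built on top of, the following sketch queries the same DMVs directly.

    -- A sketch of the metadata the viewer surfaces: every Package and the
    -- objects it exposes, straight from the Extended Events DMVs.
    SELECT p.name AS package_name,
        o.object_type,
        o.name AS object_name,
        o.description
    FROM sys.dm_xe_packages AS p
    INNER JOIN sys.dm_xe_objects AS o
        ON o.package_guid = p.guid
    ORDER BY p.name, o.object_type, o.name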

    Event Session Editor

    The core functionality provided by the Extended Events Manager is the ability to create and edit Event Sessions without having to explicitly write TSQL.  The Event Session Editor allows full configuration of an Event Session, including the ability to Add/Remove/Edit Events and Targets and to specify the Session Level Options.

    Main Window

    image

    The main window of the Event Session Editor exposes a majority of the information about the Event Session.  There are nine areas of interest in the Session Editor.

    1. Script Action

      image

      The Script Action button on the toolbar provides the same scripting functionality available in many of the windows natively available in SSMS, including the ability to script changes to a New SSMS Window, a File, or the Clipboard.  An additional option is to script the changes to a Message Box popup, which was a debugging feature that was never intended to make it into a released build, since it has no intrinsic value over the other options in practical usage, but since it is there I figured I would mention it here.

    2. Session Name

      The name of the Extended Event Session.

    3. Add Event

      Opens the Event Editor UI for a new Event to be added to the Event Session.

      image
    4. Start Event Session Automatically

      Specifies that the Event Session should be altered to a started state when the Ok button is clicked.

    5. Event Grid View

      Displays current Events defined for the Event Session.  Direct editing of Actions and Predicate definitions can be performed by typing in the appropriate cell for the Event to be changed.  Alternatively, the entire Event definition can be edited by clicking the Edit button which will open the Event Editor and bind the Event definition to the Editor.

    6. Targets Grid View

      Displays current Targets defined for the Event Session.  Direct editing of Target Option definitions can be performed by typing in the appropriate cell for the Target to be changed.  Alternatively, the entire Target definition can be edited by clicking the Edit button, which will open the Target Editor and bind the Target’s definition to the Editor.

    7. Add Target

      Opens the Target Editor UI for a new Target to be added to the Event Session.

      image

    8. Ok/Cancel Buttons

      The Ok button will parse the Session to a script for execution against the SQL Server to change or create the Event Session.  Exceptions are handled, and the Addin offers to display the specific errors, including line numbers, along with the Session Script for troubleshooting.  Should there be an unresolved issue, it might be best to script the operation to SSMS and perform troubleshooting directly with the DDL command.  In this case, please report the failing DDL script as a bug on the CodePlex site for the project.  The Cancel button will close the Session Editor, discarding all changes.

    9. Session Option Editor

      image

      The Session Option Editor displays all of the options for the Event Session.

    Target Data Viewer

    The Target Data Viewer shreds the Event Data for all of the Targets currently available in Extended Events and pivots the data so that it can be displayed in a Profiler-like table in SSMS.  The target information is displayed at the top, the Event data held inside the target is displayed in the grid view, and information about the selected row in the grid view is displayed in the text window below it.  The target data can be incrementally refreshed from the server by clicking the Refresh button.

    image 

    Extended Event Manager Options

    The Extended Event Manager has two configurable options that control whether or not the Addin checks for updates at startup, and whether or not the Addin provides a warning before attempting to DROP an active Event Session from the current server.

    image

    In addition to the Addin specific options, the Addin also offers the ability to set personalized defaults for the Event Session Options that are used every time a new Event Session is created using the Addin.  This can simplify the customization of Event Sessions if you prefer to set Event Session Options that differ from the defaults provided by Extended Events.

    image 

    Using the Library with Powershell

    The Extended Events Manager has its own SMO-like library DLL, XEvents.Library.dll, which is the foundation of all of the functionality available inside of the Addin and the standalone application.  This assembly can be used to extend management of Extended Events to Powershell, by loading the library using Reflection:

    [reflection.assembly]::LoadFrom("C:\Program Files\SQL Server Extended Events\Extended Events SSMS Addin\XEvents.Library.dll")

    Once the assembly is loaded, a connection to a SQL Server can be made:

    $Server = new-object ("XEvents.Library.SqlServer") "ServerName"

    From here, the common PowerShell commands apply. To see all the members of the Server:

    $Server | get-member

    To view the Sessions currently in the Extended Events Engine:

    $Server.EventSessions | % {$_.Name}
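
    For comparison, the same information is available directly in T-SQL.  Whether the EventSessions collection maps to the sessions defined on the server or only the running ones is a library implementation detail, so treat the mapping below as an assumption.

    -- T-SQL comparison (assumption: EventSessions enumerates the sessions
    -- defined on the server; running sessions live in a separate DMV).
    SELECT name FROM sys.server_event_sessions    -- defined sessions
    SELECT name FROM sys.dm_xe_sessions           -- running sessions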

    Background on the Project

    When I first read about this new feature in SQL Server 2008 I tried looking into it, but I really got nowhere because there wasn’t a lot of information about it.  At the time only one blog popped up with anything to do with Extended Events in search engines.  Bob Beauchemin of SQLskills had blogged about Extended Events as early as the July CTP of SQL Server 2008, back in August 2007.  Nearly a year later I started really working on learning Extended Events, and like most people I had trouble wrapping my head around all the DMVs and Catalog Views, and learning what was actually available through Extended Events.  I wrote a really nasty XML generator that queried the DMVs and provided back an XML representation of the Metadata, with root nodes for each Package and then child nodes for each of the types of Objects contained in each Package, all the way down to the object column level.  This made the Metadata a bit more digestible, but after a few days of expanding and contracting nodes I set out to create a basic UI that made working with Extended Events easier.

    When I completed the first build of the UI and had tested things to the point that I considered it to be “stable”, I posted the project and source code up on CodePlex and sent an email off to Bob asking for some feedback on the project.  Bob provided a ton of feedback early on in the project’s development and blogged about it shortly after the initial build was released in August of 2008.  I spent some time reworking pieces of the project here and there, and as I published newer builds of the project, I sent it out to a couple of other people who had begun talking and blogging about Extended Events.  One of those people, my good friend Kevin Kline (Blog|Twitter), recommended that I enter the application in the SQL Server 2008 Heroes contest that was being put on by Microsoft in conjunction with that year’s PASS Summit.  When it was picked as a finalist and won, I was surprised, because at the time it was just something I was playing around with.

    After PASS 2008 I got an email from Adam Machanic (Blog|Twitter) asking if I would be interested in some help with the project to move it from being a basic project into a completed tool.  Many of the changes that ensued over the next year with the code base, including a complete rewrite of the underlying Library DLL came about because of email exchanges with Adam and the suggestions he provided.  However, the one thing I really disliked about the Extended Event Manager was that it was at this point still a stand alone application, and I really hated having to open another application to play with Extended Events.  Over this period of time I had been trading emails with Jerome Halmans at Microsoft, constantly asking questions about Extended Events and sending him updates whenever I published a new build of the UI.  At some point he mentioned that it would be great if it could be integrated into SSMS directly, so I set out to figure out how to integrate the UI functionality in SQL Server Management Studio as an Addin, which was not then, and is still not now, something that is actually supported by Microsoft.

    If you do any kind of searching online for SSMS Addins today, you will find all kinds of references for how to build one in SQL Server 2005 and SQL Server 2008, but one person that you are most certainly going to find is Mladen Prajdic (Blog|Twitter), the author of the SSMS Tools Pack.  Right around the point that I was ready to give up on ever figuring out how to build an Addin for SSMS, I got a comment from Mladen on Twitter about the Extended Events Manager that opened up conversations between us about the problems I was facing in my attempts to move it from being a standalone application into a SSMS Addin.  Mladen answered a number of questions for me, and even looked at a couple of code files that I was having problems with through email exchanges, and the result was the first release of the SSMS Addin for Extended Events in July 2009.  There have been a few revisions to the Addin since its initial release, mainly performance enhancements to the Target data processing methodologies, and some fixes to compatibility issues with SQL Server 2008 R2 due to changes in the SSMS codebase.

    I owe a lot of people for their input, feedback, ideas, and assistance with this project.  I may have been the person slinging the code, but the project is as much the combined work of all the people mentioned above, without which I probably never would have continued as far as I did with the project.

    Future of the Project

    If you look at the project page on CodePlex, one of the first things you will probably notice is that I haven’t done any real work on it in over a year, and that is essentially the case.  The last build of the project, which was completely bug free and perfect, was released right before I changed jobs and then went back to school, so I haven’t had a lot of time to dedicate to the project, and I haven’t really had a lot of motivation to, since there hasn’t been a lot of feedback on what bugs or issues people have encountered when using the Addin.  I know that there are some bugs in the current release; every now and then I encounter one and ask myself when I am going to finally fix that.

    Currently I have been reworking the Library code base to work with SQL Server Denali CTP1 and provide backwards compatibility with SQL Server 2008 and 2008R2.  This has proven to be somewhat of a challenge for a number of reasons; mainly, I haven’t written much .NET code over the last year and I have had to put some effort into relearning what some blocks of code actually do.  In addition to that, I have had to make some significant changes to support changes that occurred in Denali, and doing that while ensuring backwards compatibility is maintained is slow.  From the XEvents.Library DLL standpoint, though, I have made all the necessary changes to support all three releases of SQL Server.  Let’s just hope that Microsoft doesn’t change too much in the remaining CTPs, RCs, and ultimately the RTM of Denali!  (Mike, I know you are reading this, hint hint!) 

    Since the Visual Studio environment that Management Studio runs on top of was changed out, a number of things associated with the Addin portion of the Extended Events Manager change as well, and I have yet to figure out how to actually register an Addin in Denali CTP1.  I do know that it’s possible and has already been done, so I am sure that I will get there at some point.  For now I am reworking the UI components using the standalone application approach that I have used in the past for building and testing the controls that are actually registered in the Addin in SSMS.  Hopefully sometime in January 2011 I will be able to release at least the standalone application in a Denali build, and then be in a position to focus on figuring out how to hack an Addin back into SSMS.

  • An XEvent a Day (11 of 31) – Targets Week – Using Multiple Targets to Debug Orphaned Transactions

    Yesterday’s blog post Targets Week – etw_classic_sync_target covered the ETW integration that is built into Extended Events and how the etw_classic_sync_target can be used in conjunction with other ETW traces to provide troubleshooting at a level previously not possible with SQL Server.  In today’s post we’ll look at how to use multiple targets to simplify analysis of Event collection.

    Why Multiple Targets?

    You might ask why you would want to use multiple Targets in an Event Session with Extended Events.  The best answer is that each of the Targets handles the Event data in different ways, and by combining their uses, you can easily simplify the task of tracking down problems.  Two days ago I talked about the pair_matching Target and how it only retains Events that have not been matched.  Prior to that I talked about the ring_buffer and asynchronous_file_target Targets, which retain the raw data for the Events.  The difference in their operations makes using multiple Targets a great benefit provided by Extended Events over SQL Trace.

    Example Problem to Troubleshoot

    Two years ago out of nowhere, one of the production databases I supported started having transaction log growth problems.  After growing the log for the second time in a day, I started looking at the database because something was obviously not right. 

    The first place I looked was the sys.databases catalog view, specifically the log_reuse_wait_desc column, which provides the reason that the transaction log for a database is not truncating.  When I queried the view, I found that the log_reuse_wait_desc was ACTIVE_TRANSACTION, meaning that the database had an open transaction.  So I ran DBCC OPENTRAN to get some more information about the open transaction (you can query the DMVs for this information as well) and found that the transaction had been open for over 8 hours.  I queried sys.dm_exec_sessions for the session_id and found that the session was still being used and had submitted a request within the last minute, so it appeared to be an orphaned transaction.

    SELECT 
        session_id,
        login_time, 
        last_request_start_time, 
        last_request_end_time
    FROM sys.dm_exec_sessions
    WHERE session_id = 76

    After discussions with the Analyst for the application and the application’s vendor, it was decided that the session should be killed, forcing a ROLLBACK of the transaction and allowing the log to be truncated.  (I’ll discuss why this might prove to be problematic later in this post.)  This resolved the problem, at least until the next day, when the database began running out of space in the transaction log again, and once again had a transaction that had been open for hours on a session that was still being used by the application.  What was really interesting was that there was no correlation between the previous day’s open transaction’s begin time and the begin time of the second occurrence of the problem, so it seemed to be a completely random occurrence that was not going to be easy to troubleshoot. 

    I created a server side trace and tried over the next two days to figure out what the issue actually was, but didn’t make much headway until I expanded the trace to include the statement starting and completed events along with the Errors and Warnings Events.  When the problem reoccurred, I was able to read through the trace files, using filtering to narrow the Trace Events down to the specific spid that held the open transaction and the events that occurred five minutes before and after the transaction_begin_time for the open transaction.  While looking at the event information I found an Attention Event and was able to deduce what had happened.

    The application was an ASP.NET application, and the vendor used the CommandTimeout default, which is 30 seconds.  A process was invoked that called a stored procedure to archive information from a transactional table into an archive table inside of the database, and the number of rows being archived caused the stored procedure’s execution to exceed 30 seconds, resulting in a timeout in the ASP.NET application, which silently handled the exception by doing nothing.  The problem was that the stored procedure issued a BEGIN TRANSACTION before archiving the rows, and when the timeout occurred and the Attention Event was raised, the command aborted, leaving the open transaction and creating the problem with the log not truncating. 

    The connection was returned to the ASP.NET Connection Pool and was constantly being reused by the application for who knows what other operations, which is why killing the connection was potentially a very bad thing to do.  All of the activity performed by this session was performed under the open transaction, so by killing the session, all of that activity would be rolled back.  With no way to tell what exactly would be rolled back, killing the session should not be taken lightly.
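
    For completeness, the command that was ultimately run was just a KILL of the session holding the transaction (session_id 76 from the query above), with all of the caveats just described.

    -- Forcing the rollback: everything performed on session 76 under the
    -- open transaction is undone, so this should not be taken lightly.
    KILL 76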

    For the remainder of this post I am going to show a repro of this particular problem and how to use multiple Targets in Extended Events to simplify troubleshooting it.

    Setting Up the Demo

    To setup a reproduction of this problem you will need two instances of SSMS open.  One of them will be used to connect to the SQL Server normally, and the other will be used to act like the ASP.NET application that originally had the problem.  To setup the second instance of SSMS to act like the ASP.NET application, we are going to set the Execution Timeout using the Options of the Connect to Database Engine window, and we are also going to add an Additional Connection Parameter to the connection to set the Application Name on the connection to “Some Poorly Written App” as shown in the below screenshots from SSMS.

    image   image

     

     

    For the remainder of this blog post I am going to refer to the two different instances of SSMS as Normal and PoorApp in the hopes that this prevents confusion. 

    In the Normal SSMS we will create a database and some objects to support the repro of the problem:

    CREATE DATABASE [MultiTargetDemo]
    GO
    USE [MultiTargetDemo]
    GO
    CREATE TABLE dbo.RandomObjectsArchive
    (ArchiveID int identity primary key,
     TableName nvarchar(128),
     IndexName nvarchar(128),
     ColumnName nvarchar(128))
    GO
    CREATE TABLE dbo.TrackArchiveRunTimes
    (RuntimeID int identity primary key,
     ArchiveRuntime datetime DEFAULT(CURRENT_TIMESTAMP))
    GO
    CREATE PROCEDURE dbo.GenerateRandomObjects
    AS
    BEGIN TRANSACTION
        INSERT INTO dbo.TrackArchiveRunTimes
        DEFAULT VALUES;
    
        INSERT INTO RandomObjectsArchive
            (TableName, IndexName, ColumnName)
        SELECT TOP 10000 a.name, i.name, c.name
        FROM sys.objects AS a
            CROSS JOIN sys.indexes AS i
            CROSS JOIN sys.columns AS c
            CROSS JOIN master.dbo.spt_values AS sv
        WHERE sv.type = 'P' 
          AND sv.number < 6 --Adjust to increase runtime
        ORDER BY NEWID() DESC
    COMMIT TRANSACTION
    GO
    USE [master]
    GO

    The GenerateRandomObjects stored procedure Inserts a row into a tracking table that tracks when the stored procedure was executed, and then simulates a long running archive process by doing something that you should never do in production code.  The sv.number predicate in the query can be increased or decreased based on the performance of the system being tested against to ensure that the stored procedure runs longer than the Execution Timeout setting, which on my PoorApp SSMS instance was set to 10 seconds.  Increasing the value by 1 has an exponential impact on the performance degradation of the stored procedure, so any changes should be made incrementally to ensure that you don’t create a tempdb bloat problem with the Cartesian product of the query being executed.
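
    One way to calibrate the sv.number predicate is to time an execution from the Normal SSMS instance (where nothing times out) and adjust the value until the runtime comfortably exceeds the PoorApp Execution Timeout.  A simple sketch:

    -- Time a run of the stored procedure to tune the sv.number predicate.
    SET STATISTICS TIME ON
    EXECUTE MultiTargetDemo.dbo.GenerateRandomObjects
    SET STATISTICS TIME OFF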

    Setting Up the Event Session

    To troubleshoot this problem using Extended Events we will create an Event Session that captures the following Events:

    sqlserver.database_transaction_begin
    sqlserver.database_transaction_end
    sqlserver.sql_statement_starting
    sqlserver.sql_statement_completed
    sqlserver.sp_statement_starting
    sqlserver.sp_statement_completed
    sqlserver.rpc_starting
    sqlserver.rpc_completed
    sqlserver.module_start
    sqlserver.module_end
    sqlserver.error_reported

    We’ll add the following Actions to each of the Events:

    sqlserver.session_id
    sqlserver.database_id
    sqlserver.tsql_stack

    and add the sqlserver.sql_text Action to the starting Events so that we can track what is actually being executed.  Every Event in the Event Session will have a Predicate on the sqlserver.client_app_name so that the Event only fires for connections and requests from “Some Poorly Written App”. 

    IF EXISTS(SELECT * 
             FROM sys.server_event_sessions 
             WHERE name='OrphanedTransactionHunter') 
        DROP EVENT SESSION [OrphanedTransactionHunter] ON SERVER; 
    CREATE EVENT SESSION OrphanedTransactionHunter
    ON SERVER
    ADD EVENT sqlserver.database_transaction_begin
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.database_transaction_end
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.sql_statement_starting
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.sql_statement_completed
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.sp_statement_starting
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.sp_statement_completed
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.rpc_starting
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.rpc_completed
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.module_start
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack, sqlserver.sql_text)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.module_end
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App')),
    ADD EVENT sqlserver.error_reported
    (    ACTION(sqlserver.session_id, sqlserver.database_id, sqlserver.tsql_stack)
        WHERE (sqlserver.client_app_name = 'Some Poorly Written App'))
    ADD TARGET package0.ring_buffer,
    ADD TARGET package0.pair_matching
    ( SET begin_event = 'sqlserver.database_transaction_begin',
          begin_matching_actions = 'sqlserver.session_id',
          end_event = 'sqlserver.database_transaction_end',
          end_matching_actions = 'sqlserver.session_id',
          respond_to_memory_pressure = 1
    )
    WITH (MAX_DISPATCH_LATENCY=5 SECONDS, TRACK_CAUSALITY=ON)
    GO
    
    ALTER EVENT SESSION OrphanedTransactionHunter
    ON SERVER
    STATE=START
    GO

    Fired Events will be dispatched to two different Targets, the package0.ring_buffer to capture the Raw Data (in a true production environment, the package0.asynchronous_file_target would generally be a better Target for Raw Data capture of any volume), and the package0.pair_matching Target which has been configured to match on the sqlserver.database_transaction_begin/end Events based on the sqlserver.session_id Action.  To ensure that we can track the relationship between events, the Event Session will have TRACK_CAUSALITY set to ON, and to minimize the time it takes for Events to be dispatched for our test, the MAX_DISPATCH_LATENCY will be set to 5 seconds.
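
    Before moving on, a quick sanity check (a sketch) confirms that the Event Session is running and that both Targets are attached.

    -- A sketch: confirm the Event Session is active with both the
    -- ring_buffer and pair_matching Targets attached.
    SELECT s.name AS session_name, t.target_name
    FROM sys.dm_xe_sessions AS s
    INNER JOIN sys.dm_xe_session_targets AS t
        ON t.event_session_address = s.address
    WHERE s.name = 'OrphanedTransactionHunter'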

    Putting It All Together

    With the Event Session running, we can change over to our PoorApp SSMS instance and execute the GenerateRandomObjects stored procedure inside of the MultiTargetDemo database.

    EXECUTE MultiTargetDemo.dbo.GenerateRandomObjects

    When this executes, the command will time out and leave the transaction open, simulating the original problem exactly.  Once the query times out, switch back to the Normal SSMS instance and in a new window execute the stored procedure again, allowing it to complete its execution.  Since the default timeout of 0 is used in the Normal SSMS instance, the execution will not time out.  Then we can look at sys.databases and see that the log_reuse_wait_desc is ACTIVE_TRANSACTION.

    SELECT log_reuse_wait_desc 
    FROM sys.databases
    WHERE database_id = DB_ID('MultiTargetDemo')

    image

    If we look at DBCC OPENTRAN for the MultiTargetDemo database we will see our orphaned transaction:

    DBCC OPENTRAN([MultiTargetDemo])

    image

    As I mentioned earlier in this post, the transaction can also be seen in the DMVs:

    SELECT 
        dtst.session_id, 
        dtdt.database_id,
        dtst.transaction_id, 
        dtat.name,
        dtdt.database_transaction_begin_time
    FROM sys.dm_tran_session_transactions AS dtst
    JOIN sys.dm_tran_active_transactions AS dtat
        ON dtst.transaction_id = dtat.transaction_id
    JOIN sys.dm_tran_database_transactions AS dtdt
        ON dtdt.transaction_id = dtst.transaction_id
    WHERE database_id = DB_ID('MultiTargetDemo')

     

    image

    Now that we have our problem reproduced, let’s look at how we can use the information captured by our Extended Event Session to track it back to the source of the problem.  First we’ll query the pair_matching Target to find out information about the sqlserver.database_transaction_begin Event that was unmatched.

    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
        n.value('(event/action[@name="session_id"]/value)[1]', 'int') as session_id,
        n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
        n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') as tsql_stack,    
        n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)') as attach_activity_id
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s 
            JOIN sys.dm_xe_session_targets AS t 
                ON t.event_session_address = s.address
            WHERE s.name = 'OrphanedTransactionHunter'
              AND t.target_name = 'pair_matching'
        ) AS sub
        CROSS APPLY target_data.nodes('PairingTarget/event') AS q(td)
    ) as tab
    -- We are interested in unmatched sqlserver.database_transaction_begin Events
    WHERE n.value('(event/@name)[1]', 'varchar(50)') = 'database_transaction_begin'
    ORDER BY session_id, attach_activity_id
    GO
    

     

    image

    From this we can see our orphaned transaction Event and find the attach_activity_id of that Event.  The attach_activity_id Action is added to the Events in an Event Session when TRACK_CAUSALITY is turned ON.  There are two pieces of information contained in the attach_activity_id Action: the activity GUID (the first 36 characters of the value) and the sequence number for the Event (the number following the GUID).  The GUID can be used to find related Events, and the sequence number can be used to determine the order in which the Events occurred.  By using the GUID from the attach_activity_id Action from our first query, we can query the ring_buffer Target and parse out the specific Events we are interested in looking at further.
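
    As an aside, splitting the value into its two parts is just string manipulation.  The sketch below uses a hypothetical sample value, built from the GUID returned by the pair_matching query above with a sequence number of 1 appended.

    -- A minimal sketch: split an attach_activity_id value into the
    -- activity GUID and the Event sequence number.  The sample value is
    -- hypothetical.
    DECLARE @attach_activity_id varchar(50) =
        'ADCF379A-4BCA-41BA-9B08-4C2265894392-1'
    SELECT CAST(SUBSTRING(@attach_activity_id, 1, 36) AS uniqueidentifier) AS activity_id,
        CAST(SUBSTRING(@attach_activity_id, 38, 10) AS int) AS event_sequence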

    -- Query the XML to get the Target Data
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        n.value('(event/@package)[1]', 'varchar(50)') AS package_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],  
        n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
        n.value('(event/action[@name="session_id"]/value)[1]', 'int') as [session_id],
        n.value('(event/data[@name="object_id"]/value)[1]', 'int') AS [object_id],
        n.value('(event/data[@name="object_type"]/value)[1]', 'nvarchar(128)') AS [object_type],
        n.value('(event/data[@name="object_name"]/value)[1]', 'nvarchar(128)') AS [object_name],
        n.value('(event/data[@name="error"]/value)[1]', 'int') AS [error],
        n.value('(event/data[@name="severity"]/value)[1]', 'int') AS [severity],
        n.value('(event/data[@name="state"]/value)[1]', 'int') AS [state],
        n.value('(event/data[@name="user_defined"]/value)[1]', 'bit') AS [user_defined],
        n.value('(event/data[@name="message"]/value)[1]', 'nvarchar(4000)') AS [message],
        n.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
        n.value('(event/data[@name="row_count"]/value)[1]', 'int') AS [row_count],
        n.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
        n.value('(event/data[@name="reads"]/value)[1]', 'int') AS [reads],
        n.value('(event/data[@name="writes"]/value)[1]', 'int') AS [writes],
        n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') AS [tsql_stack],
        n.value('(event/data[@name="offset"]/value)[1]', 'int') AS [offset],
        n.value('(event/data[@name="offset_end"]/value)[1]', 'int') AS [offset_end],
        n.value('(event/data[@name="nest_level"]/value)[1]', 'int') AS [nest_level],           
         n.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS [sql_text],
        CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id,
        CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) AS event_sequence
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s 
            JOIN sys.dm_xe_session_targets AS t 
                ON t.event_session_address = s.address
            WHERE s.name = 'OrphanedTransactionHunter'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    -- We are interested in unmatched sqlserver.database_transaction_begin Events
    WHERE SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) = 'ADCF379A-4BCA-41BA-9B08-4C2265894392'
    ORDER BY session_id, event_sequence
    GO
    

    With a little bit more work, we can reduce the XML parsing to only the important data elements that we need, and we can parse the tsql_stack Action to retrieve the related Event level statement_text from the sys.dm_exec_sql_text() DMF, since the sql_text Action did not have the intended information.

    -- Query the XML to get the Target Data
    SELECT 
        event_name,
        timestamp,
        database_id, 
        OBJECT_NAME(st.objectid, st.dbid) AS ObjectName,
        SUBSTRING(st.text, (tsql_stack.value('(/frame/@offsetStart)[1]', 'int')/2)+1, 
            ((CASE tsql_stack.value('(/frame/@offsetEnd)[1]', 'int')
                WHEN -1 THEN DATALENGTH(st.text)
                ELSE tsql_stack.value('(/frame/@offsetEnd)[1]', 'int')
                END - tsql_stack.value('(/frame/@offsetStart)[1]', 'int'))/2) + 1) AS statement_text,
        duration,
        activity_id,
        event_sequence
    FROM
    (
    SELECT 
        n.value('(event/@name)[1]', 'varchar(50)') AS event_name,
        DATEADD(hh, 
                DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), 
                n.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],  
        n.value('(event/action[@name="database_id"]/value)[1]', 'int') as [database_id],
        n.value('(event/data[@name="duration"]/value)[1]', 'int') AS [duration],
        CAST(n.value('(event/action[@name="tsql_stack"]/value)[1]', 'nvarchar(max)') AS XML) AS [tsql_stack],
        CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) AS activity_id,
        CAST(SUBSTRING(n.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) AS event_sequence
    FROM
    (    SELECT td.query('.') as n
        FROM 
        (
            SELECT CAST(target_data AS XML) as target_data
            FROM sys.dm_xe_sessions AS s 
            JOIN sys.dm_xe_session_targets AS t 
                ON t.event_session_address = s.address
            WHERE s.name = 'OrphanedTransactionHunter'
              AND t.target_name = 'ring_buffer'
        ) AS sub
        CROSS APPLY target_data.nodes('RingBufferTarget/event') AS q(td)
    ) as tab
    ) as tab2
    CROSS APPLY sys.dm_exec_sql_text(tsql_stack.value('xs:hexBinary(substring((/frame/@handle)[1], 3))', 'varbinary(max)')) AS st
    -- We are interested in Events in activity_id sequence of the orphaned transaction only
    WHERE activity_id = 'ADCF379A-4BCA-41BA-9B08-4C2265894392'
    ORDER BY event_sequence
    GO
    

    image

    The highlighted value shows that the stored procedure execution ended at the Execution Timeout limit that was set for the PoorApp SSMS instance.  Beyond that we can track each of the statements and see that when the execution ended, it was in the INSERT INTO RandomObjectsArchive statement in the stored procedure, allowing us to target our troubleshooting efforts at the specific problem in a short amount of time.
