THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Aaron Bertrand

Playing with CDC in Katmai

Change Data Capture (CDC) is something DBAs around me have been screaming about for ages.  For auditing and other reasons, we have needed to rely on 3rd party tools from companies like Lumigent, whose motto for their auditing product previously named Entegra went something like, "Who did what to which data when?"  It's a slogan I don't think I'll ever forget, and not just because I have it plastered on a T-Shirt I grabbed at PASS or SQL Connections that year...

Much to the delight of SQL Server DBAs everywhere (and surely to the chagrine of 3rd party vendors like Lumigent), there is now CDC functionality coming in Katmai (sorry, SQL Server 2008), and you can play with it in the June CTP.  (If you haven't read in one of the hundreds of blog posts on this site and elsewhere, the June CTP is available here.)

I started playing with this feature, and it is very easy to set up.

CREATE DATABASE CDCTest;
GO
USE CDCTest;
GO
EXEC sp_cdc_enable_db_change_data_capture;
GO
CREATE TABLE dbo.Test
(
id INT PRIMARY KEY,
tx NVARCHAR(MAX),
dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

EXEC sp_cdc_enable_table_change_data_capture 'dbo', 'Test', @role_name = NULL;
-- there are many more options to this procedure; this is the default
-- behavior, including capturing all columns in the table.
GO

Now, in two out of three tries from scratch, I've found that I needed to restart SQL Server Agent for data changes to start being captured (I filed a suggestion on Connect).  So, to be sure, before we put any data into the table, let's go to SQL Server Agent in Object Explorer, expand jobs, right-click "cdc.CDCTest_capture" and choose "Start Job at Step..." and get the job running once.  Now:

INSERT dbo.Test(id,tx) SELECT 1, 'foo';
INSERT dbo.Test(id,tx) SELECT 2, 'bar';
WAITFOR DELAY '00:00:05';
UPDATE dbo.Test SET tx = 'splunge' WHERE id = 2;

In order to capture the data changes, there is a new table created in a schema called 'cdc', named <schema>_<table>_CT.  In this case, it contains all of the columns from the table, as well as other information including pointers to the log and the type of operation that was captured.

SELECT * FROM cdc.dbo_test_CT;

You should see something like this:

__$start_lsn __$end_lsn __$seqval __$operation __$update_mask id tx dt
0x0000001A000001360013 NULL 0x0000001A000001360012 2 0x07 1 foo 2007-06-21 11:29:43.670
0x0000001A0000013A0003 NULL 0x0000001A0000013A0002 2 0x07 2 bar 2007-06-21 11:29:43.670
0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 3 0x02 2 bar 2007-06-21 11:29:48.670
0x0000001A0000013B0004 NULL 0x0000001A0000013B0002 4 0x02 2 splunge 2007-06-21 11:29:48.670

For the __$operation column, 1 is a delete, 2 is an insert, 3 is the before image of an update, and 4 is the after image of an update.  (SELECTs are not captured.)  So, we can see my initial two inserts there in the first two rows, and then the last two rows show the before and after image of the update (this is just like the deleted and inserted columns available within a DML trigger). 

[An important item to note here is that if you make schema changes to the table, you need to disable CDC at the table level and then re-enable it in order to capture changes to the new columns.  This "feature" is documented but I don't think it will be very intuitive.]

This gives you a bit of information, but for me, it wasn't quite enough.  What if I want to track (easily!) what user performed that update, changing 'bar' to 'splunge', and when it happened?  Well, I created my own logging table, with a representation of the three columns from the unique index on dbo_test_CT, and two additional columns (username and eventDate):

CREATE TABLE cdc.dbo_test_CT_MoreInfo
(
startlsn BINARY(10),
seqval BINARY(10),
operation INT,
username SYSNAME NOT NULL DEFAULT SUSER_SNAME(),
eventDate DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (startlsn, seqval, operation)
);
GO

To capture my own data (essentially, lumping CDC on top of CDC) I created the following trigger on the capture table:

CREATE TRIGGER cdc.LogMoreCDCInfo
ON cdc.dbo_test_CT
FOR INSERT
AS
BEGIN
 
IF @@ROWCOUNT > 0
 
BEGIN
   
INSERT cdc.dbo_test_CT_MoreInfo(startlsn,seqval,operation)
      SELECT __$start_lsn, __$seqval, __$operation FROM inserted;
  END
END
GO

And I can retrieve the information using a query like this:

SELECT 
 
operation = CASE m.operation 
    WHEN 1 THEN 'delete'
    WHEN 2 THEN 'insert'
    WHEN 3 THEN 'update (before)'
    WHEN 4 THEN 'update (after)'
    ELSE 'unknown' END,
  c.id,  
  
c.tx,
  c.dt,
  m.username,
  m.eventDate
FROM cdc.dbo_test_CT_MoreInfo m
INNER JOIN cdc.dbo_test_CT c
ON c.__$start_lsn = m.startlsn
AND c.__$seqval = m.seqval
AND c.__$operation = m.operation
ORDER BY m.eventDate;

[Note that change data is only kept around for 72 hours by default.  This retention time is stored in minutes (4320) in the msdb.dbo.cdc_jobs.retention column.  I've had reasonable (but not perfect) success in changing this value and having the cleanup job obey.]

Now, you can make the trigger get more information, e.g. host_name from sys.dm_exec_sessions or client_net_address from sys.dm_exec_connections.  I just wanted to open the door.

Note that I did not stress test this solution... not that a stress test on an early CTP would be very realistic, anyway.  I'm just playing around here, so far at least.  :-)


Published Thursday, June 21, 2007 12:23 PM by AaronBertrand

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Yan Liberman said:

Change data capture (CDC) - новая функциональность в SQL Server 2008, позволяюща

July 9, 2007 4:40 AM
 

Krishna said:

This is not working for me as SUSER_SNAME() inside the cdc trigger is sa not the user making the data change. Is there any config setting I am missing?

April 13, 2009 5:53 PM

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement