THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Jamie Thomson

This is the blog of Jamie Thomson, a freelance data mangler in London

Using Composable DML to maintain entity history [T-SQL]

I recently inherited a data warehousing SQL Server solution from my good buddy James Rowland-Jones (you may know him from his more auspicious role on the SQLPass board) and it contained a very cool technique of using composable DML (a feature that arrived in SQL Server 2008) to maintain history of updates to a particular table. I knew about composable DML (and have blogged about it previously) however had never considered how useful it could be in a data warehousing scenario, hopefully James won’t mind if I share what I’ve learned in this blog post.

In our case we have a Customer dimension that needs to be maintained as a type 2 however for speed of querying purposes we also need to have a table that provides the most recent state view of all our customers. For demo purposes, here are our two tables:

CREATE TABLE [#Customer] (
	[CustomerSurrogateKey]	INT		IDENTITY(1,1)	PRIMARY KEY
,	[CustomerNaturalKey]	NVARCHAR(100)
,	[NumberOfChildren]	TINYINT
,	[HomeTown]		NVARCHAR(MAX)
,	[EffectiveDateTime]	DATETIME2(7)
,	CONSTRAINT [Customer_UK] UNIQUE ([CustomerNaturalKey],[EffectiveDateTime])
);
CREATE TABLE [#CustomerCurrent] (
	[CustomerCurrentSurrogateKey]	INT		IDENTITY(1,1)	PRIMARY KEY
,	[CustomerNaturalKey]		NVARCHAR(100)			UNIQUE
,	[NumberOfChildren]		TINYINT
,	[HomeTown]			NVARCHAR(100)
);

[#Customer] is our type 2 dimension table and [#CustomerCurrent] will store the most recent state of each customer.

Composable DML allows us to take data from the OUTPUT clause of an INSERT or MERGE statement and insert that data elsewhere; in our case we are going to first MERGE into [#CustomerCurrent] and then insert the new or updated records into [#Customer]. Like so:

DECLARE	@newData TABLE (
	[CustomerNaturalKey]	NVARCHAR(100)
,	[NumberOfChildren]	TINYINT
,	[HomeTown]		NVARCHAR(100)
);
/*Manufacture some incoming new data*/
INSERT	@newData([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
VALUES  (N'Jamie',1,N'London')
,	(N'Sarah',0,N'Birmingham')
,	(N'Ginny',0,N'Glasgow');
/*MERGE into #CustomerCurrent and insert new/updated records into #Customer using Composable DML*/
INSERT	#Customer([CustomerNaturalKey],[NumberOfChildren],[HomeTown],[EffectiveDateTime])
SELECT	ISNULL([mergeOutput].[InsertedCustomerNaturalKey],[mergeOutput].[DeletedCustomerNaturalKey])
,	ISNULL([mergeOutput].[InsertedNumberOfChildren],[mergeOutput].[DeletedNumberOfChildren])
,	ISNULL([mergeOutput].[InsertedHomeTown],[mergeOutput].[DeletedHomeTown])
,	SYSDATETIME()
FROM	(
		MERGE	#CustomerCurrent tgt
		USING	@newData src
			ON	tgt.[CustomerNaturalKey]	=	src.[CustomerNaturalKey]
		WHEN NOT MATCHED THEN
			INSERT([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
			VALUES(src.[CustomerNaturalKey],src.[NumberOfChildren],src.[HomeTown])
		WHEN MATCHED AND
				(	tgt.[NumberOfChildren]	<>	src.[NumberOfChildren]
				OR	tgt.[HomeTown]		<>	src.[HomeTown]
				)
				THEN
			UPDATE	
			SET		[NumberOfChildren]	=	src.[NumberOfChildren]
			,		[HomeTown]	=	src.[HomeTown]
		OUTPUT	$ACTION					AS [Action]
		,		DELETED.[CustomerNaturalKey]	AS [DeletedCustomerNaturalKey]
		,		DELETED.[NumberOfChildren]	AS [DeletedNumberOfChildren]
		,		DELETED.[HomeTown]		AS [DeletedHomeTown]
		,		INSERTED.[CustomerNaturalKey]	AS [InsertedCustomerNaturalKey]
		,		INSERTED.[NumberOfChildren]	AS [InsertedNumberOfChildren]
		,		INSERTED.[HomeTown]		AS [InsertedHomeTown]
		)[mergeOutput]
WHERE	[mergeOutput].[Action] IN ('UPDATE','INSERT')
;

Take a minute or two to look at that code and work out what is going on:

  1. We take the new data in @newData and compare it to the target, [#CustomerCurrent], to see if there are any new or updated records. If there are, do the necessary INSERT or UPDATE within the MERGE.
  2. Output the newly inserted/updated data using the OUTPUT clause
  3. Insert the new/updated records into our type 2 dimension table, [#Customer].

After we run this we end up with this data in the two tables:

image

Obviously there is no difference at this stage. The real benefit comes when we update and insert new records; let’s model that scenario by changing what’s in @newData and running the same code:

INSERT	@newData([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
VALUES  (N'Ginny',0,N'Edinburgh')	--Ginny has moved from Glasgow to Edinburgh
,	(N'Mike',2,N'Leeds')		--Mike is a new customer
;
--Run the MERGE with Composable DML again....

Which leaves us with:

image

Note how we have two records for Ginny in [#Customer] however only one record, showing the current state ([HomeTown]=’Edinburgh’), in [#CustomerCurrent]. The real beauty of this technique is that both operations (i.e. the MERGE into [#CustomerCurrent] and the INSERT into [#Customer]) are done under the same transaction so if either fail, both fail.

Pretty cool, no? Once all the data is inserted then we can take a look at the history of each of our customers

/*This code only works on SQL Server 2012 and beyond due to the use of the LEAD function, however you can achieve similar 
in earlier versions using a technique outlined at http://sqlblog.com/blogs/jamie_thomson/archive/2009/11/28/debunking-kimball-effective-dates.aspx
*/
SELECT
c.[CustomerSurrogateKey] , c.[CustomerNaturalKey] , c.[NumberOfChildren] , c.[HomeTown] , [StartEffectiveDateTime]=c.[EffectiveDateTime] ,       ISNULL( LEAD(c.[EffectiveDateTime]) OVER(PARTITION BY c.[CustomerNaturalKey] ORDER BY c.[EffectiveDateTime] ASC),                  CAST('20501231' AS DATETIME)              ) AS [EndEffectiveDateTime] FROM    [#Customer] c

image

That’s it! Hope this proves useful to some of you. If you want to read more about Composable DML then take a read of Exploring Composable DML. Thank you to James for showing me a great use of this little-known feature in SQL Server.

@Jamiet


Rather than running each of the above snippets individually here is the entire demo script. Simply copy-and-paste everything below into a SSMS query window and hit execute:
CREATE TABLE [#Customer] (
	[CustomerSurrogateKey]	INT		IDENTITY(1,1)	PRIMARY KEY
,	[CustomerNaturalKey]	NVARCHAR(100)
,	[NumberOfChildren]	TINYINT
,	[HomeTown]		NVARCHAR(MAX)
,	[EffectiveDateTime]	DATETIME2(7)
,	CONSTRAINT [Customer_UK] UNIQUE ([CustomerNaturalKey],[EffectiveDateTime])
);
CREATE TABLE [#CustomerCurrent] (
	[CustomerCurrentSurrogateKey]	INT		IDENTITY(1,1)	PRIMARY KEY
,	[CustomerNaturalKey]		NVARCHAR(100)			UNIQUE
,	[NumberOfChildren]		TINYINT
,	[HomeTown]			NVARCHAR(100)
);
DECLARE	@newData TABLE (
	[CustomerNaturalKey]	NVARCHAR(100)
,	[NumberOfChildren]	TINYINT
,	[HomeTown]		NVARCHAR(100)
);
/*Manufacture some incoming new data*/
INSERT	@newData([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
VALUES  (N'Jamie',1,N'London')
,	(N'Sarah',0,N'Birmingham')
,	(N'Ginny',0,N'Glasgow')		
;
/*MERGE into #CustomerCurrent and insert new/updated records into #Customer using Composable DML*/
INSERT	#Customer([CustomerNaturalKey],[NumberOfChildren],[HomeTown],[EffectiveDateTime])
SELECT	ISNULL([mergeOutput].[InsertedCustomerNaturalKey],[mergeOutput].[DeletedCustomerNaturalKey])
,	ISNULL([mergeOutput].[InsertedNumberOfChildren],[mergeOutput].[DeletedNumberOfChildren])
,	ISNULL([mergeOutput].[InsertedHomeTown],[mergeOutput].[DeletedHomeTown])
,	SYSDATETIME()
FROM	(
		MERGE	#CustomerCurrent tgt
		USING	@newData src
			ON	tgt.[CustomerNaturalKey]	=	src.[CustomerNaturalKey]
		WHEN NOT MATCHED THEN
			INSERT([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
			VALUES(src.[CustomerNaturalKey],src.[NumberOfChildren],src.[HomeTown])
		WHEN MATCHED AND
				(	tgt.[NumberOfChildren]	<>	src.[NumberOfChildren]
				OR	tgt.[HomeTown]		<>	src.[HomeTown]
				)
				THEN
			UPDATE	
			SET		[NumberOfChildren]	=	src.[NumberOfChildren]
			,		[HomeTown]	=	src.[HomeTown]
		OUTPUT	$ACTION					AS [Action]
		,		DELETED.[CustomerNaturalKey]	AS [DeletedCustomerNaturalKey]
		,		DELETED.[NumberOfChildren]	AS [DeletedNumberOfChildren]
		,		DELETED.[HomeTown]		AS [DeletedHomeTown]
		,		INSERTED.[CustomerNaturalKey]	AS [InsertedCustomerNaturalKey]
		,		INSERTED.[NumberOfChildren]	AS [InsertedNumberOfChildren]
		,		INSERTED.[HomeTown]		AS [InsertedHomeTown]
		)[mergeOutput]
WHERE	[mergeOutput].[Action] IN ('UPDATE','INSERT')
;
WAITFOR DELAY '00:00:01';
/*Manufacture some incoming new data*/
INSERT	@newData([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
VALUES  (N'Ginny',0,N'Edinburgh')	--Ginny has moved from Glasgow to Edinburgh
,	(N'Mike',2,N'Leeds')		--Mike is a new customer
;
 
/*Execute the same MERGE */
INSERT	#Customer([CustomerNaturalKey],[NumberOfChildren],[HomeTown],[EffectiveDateTime])
SELECT	ISNULL([mergeOutput].[InsertedCustomerNaturalKey],[mergeOutput].[DeletedCustomerNaturalKey])
,	ISNULL([mergeOutput].[InsertedNumberOfChildren],[mergeOutput].[DeletedNumberOfChildren])
,	ISNULL([mergeOutput].[InsertedHomeTown],[mergeOutput].[DeletedHomeTown])
,	SYSDATETIME()
FROM	(
		MERGE	#CustomerCurrent tgt
		USING	@newData src
			ON	tgt.[CustomerNaturalKey]	=	src.[CustomerNaturalKey]
		WHEN NOT MATCHED THEN
			INSERT([CustomerNaturalKey],[NumberOfChildren],[HomeTown])
			VALUES(src.[CustomerNaturalKey],src.[NumberOfChildren],src.[HomeTown])
		WHEN MATCHED AND
				(	tgt.[NumberOfChildren]	<>	src.[NumberOfChildren]
				OR	tgt.[HomeTown]		<>	src.[HomeTown]
				)
				THEN
			UPDATE	
			SET		[NumberOfChildren]	=	src.[NumberOfChildren]
			,		[HomeTown]	=	src.[HomeTown]
		OUTPUT	$ACTION					AS [Action]
		,		DELETED.[CustomerNaturalKey]	AS [DeletedCustomerNaturalKey]
		,		DELETED.[NumberOfChildren]	AS [DeletedNumberOfChildren]
		,		DELETED.[HomeTown]		AS [DeletedHomeTown]
		,		INSERTED.[CustomerNaturalKey]	AS [InsertedCustomerNaturalKey]
		,		INSERTED.[NumberOfChildren]	AS [InsertedNumberOfChildren]
		,		INSERTED.[HomeTown]		AS [InsertedHomeTown]
		)[mergeOutput]
WHERE	[mergeOutput].[Action] IN ('UPDATE','INSERT')
;
 
/*This code only works on SQL Server 2012 and beyond due to the use of the LEAD function, however you can achieve similar 
using in earlier versions using a technique outlined at http://sqlblog.com/blogs/jamie_thomson/archive/2009/11/28/debunking-kimball-effective-dates.aspx*/
SELECT	c.[CustomerSurrogateKey]
,		c.[CustomerNaturalKey]
,		c.[NumberOfChildren]
,		c.[HomeTown]
,		[StartEffectiveDateTime]=c.[EffectiveDateTime]
,       ISNULL( LEAD(c.[EffectiveDateTime]) OVER(PARTITION BY c.[CustomerNaturalKey] ORDER BY c.[EffectiveDateTime] ASC),                  CAST('20501231' AS DATETIME)              ) AS [EndEffectiveDateTime]
FROM    [#Customer] c 
Published Friday, July 13, 2012 3:09 PM by jamiet

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Rob Farley said:

Hi Jamie,

The way I do it is:

...

WHEN MATCHED AND t.Type2Cols != s.Type2Cols

UPDATE SET EndDate = SYSDATETIME()

WHEN NOT MATCHED BY SOURCE THEN

UPDATE SET EndDate = SYSDATETIME()

OUTPUT s.*

--... and then insert the new source rows in. So you just use the Merge to marked the existing rows as no longer current.

I still need to write a bunch of this stuff up in a lengthy blog post. It's on my list.

Rob

July 13, 2012 9:50 AM
 

anonymous said:

this could be done in Oracle much easier than this.... :-/

October 23, 2012 6:48 PM

Leave a Comment

(required) 
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement