THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Enjoy Another Sandwich -- Kent Tegels

Yummy slices of SQL Server between slices of .NET and XML

Shredding XML into tables

Hey folks! Yes, I am alive, but as you can imagine, I am spending my time chewing on SQL Server 2008 and have limited time to post much. But today, I received an email that I wanted to post on since this is a common question.

"How do I shred an XML document into a table."

The big issue here is that there really is not one best way to do it. If you have a relatively small document and you want a simple solution, what I'm going to show here is a reasonably well suited. However, if you have a large document, or a large number of small documents, this approach is relatively inefficient and you probably should look at using an SSIS, BizTalk or custom developed solution. If you have schematically complex XML, BizTalk or a custom develop solution is probably going to be needed at some point.

Then there is always the question "should you?" My answer is typically yes if your primary concern is query performance under load. The more transactional processing you need to do on the data encapsulated in the document, the more sense shredding makes. Conversely, if there is only limit query of the XML and reassembling the data into an XML format would be expensive, then you are probably better off leaving it "as is."

Following are two example of bringing a small amount of XML into a table using SQL Server 2005 and SQL Server 2008. In the case of SQL Server 2005, the example simply inserts data into the target table without any handling or attempts to update existing matching rows.

use scratch
go
if not OBJECT_ID('dbo.books') is null
 drop table dbo.books;
go
create table dbo.books(
 id tinyint not null primary key,
 sku nvarchar(15) not null,
 skuType nvarchar(10) not null,
 title nvarchar(100) not null,
 listPrice money not null);
go
set nocount on
declare @booklist xml;
select @booklist = bulkcolumn from openRowSet(bulk 'c:\booklist.xml',single_blob) as b;
select @booklist;
-- Yukon-style insert, you could do update similarly.
insert into dbo.books
select t.c.value('./id[1]','tinyint')
  , t.c.value('./sku[1]','nvarchar(15)')
  , t.c.value('./skuType[1]','nvarchar(10)')
  , t.c.value('./title[1]','nvarchar(100)')
  , t.c.value('./listPrice[1]','money')
from @booklist.nodes('//book') as t(c);
go
select * from dbo.books;
go 

Transact-SQL for SQL Server 2008 adds a very useful new statement called MERGE. This statement allows us synchronize two data sources. While that may not seem to have much use in a case like this, we can actually use it like an "UPSERT" where existing data is update and new data is inserted into a destination table.

declare @booklist xml;
select @booklist = bulkcolumn from openRowSet(bulk 'c:\booklist.xml',single_blob) as b;
with b2(id,sku,skuType,title,listPrice) as (
select t.c.value('./id[1]','tinyint')
  , t.c.value('./sku[1]','nvarchar(15)')
  , t.c.value('./skuType[1]','nvarchar(10)')
  , t.c.value('./title[1]','nvarchar(100)')
  , t.c.value('./listPrice[1]','money')
from @booklist.nodes('//book') as t(c))
merge into dbo.books as b1
using b2
on(b1.id = b2.id)
when matched then
 update set
  b1.sku = b2.sku
  , b1.skuType = b2.skuType
  , b1.title = b2.title
  , b1.listPrice = b2.listPrice
when target not matched then
 insert (id,sku,skuType,title,listPrice)
 values (b2.id,b2.sku ,b2.skuType,b2.title,b2.listPrice);
go
select * from dbo.books;
go

The syntax of the MERGE statement takes a bit of getting used to, but once you have the hang of it, it's very useful. 

Please feel free to send me a message using the link below if you are interested in downloading the example code and data. 

Published Wednesday, January 23, 2008 4:06 PM by ktegels

Comments

 

Stephen Morris said:

sort of off topic

but I couldn't help reacting to

if not OBJECT_ID('dbo.books') is null

instead of

if OBJECT_ID('dbo.books') is not null

which just seems a lot easier to read.

January 24, 2008 2:48 AM
 

Greg Linwood said:

Thanks Kent - I was actually looking for how I could shred xml, make some changes & re-serialise based on some set-based requirements, so this was a very timely post for me. Very helpful indeed! (c:

Cheers,

Greg Linwood

January 24, 2008 9:54 PM
 

ktegels said:

Stephen: To each their own, I guess.

Greg: You are welcome and I would certianly believe that it would much easier to this against the shredded data.

January 24, 2008 10:39 PM
 

robert towne said:

the issue for me is on the update.  say you have 25 potential columns in a table but you are only updating 1 or 2 (having varying # of columns to update makes it a bit harder since you can't hard code every possible variation of update columns easily).

If I only want to update 1 column how can I do that in 2005 (2008 would be cool to know for future but for now its production world).

it would seem something dynamic would have to be written to figure out which values have to be assigned to the columns specified, but I would think there has to be another (hopefully more elegant) way.  :)

good stuff..

January 25, 2008 12:35 PM
 

ktegels said:

Robert.

Something where you'd be passing in a list columns and values to be updated would take dynamic SQL, there's really not a good way around that.

January 25, 2008 12:44 PM
 

robert towne said:

yeah, this is what i came up with so far.  basically creating a dynamic string that i would later have to exec().  my only question was if this is heavily used would there be something more efficient - but I think i'll just have to do some load testing and try to optimize as i go.  cool deal - thx man.

create function fnShredXMLUpdate ( @update xml )

returns varchar ( max )

as

BEGIN

declare

@sql  varchar ( max )

,           @crlf char ( 2 )

,           @tab  char ( 1 )

,           @first      bit

select @crlf = char(13) + char ( 10 )

,           @tab = char ( 9 )

,           @first = 1

set @sql = 'update m ' + @crlf +  'set ' + @crlf + @tab

if @update.exist( '/message/sender' ) = 1

begin

 set @sql = @sql + 'sender = ''' + @update.value('/message[1]/sender[1]', 'varchar ( 256 )') + ''''

 set @first = 0

end

if @update.exist( '/message/recipient' ) = 1

begin

 if @first = 0 set @sql = @sql + ',' + @tab

 set @sql = @sql + @crlf+ @tab + 'recipient = ''' + @update.value('/message[1]/recipient[1]', 'varchar ( 256 )') + ''''

 set @first = 0

end

if @update.exist( '/message/importance' ) = 1

begin

 if @first = 0 set @sql = @sql + ',' + @tab

 set @sql = @sql + @crlf+ @tab +'importance = ''' + @update.value('/message[1]/importance[1]', 'varchar ( 3 )') + ''''

 set @first = 0

end

set @sql = @sql + @crlf + 'from Message m '

set @sql = @sql + @crlf + 'where message_id = '+ @update.value('/message[1]/message_id[1]', 'varchar ( 12 )')

return ( @sql )

END ;

select dbo.fnShredXMLUpdate (

'<message>

<sender>John Doe</sender>

<message_id>1234</message_id>

<importance>3</importance>

</message>' )

/* generates:

update m

set

sender = 'John Doe',

importance = '3'

from Message m

where message_id = 1234

*/

select dbo.fnShredXMLUpdate (

'<message>

<sender>John Doe</sender>

<message_id>1234</message_id>

<recipient>Jane Doe</recipient>

</message>' )

/* generates:

update m

set

sender = 'John Doe',

recipient = 'Jane Doe'

from Message m

where message_id = 1234

*/

January 25, 2008 1:00 PM
 

Michael Swart said:

Interesting, I recently wrote about the same subject.

I've got an example of how to use such a sproc from a .Net app here:

http://dbwhisperer.blogspot.com/2008/09/passing-xml-data-from-app-to-database.html.

And an example of what to do when the xml you use uses namespaces here: http://dbwhisperer.blogspot.com/2008/09/better-xml-shredding-example.html

September 30, 2008 11:31 AM
 

k;k; said:

klk;

December 9, 2008 4:22 AM
 

Divya said:

Hi Tegels,

  really useful blog....keep on posting :)

February 2, 2010 11:47 PM
 

DivyaRam said:

Tegels,

I ahve one doubt if we want to insert and update in more than  one tables can we use multiple merge satement in single stored procedure

Regards,

Divya

February 3, 2010 5:04 AM
New Comments to this post are disabled

About ktegels

Kent Tegels passed away on July 31, 2010. Kent was an Adjunct Professor at Colorado Technical University and a member of the technical staff at PluralSight. He was recognized by Microsoft with Most Valuable Professional (MVP) status in SQL Server for his community involvement with SQL Server and .NET. Kent held Microsoft Certifications in Database Administration and Systems Engineering, and contributed to several books on data access programming and .NET. He was a well known industry speaker, and resided in Sioux Falls, South Dakota.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement