THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Uniquely Identifying XML Nodes with DENSE_RANK

When working with XML in SQL Server, you might want to uniquely identify one node against another. But due to the flexibility with which XML can be defined, this is not always directly possible. SQL Server's own XML structures are guilty of having this problem, as I discovered while writing the Extended Events Code Generator. Events in the XML produced by XE have a limited number of attributes and depending on what you're collecting you may get numerous instances of events that look exactly the same. I wanted to provide a way to uniquely identify every collected event in order to allow for more advanced analysis of the results.

For the sake of this post I'll keep things a bit simpler than the Extended Events XML; we'll instead play with the following XML fragment:

<a>
  <b>
    <c>abc</c>
    <c>def</c>
  </b>
  <b>
    <c>abc</c>
    <c>def</c>
  </b>
</a>

The problem: We e want to pull all of the "c" values and, at the same time, tag whether they've come from the first or second "b" value. There are no attributes that we can use to uniquely identify the individual nodes, so we're rather stuck.

My first idea for solving this was to use ROW_NUMBER, but what to number? I came up with something similar to the following:

DECLARE @x XML
SET @x = '<a><b><c>abc</c><c>def</c></b><b><c>abc</c><c>def</c></b></a>'

SELECT
    ROW_NUMBER() OVER (ORDER BY b_xml) AS unique_b_node,
    c_node.value('(./text())[1]', 'varchar(50)') AS c_val
FROM
(
    SELECT
        b_node.query('.') AS b_xml
    FROM @x.nodes('/a/b') AS b (b_node)
) b_nodes
CROSS APPLY b_nodes.b_xml.nodes('/b/c') AS c (c_node)

Alas, this approach met with utter failure; as you will discover if you actually try to run this code, you cannot sort on instances of the XML data type, and therefore you cannot use ROW_NUMBER to uniquely identify them. I thought I was finished, so I asked around on Twitter. Peter Debetta replied with an interesting XQuery method for solving the problem:

DECLARE @x XML
SET @x = '<a><b><c>abc</c><c>def</c></b><b><c>abc</c><c>def</c></b></a>'

SELECT
    b_nodes.unique_b_node,
    c_node.value('(./text())[1]', 'varchar(50)') AS c_val
FROM
(
    SELECT
        b_node.query('.') AS b_xml,
        b_node.value('for $s in . return count(../*[. << $s]) + 1', 'int') AS unique_b_node
    FROM @x.nodes('/a/b') AS b (b_node)
) b_nodes
CROSS APPLY b_nodes.b_xml.nodes('/b/c') AS c (c_node)

This actually works pretty well when your query only returns four rows, or even forty. But it breaks down quickly after that, due to the exponential nature of how the XQuery counts the rows, and at only a few hundred rows I found myself waiting a number of minutes for my results.

Later, Mladen Prajdic decided to jump in and try to help, and he came up with the key to solving the problem: While you can't order on XML instances, you can order on nodes instances resulting from a call to the nodes() function, as shown in the code below. Using ROW_NUMBER in this case would not yield the expected results; since there are four "c" nodes in the output, we would end up with four row numbers. DENSE_RANK, on the other hand, counts ties equally, so all "b" nodes wind up with the same number. A seemingly simple solution, but not exactly intuitive, and it took three of us a surprising amount of work to come up with it.

 

DECLARE @x XML
SET @x = '<a><b><c>abc</c><c>def</c></b><b><c>abc</c><c>def</c></b></a>'

SELECT
    c_node.value('(./text())[1]', 'varchar(50)') AS c_val,
    DENSE_RANK() OVER (ORDER BY b_node) AS unique_b_node
FROM @x.nodes('/a/b') AS b (b_node)
CROSS APPLY b.b_node.nodes('./c') AS c (c_node)

Published Monday, August 03, 2009 9:43 AM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Alejandro Mesa said:

Adam,

Clever solution, indeed.

I can see in the execution plan, that the segmentation and sequence is done using an id returned by the xml reader.

Here is another way to accomplish this, but is not as good as the one from Mladen. It uses the all time favorite auxiliary table of numbers, and also relies on counting the parent nodes.

declare @Numbers table (

Number int not null unique

);

insert into

@Numbers(Number)

Values

(1),

(2),

(3),

(4),

(5);

declare @x xml;

set @x = '<a><b><c>abc</c><c>def</c></b><b><c>abc</c><c>def</c></b></a>';

select

T.c.value('(./text())[1]', 'varchar(25)') as c,

N.Number

from

@Numbers as N

cross apply

@x.nodes('/a/b[position() = sql:column("N.Number")]/.') as X(n)

cross apply

X.n.nodes('./c') as T(c)

where

N.Number <= @x.query('count(/a/b)').value('.', 'int');

GO

All this could be avoided if we were able to have access to the position or even the id of the node. BTW, here is connect entry in case you feel like it deserves one more vote.

https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=383888&wa=wsignin1.0

Cheers,

AMB

August 4, 2009 9:05 PM
 

Adam Machanic said:

Hi Alejandro,

Voted!

August 4, 2009 10:54 PM
 

Matthew Darwin said:

Thanks very much for this, an excellent solution that's really helped massively.

December 29, 2010 11:54 AM
 

Maciej Raszplewicz said:

Adam,

Is result order of this query with dense_rank always the same? I know that select is non-deterministic but what if I select data from xml?

Maciek

March 22, 2011 9:58 AM
 

Adam Machanic said:

Maciej,

No, order is never guaranteed unless you use ORDER BY -- XML or no. Please use ORDER BY :-)

March 22, 2011 11:14 AM
 

Maciej Raszplewicz said:

I mean order of dense_rank. There is ORDER BY b_node there.

My first question was not exactly what I meant.

March 22, 2011 11:24 AM
 

Serg Yurevich said:

This is very elegant solution. I like it very much and it helped me much. Genius!

March 28, 2013 12:29 PM
 

David S said:

Adam / Mladen - Thank you!!!  Definitely not something I would have come up with myself and saved me hours of banging my head against the wall!  Where do I send the beer? :-)

January 10, 2014 2:16 PM
 

David S said:

Adam / Mladen - Thank you!!!  Definitely not something I would have come up with myself and saved me hours of banging my head against the wall!  Where do I send the beer? :-)

January 10, 2014 2:16 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement