THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Peter DeBetta's SQL Programming Blog

Peter DeBetta's blog about programming in SQL Server using technologies such as T-SQL, .NET, CLR, C#, VB, Visual Studio, and SQL Server Management Studio.

Composing XML from HIERARCHYID in T-SQL

Now that I've discussed converting XML into a set of HIERARCHYID values I thought I'd try to reverse the process...

Not sure if you’ve attempted to convert a table with HIERARCHYID to an XML representation, and if you have, I’m sure you’ve experienced the same woes as me. Sure, I could have taken the route of using C# to create the XML, and it very well may be a better way to make such a conversion; instead I decided that I had to be able to do this in T-SQL, and so began the journey to find such a solution...

Since the XML modify method can only insert into a single node in an XML document, I had to either attempt to generate a string representation of the xml form the data (no simple task) or I could cursor through the data one row at a time (yes, cursor) and insert each node. For this implementation, I choose the cursor method simply because it would be easier to do than to recreate the XML document abilities akin to what is done in the .NET framework.

When using XQuery to insert nodes, you must use a static value (in other words, you cannot use a composed string variable for the XQuery in the XML modify method). At first, this made it difficult to figure out how to insert a node into another node since there was no point of reference.

And so thought that I could use the HIERARCHID’s ToString() method to figure out node positions in the XML, but that quickly was discarded after realizing that the path representation is guaranteed to be neither consecutive nor integers, and it would require using sp_executesql, which I also wanted to avoid.

Then I thought that the data would probably be uniquely identifiable, and so I could use that “id” to add an attribute to every node that I constructed and then cursor through and insert into the node that matched the parent id of the node I was inserting, which removed the need to use sp_executesql. In other words, I would create a cursor that contained the parent node ID and concatenated values from the row of data to create the node with an “id” attribute.


CAST ('<' + NodeName + ' id="' CAST(NodeID AS VARCHAR(20)) + '">' + ISNULL(NodeText, '') + '</' + NodeName + '>' AS XML) AS XmlToInsert

 

I would then iterate through the cursor and insert the node as follows:


SET
@XR.modify('insert sql:variable("@xcol") into (//*[@id=sql:variable("@hparentid")])[1]')

This uses the sql:variable extension to find a node via a relative reference. It looks for the node, regardless of location in the XML, with the attribute" “id” equal to the “id” of the parent which is contained in the @hparent variable.

Alas, this could become more problematic if the unique key contained multiple fields. One also might not want to include an extra “id” attribute in the results. Because of these and other things that I realized could go wrong with this implementation, I decided to scrap it and moved on. And although the version I am about to present has its own potential for issues, I felt it was more flexible and cleaner in its approach. Essentially what I decided to do is to use a temp table that contained the generated the XML node, the original HIERARCHYID value, a row number, generated with ROW_NUMBER() ordered by the hierarchy order, and a parent row number, which would initially set to 0 then updated using a self join on the temp table.

Then since the XML nodes position will match the generated row number based on the HIERARCHYID position, we can simply insert the new node into the parent node based on its position.


-- Sample Data to test with
CREATE TABLE
#HTable (NodeName sysname, Attributes xml, NodeText VARCHAR(MAX), HierarchyNode HIERARCHYID)
INSERT INTO #HTable (NodeName, Attributes, NodeText, HierarchyNode)
VALUES 
  
('a', '<a attr="1" />', NULL, 0x),
   (
'b', NULL, NULL, 0x58),
   (
'c', '<a xyz="3" />', 'abc', 0x5AC0),
   (
'c', NULL, 'def', 0x5B40),
   (
'b', '<a id="111" pid="1234" />', NULL, 0x68),
   (
'c', NULL, 'abc', 0x6AC0),
   (
'c', NULL, 'def', 0x6B40)


CREATE TABLE #T (XmlToInsert XML, HierarchyNode HIERARCHYID, RowNum INT, ParentRowNum INT)

-- INSERT the generated XML node, the original HIERARCHYID, a unique row number, and a parent row number (set to 0)
INSERT INTO #T (XmlToInsert, HierarchyNode, RowNum, ParentRowNum
SELECT 
  
CAST(
          
'<' + NodeName + ' '
          
+ CASE WHEN Attributes IS NOT NULL 
              
THEN SUBSTRING(CAST(Attributes AS VARCHAR(MAX)), 3, LEN(CAST(Attributes AS VARCHAR(MAX))) - 4)
              
ELSE '' END
          
+ '>' + ISNULL(NodeText, '') + '</' + NodeName + '>'
      
AS XML) AS XmlToInsert
  
, HierarchyNode 
  
, ROW_NUMBER() OVER (ORDER BY HierarchyNode) AS RowNum
  
, 0 AS ParentRowNum
FROM #HTable
ORDER BY HierarchyNode

-- UPDATE the parent row number using the HIERARCHYID method GetAncestor in the self join
-- If the amount of data is great, an index could be created on the temp table prior to the update

UPDATE T1
SET T1.ParentRowNum = T2.RowNum
FROM #T AS T1
  
INNER JOIN #T AS T2 ON T2.HierarchyNode = T1.HierarchyNode.GetAncestor(1)

DECLARE @xcol XML, @parentrownum INT, @flag BIT = 0, @XR XML = ''

-- We actually only need the generated XML and the parent row number to do the rest of this work
DECLARE crH CURSOR READ_ONLY FOR SELECT XmlToInsert, ParentRowNum FROM #T ORDER BY RowNum
   
OPEN crH

FETCH NEXT FROM crH INTO @xcol, @parentrownum
WHILE(@@FETCH_STATUS = 0)
BEGIN
       -- First time through, we add a root node
       IF
@flag = 0
          
SET @XR.modify('insert sql:variable("@xcol") into (/)[1]')
      
ELSE -- Subsequent passes we find the parent node by position
           SET
@XR.modify('insert sql:variable("@xcol") into (//*)[sql:variable("@parentrownum")][1]')
       
      
SET @flag = 1
      
FETCH NEXT FROM crH INTO @xcol, @parentrownum
END
CLOSE
crH
DEALLOCATE crH
DROP TABLE #T
DROP TABLE #HTable
SELECT @xr

I did try one other version that used a recursive CTE to generate the list of nodes and their respective “rownum” and “parentRowNum”:


;
WITH H1 AS
SELECT  
     
NodeName
     
, Attributes
     
, NodeText
     
, HierarchyNode  
     
, ROW_NUMBER() OVER (ORDER BY HierarchyNode) AS RowNum 
  
FROM #HTable AS HT
)
,

AS
SELECT  
     
CAST
             
'<' + NodeName + ' ' 
             
+ CASE WHEN Attributes IS NOT NULL  
                 
THEN SUBSTRING(CAST(Attributes AS VARCHAR(MAX)), 3, LEN(CAST(Attributes AS VARCHAR(MAX))) - 4
                 
ELSE '' END 
             
+ '>' + ISNULL(NodeText, '') + '</' + NodeName + '>' 
         
AS XML) AS XmlToInsert 
     
, HierarchyNode  
     
, RowNum
     
, CAST(0 AS BIGINT) AS ParentRowNum 
  
FROM H1 AS HT
  
WHERE HierarchyNode = HIERARCHYID::GetRoot()
   
  
UNION ALL
   
  
SELECT  
     
CAST
             
'<' + HT.NodeName + ' ' 
             
+ CASE WHEN HT.Attributes IS NOT NULL  
                 
THEN SUBSTRING(CAST(HT.Attributes AS VARCHAR(MAX)), 3, LEN(CAST(HT.Attributes AS VARCHAR(MAX))) - 4
                 
ELSE '' END 
             
+ '>' + ISNULL(HT.NodeText, '') + '</' + HT.NodeName + '>' 
         
AS XML)
      ,
HT.HierarchyNode  
     
, HT.RowNum
     
, H.RowNum 
  
FROM H1 AS HT
      
INNER JOIN H ON H.HierarchyNode = HT.HierarchyNode.GetAncestor(1
)
INSERT INTO #T (XmlToInsert, HierarchyNode, RowNum, ParentRowNum)  
SELECT XmlToInsert, HierarchyNode, RowNum, ParentRowNum
FROM H
ORDER BY HierarchyNode

Instead of inserting the initial values into the temp table #T, this generates the completed set of data and then inserts it into #T. However, regardless of what variation I tried (e.g. add indexes), the initial method I show that inserts then updates #T was always faster.

Please let me know if you have any ideas that might optimize this, and if you have an implementation (T-SQL or .NET), please share.

Published Wednesday, February 17, 2010 4:15 PM by Peter W. DeBetta

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Peter W. DeBetta said:

Well I finally got around to testing with a larger set of data (9000 rows) and the inserts into the XML get logarithmically slower as you process through the cursor.

For about 1000 rows or less, it works fine. For up to 2000 rows, much slower, and after that, it's intolerable.

February 17, 2010 10:54 PM

Leave a Comment

(required) 
(required) 
Submit

About Peter W. DeBetta

Peter DeBetta works for Microsoft as an Application Platform Technology Strategist, specializing in strategies, design, implementation, and deployment of Microsoft SQL Server and .NET solutions. Peter writes courseware, articles, and books – most recently the title Introducing SQL Server 2008 from Microsoft Press. Peter speaks at conferences around the world, including TechEd, SQL PASS Community Summit, DevTeach, SQL Connections, DevWeek, and VSLive!

When Peter isn’t working, you can find him singing and playing guitar (click here to hear an original song by Peter), taking pictures, or simply enjoying life with his wife, son, and daughter.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement