Now that I've discussed converting XML into a set of HIERARCHYID values I thought I'd try to reverse the process...
Not sure if you’ve attempted to convert a table with HIERARCHYID to an XML representation, and if you have, I’m sure you’ve experienced the same woes as me. Sure, I could have taken the route of using C# to create the XML, and it very well may be a better way to make such a conversion; instead I decided that I had to be able to do this in T-SQL, and so began the journey to find such a solution...
Since the XML modify method can only insert into a single node in an XML document, I had to either attempt to generate a string representation of the xml form the data (no simple task) or I could cursor through the data one row at a time (yes, cursor) and insert each node. For this implementation, I choose the cursor method simply because it would be easier to do than to recreate the XML document abilities akin to what is done in the .NET framework.
When using XQuery to insert nodes, you must use a static value (in other words, you cannot use a composed string variable for the XQuery in the XML modify method). At first, this made it difficult to figure out how to insert a node into another node since there was no point of reference.
And so thought that I could use the HIERARCHID’s ToString() method to figure out node positions in the XML, but that quickly was discarded after realizing that the path representation is guaranteed to be neither consecutive nor integers, and it would require using sp_executesql, which I also wanted to avoid.
Then I thought that the data would probably be uniquely identifiable, and so I could use that “id” to add an attribute to every node that I constructed and then cursor through and insert into the node that matched the parent id of the node I was inserting, which removed the need to use sp_executesql. In other words, I would create a cursor that contained the parent node ID and concatenated values from the row of data to create the node with an “id” attribute.
CAST ('<' + NodeName + ' id="' CAST(NodeID AS VARCHAR(20)) + '">' + ISNULL(NodeText, '') + '</' + NodeName + '>' AS XML) AS XmlToInsert
I would then iterate through the cursor and insert the node as follows:
SET @XR.modify('insert sql:variable("@xcol") into (//*[@id=sql:variable("@hparentid")])[1]')
This uses the sql:variable extension to find a node via a relative reference. It looks for the node, regardless of location in the XML, with the attribute" “id” equal to the “id” of the parent which is contained in the @hparent variable.
Alas, this could become more problematic if the unique key contained multiple fields. One also might not want to include an extra “id” attribute in the results. Because of these and other things that I realized could go wrong with this implementation, I decided to scrap it and moved on. And although the version I am about to present has its own potential for issues, I felt it was more flexible and cleaner in its approach. Essentially what I decided to do is to use a temp table that contained the generated the XML node, the original HIERARCHYID value, a row number, generated with ROW_NUMBER() ordered by the hierarchy order, and a parent row number, which would initially set to 0 then updated using a self join on the temp table.
Then since the XML nodes position will match the generated row number based on the HIERARCHYID position, we can simply insert the new node into the parent node based on its position.
-- Sample Data to test with
CREATE TABLE #HTable (NodeName sysname, Attributes xml, NodeText VARCHAR(MAX), HierarchyNode HIERARCHYID)
INSERT INTO #HTable (NodeName, Attributes, NodeText, HierarchyNode)
VALUES
('a', '<a attr="1" />', NULL, 0x),
('b', NULL, NULL, 0x58),
('c', '<a xyz="3" />', 'abc', 0x5AC0),
('c', NULL, 'def', 0x5B40),
('b', '<a id="111" pid="1234" />', NULL, 0x68),
('c', NULL, 'abc', 0x6AC0),
('c', NULL, 'def', 0x6B40)
CREATE TABLE #T (XmlToInsert XML, HierarchyNode HIERARCHYID, RowNum INT, ParentRowNum INT)
-- INSERT the generated XML node, the original HIERARCHYID, a unique row number, and a parent row number (set to 0)
INSERT INTO #T (XmlToInsert, HierarchyNode, RowNum, ParentRowNum)
SELECT
CAST(
'<' + NodeName + ' '
+ CASE WHEN Attributes IS NOT NULL
THEN SUBSTRING(CAST(Attributes AS VARCHAR(MAX)), 3, LEN(CAST(Attributes AS VARCHAR(MAX))) - 4)
ELSE '' END
+ '>' + ISNULL(NodeText, '') + '</' + NodeName + '>'
AS XML) AS XmlToInsert
, HierarchyNode
, ROW_NUMBER() OVER (ORDER BY HierarchyNode) AS RowNum
, 0 AS ParentRowNum
FROM #HTable
ORDER BY HierarchyNode
-- UPDATE the parent row number using the HIERARCHYID method GetAncestor in the self join
-- If the amount of data is great, an index could be created on the temp table prior to the update
UPDATE T1
SET T1.ParentRowNum = T2.RowNum
FROM #T AS T1
INNER JOIN #T AS T2 ON T2.HierarchyNode = T1.HierarchyNode.GetAncestor(1)
DECLARE @xcol XML, @parentrownum INT, @flag BIT = 0, @XR XML = ''
-- We actually only need the generated XML and the parent row number to do the rest of this work
DECLARE crH CURSOR READ_ONLY FOR SELECT XmlToInsert, ParentRowNum FROM #T ORDER BY RowNum
OPEN crH
FETCH NEXT FROM crH INTO @xcol, @parentrownum
WHILE(@@FETCH_STATUS = 0)
BEGIN
-- First time through, we add a root node
IF @flag = 0
SET @XR.modify('insert sql:variable("@xcol") into (/)[1]')
ELSE -- Subsequent passes we find the parent node by position
SET @XR.modify('insert sql:variable("@xcol") into (//*)[sql:variable("@parentrownum")][1]')
SET @flag = 1
FETCH NEXT FROM crH INTO @xcol, @parentrownum
END
CLOSE crH
DEALLOCATE crH
DROP TABLE #T
DROP TABLE #HTable
SELECT @xr
I did try one other version that used a recursive CTE to generate the list of nodes and their respective “rownum” and “parentRowNum”:
;WITH H1 AS
( SELECT
NodeName
, Attributes
, NodeText
, HierarchyNode
, ROW_NUMBER() OVER (ORDER BY HierarchyNode) AS RowNum
FROM #HTable AS HT
)
, H
AS
( SELECT
CAST(
'<' + NodeName + ' '
+ CASE WHEN Attributes IS NOT NULL
THEN SUBSTRING(CAST(Attributes AS VARCHAR(MAX)), 3, LEN(CAST(Attributes AS VARCHAR(MAX))) - 4)
ELSE '' END
+ '>' + ISNULL(NodeText, '') + '</' + NodeName + '>'
AS XML) AS XmlToInsert
, HierarchyNode
, RowNum
, CAST(0 AS BIGINT) AS ParentRowNum
FROM H1 AS HT
WHERE HierarchyNode = HIERARCHYID::GetRoot()
UNION ALL
SELECT
CAST(
'<' + HT.NodeName + ' '
+ CASE WHEN HT.Attributes IS NOT NULL
THEN SUBSTRING(CAST(HT.Attributes AS VARCHAR(MAX)), 3, LEN(CAST(HT.Attributes AS VARCHAR(MAX))) - 4)
ELSE '' END
+ '>' + ISNULL(HT.NodeText, '') + '</' + HT.NodeName + '>'
AS XML)
, HT.HierarchyNode
, HT.RowNum
, H.RowNum
FROM H1 AS HT
INNER JOIN H ON H.HierarchyNode = HT.HierarchyNode.GetAncestor(1)
)
INSERT INTO #T (XmlToInsert, HierarchyNode, RowNum, ParentRowNum)
SELECT XmlToInsert, HierarchyNode, RowNum, ParentRowNum
FROM H
ORDER BY HierarchyNode
Instead of inserting the initial values into the temp table #T, this generates the completed set of data and then inserts it into #T. However, regardless of what variation I tried (e.g. add indexes), the initial method I show that inserts then updates #T was always faster.
Please let me know if you have any ideas that might optimize this, and if you have an implementation (T-SQL or .NET), please share.