Shredding XML From Execution Plans

Published 2020-01-29 17:27

Question:

I'll preface this by saying that I hate XML. It's horrible stuff to work with, but sometimes necessary.

My current issue is that I'm trying to take the XML from an execution plan (supplied by a user, so it could be any size) and shred it into a table for further manipulation. I'm down to two options at the moment:

  1. I could work out the maximum number of nodes an execution plan can contain (I suspect this would be a lot) and write one script that handles any XML input. This would be a one-time job, so not an issue.
  2. Alternatively, I could work out the number of nodes dynamically and build the output as required.
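Option 2 does not need the node names up front. A parser can walk the whole document and report which element and attribute names actually occur. Below is a minimal sketch of that discovery step, written in Python with the standard `xml.etree.ElementTree` module as an illustration, not T-SQL. The plan fragment is hypothetical and heavily trimmed. The only assumption carried over from real plans is that showplan XML lives in the `http://schemas.microsoft.com/sqlserver/2004/07/showplan` namespace, which is why names are reduced to their local part.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Drop a '{namespace-uri}' prefix, similar to XQuery's local-name()."""
    return tag.rsplit("}", 1)[-1]


def discover_nodes(xml_text: str) -> tuple[Counter, Counter]:
    """Count every element name and attribute name found anywhere in the document."""
    root = ET.fromstring(xml_text.strip())
    elements: Counter = Counter()
    attributes: Counter = Counter()
    for elem in root.iter():  # the root plus all descendants, in document order
        elements[local_name(elem.tag)] += 1
        for attr_name in elem.attrib:
            attributes[local_name(attr_name)] += 1
    return elements, attributes


# Hypothetical, heavily trimmed plan fragment; real plans nest much deeper.
PLAN_FRAGMENT = """
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="0" PhysicalOp="Nested Loops">
    <NestedLoops>
      <RelOp NodeId="1" PhysicalOp="Index Seek"/>
      <RelOp NodeId="2" PhysicalOp="Clustered Index Seek"/>
    </NestedLoops>
  </RelOp>
</ShowPlanXML>
"""

if __name__ == "__main__":
    element_counts, attribute_counts = discover_nodes(PLAN_FRAGMENT)
    print(element_counts)
    print(attribute_counts)
```

The counts tell you which columns a dynamic output would need before you build it.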

Has anybody done a similar exercise in the past? All of the sample queries I've found assume the output fields are known in advance.
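For contrast, here is the known-output-fields style applied to the plan's operators. It selects every `RelOp` element at any depth, using the showplan namespace, and reads a fixed list of attributes. This is another Python sketch. `NodeId`, `PhysicalOp`, `LogicalOp` and `EstimateRows` are attributes of showplan `RelOp` elements, but the sample plan and its values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Showplan XML elements live in this namespace, so every path step needs the prefix.
NS = {"p": "http://schemas.microsoft.com/sqlserver/2004/07/showplan"}

# Hypothetical, heavily trimmed plan; the attribute values are made up.
SAMPLE_PLAN = """
<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <RelOp NodeId="0" PhysicalOp="Nested Loops" LogicalOp="Inner Join" EstimateRows="10">
    <NestedLoops>
      <RelOp NodeId="1" PhysicalOp="Index Seek" LogicalOp="Index Seek" EstimateRows="10"/>
      <RelOp NodeId="2" PhysicalOp="Clustered Index Seek" LogicalOp="Clustered Index Seek" EstimateRows="1"/>
    </NestedLoops>
  </RelOp>
</ShowPlanXML>
"""


def shred_relops(plan_xml: str) -> list[dict]:
    """Return one row per RelOp (plan operator) with a fixed, known set of columns."""
    root = ET.fromstring(plan_xml.strip())
    rows = []
    for op in root.iterfind(".//p:RelOp", NS):  # every operator, at any depth
        rows.append({
            "NodeId": int(op.get("NodeId")),
            "PhysicalOp": op.get("PhysicalOp"),
            "LogicalOp": op.get("LogicalOp"),
            "EstimateRows": float(op.get("EstimateRows")),
        })
    return rows


if __name__ == "__main__":
    for row in shred_relops(SAMPLE_PLAN):
        print(row)
```

Without the `p:` prefix, `.//RelOp` matches nothing. This is a common stumbling block when shredding plans. XQuery in T-SQL has the same issue: it needs `WITH XMLNAMESPACES` or a `*:` wildcard.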

Answer 1:

A very straightforward approach could be this (where @x is your execution-plan XML):

DECLARE @x XML=
N'<root>
    <ElementE1 AttributA1="A1-text belongs to E1[1]" OneMore="xyz">E1-Text 2</ElementE1>
    <ElementE1 AttributA1="A1-text belongs to E1[2]">E1-Text 2</ElementE1>
    <ElementParent>
      <subElement test="sub"/>
      Free text
    </ElementParent>
  </root>';

DECLARE @idoc INT;
EXEC sp_xml_preparedocument @idoc OUTPUT, @x;   
SELECT * FROM OPENXML (@idoc, '*');   
EXEC sp_xml_removedocument @idoc;  

The result (not all columns shown):

+----+----------+----------+--------------+------+--------------------------+
| id | parentid | nodetype | localname    | prev | text                     |
+----+----------+----------+--------------+------+--------------------------+
| 0  | NULL     | 1        | root         | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 2  | 0        | 1        | ElementE1    | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 3  | 2        | 2        | AttributA1   | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 13 | 3        | 3        | #text        | NULL | A1-text belongs to E1[1] |
+----+----------+----------+--------------+------+--------------------------+
| 4  | 2        | 2        | OneMore      | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 14 | 4        | 3        | #text        | NULL | xyz                      |
+----+----------+----------+--------------+------+--------------------------+
| 5  | 2        | 3        | #text        | NULL | E1-Text 2                |
+----+----------+----------+--------------+------+--------------------------+
| 6  | 0        | 1        | ElementE1    | 2    | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 7  | 6        | 2        | AttributA1   | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 15 | 7        | 3        | #text        | NULL | A1-text belongs to E1[2] |
+----+----------+----------+--------------+------+--------------------------+
| 8  | 6        | 3        | #text        | NULL | E1-Text 2                |
+----+----------+----------+--------------+------+--------------------------+
| 9  | 0        | 1        | ElementParent| 6    | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 10 | 9        | 1        | subElement   | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 11 | 10       | 2        | test         | NULL | NULL                     |
+----+----------+----------+--------------+------+--------------------------+
| 16 | 11       | 3        | #text        | NULL | sub                      |
+----+----------+----------+--------------+------+--------------------------+
| 12 | 9        | 3        | #text        | 10   | Free text                |
+----+----------+----------+--------------+------+--------------------------+

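The ids appear to follow document order (depth first), except that the #text rows holding attribute values are numbered last (13 to 16). For whatever reason there is no id=1. The nodetype column distinguishes elements (1), attributes (2) and (floating) text (3). The prev column points to the preceding sibling node (attributes are not chained this way). The omitted columns (prefix, namespaceuri, datatype) are mostly related to namespaces.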
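This is a rough Python imitation of the columns shown above (id, parentid, nodetype, localname, prev, text), for seeing the edge-table layout without a SQL Server instance. It is a sketch, not a reimplementation. Ids are handed out as rows are created, so they will not match OPENXML's numbering (no gap at 1, attribute values not numbered last). Whitespace-only text is skipped, and names are reduced to their local part.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

ELEMENT, ATTRIBUTE, TEXT = 1, 2, 3  # same codes as OPENXML's nodetype column


class EdgeRow(NamedTuple):
    id: int
    parentid: Optional[int]
    nodetype: int
    localname: str
    prev: Optional[int]
    text: Optional[str]


def edge_table(xml_text: str) -> list[EdgeRow]:
    """Flatten a document into id/parentid rows, roughly like OPENXML's edge table."""
    rows: list[EdgeRow] = []

    def add(parent, nodetype, name, prev=None, text=None) -> int:
        row_id = len(rows)  # ids here are simply creation order
        rows.append(EdgeRow(row_id, parent, nodetype, name, prev, text))
        return row_id

    def walk(elem, parent, prev) -> int:
        elem_id = add(parent, ELEMENT, elem.tag.rsplit("}", 1)[-1], prev)
        for name, value in elem.attrib.items():
            attr_id = add(elem_id, ATTRIBUTE, name.rsplit("}", 1)[-1])
            add(attr_id, TEXT, "#text", text=value)  # attribute value as a child #text row
        last = None  # id of the previous sibling among this element's children
        if elem.text and elem.text.strip():
            last = add(elem_id, TEXT, "#text", text=elem.text.strip())
        for child in elem:
            last = walk(child, elem_id, last)
            if child.tail and child.tail.strip():  # "floating" text after the child
                last = add(elem_id, TEXT, "#text", last, child.tail.strip())
        return elem_id

    walk(ET.fromstring(xml_text.strip()), None, None)
    return rows


# The same sample document as in the T-SQL above.
SAMPLE = """
<root>
    <ElementE1 AttributA1="A1-text belongs to E1[1]" OneMore="xyz">E1-Text 2</ElementE1>
    <ElementE1 AttributA1="A1-text belongs to E1[2]">E1-Text 2</ElementE1>
    <ElementParent>
      <subElement test="sub"/>
      Free text
    </ElementParent>
  </root>
"""

if __name__ == "__main__":
    for row in edge_table(SAMPLE):
        print(row)
```

Like OPENXML, it produces 16 rows for this document. The "Free text" row's prev points at subElement, as in the table above.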

The FROM OPENXML approach is outdated, but this is one of the rare situations where it can still be very useful.

You get a list of IDs and parent IDs that you can query with a recursive CTE. How exactly depends on what you want to do with the data afterwards.
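A recursive CTE over this list would anchor on the row(s) without a parent and then repeatedly join children on parentid. The same level-by-level logic is sketched below in Python over a few rows hand-copied from the result table. The `/root/...` path format is an arbitrary choice for illustration.

```python
from typing import Optional

# id -> (parentid, nodetype, localname), hand-copied from some of the element and
# attribute rows in the OPENXML result above (1 = element, 2 = attribute).
ROWS: dict[int, tuple[Optional[int], int, str]] = {
    0: (None, 1, "root"),
    2: (0, 1, "ElementE1"),
    3: (2, 2, "AttributA1"),
    6: (0, 1, "ElementE1"),
    9: (0, 1, "ElementParent"),
    10: (9, 1, "subElement"),
    11: (10, 2, "test"),
}


def paths(rows: dict[int, tuple[Optional[int], int, str]]) -> dict[int, str]:
    """Build a path per id the way a recursive CTE would: anchor on the root
    rows, then repeatedly join children on parentid = previous level's id."""
    # Anchor member: rows without a parent.
    level = [(i, "/" + name) for i, (parent, _, name) in rows.items() if parent is None]
    result = dict(level)
    while level:  # each pass plays the role of one recursion step
        next_level = []
        for parent_id, parent_path in level:
            for i, (parent, nodetype, name) in rows.items():
                if parent == parent_id:
                    step = ("@" if nodetype == 2 else "") + name
                    next_level.append((i, f"{parent_path}/{step}"))
        result.update(next_level)
        level = next_level
    return result


if __name__ == "__main__":
    for node_id, path in sorted(paths(ROWS).items()):
        print(node_id, path)
```

In T-SQL, the anchor member would select the rows WHERE parentid IS NULL. The recursive member would join the edge table on child.parentid = parent.id.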



Answer 2:

Here is a script that shreds XML down to elements and attributes, together with their paths and parents. Taken from http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx

CREATE FUNCTION [dbo].[XMLTable]( 
    @x XML 
) 
RETURNS TABLE 
AS RETURN 
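/*---------------------------------------------------------------------- 
This INLINE TVF uses a recursive CTE that processes each element and 
attribute of the XML document passed in. 
Deeply nested documents (such as execution plans) can exceed the default 
recursion limit of 100; the calling query can raise it by appending 
OPTION (MAXRECURSION 0). 
----------------------------------------------------------------------*/ 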
WITH cte AS ( 
    /*------------------------------------------------------------------ 
    Anchor part of the recursive query. Retrieves the root element 
    of the XML document 
    ------------------------------------------------------------------*/ 
    SELECT 
        1 AS lvl, 
        x.value('local-name(.)','NVARCHAR(MAX)') AS Name, 
        CAST(NULL AS NVARCHAR(MAX)) AS ParentName,
        CAST(1 AS INT) AS ParentPosition,
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType, 
        x.value('local-name(.)','NVARCHAR(MAX)') AS FullPath, 
        x.value('local-name(.)','NVARCHAR(MAX)') 
            + N'[' 
            + CAST(ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS NVARCHAR) 
            + N']' AS XPath, 
        ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS Position,
        x.value('local-name(.)','NVARCHAR(MAX)') AS Tree, 
        x.value('text()[1]','NVARCHAR(MAX)') AS Value, 
        x.query('.') AS this,        
        x.query('*') AS t, 
        CAST(CAST(1 AS VARBINARY(4)) AS VARBINARY(MAX)) AS Sort, 
        CAST(1 AS INT) AS ID 
    FROM @x.nodes('/*') a(x) 
    UNION ALL 
    /*------------------------------------------------------------------ 
    Start recursion. Retrieve each child element of the parent node 
    ------------------------------------------------------------------*/ 
    SELECT 
        p.lvl + 1 AS lvl, 
        c.value('local-name(.)','NVARCHAR(MAX)') AS Name, 
        CAST(p.Name AS NVARCHAR(MAX)) AS ParentName,
        CAST(p.Position AS INT) AS ParentPosition,
        CAST(N'Element' AS NVARCHAR(20)) AS NodeType, 
        CAST( 
            p.FullPath 
            + N'/' 
            + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX) 
        ) AS FullPath, 
        CAST( 
            p.XPath 
            + N'/' 
            + c.value('local-name(.)','NVARCHAR(MAX)') 
            + N'[' 
            + CAST(ROW_NUMBER() OVER(
                PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)')
                ORDER BY (SELECT 1)) AS NVARCHAR    ) 
            + N']' AS NVARCHAR(MAX) 
        ) AS XPath, 
        ROW_NUMBER() OVER(
                PARTITION BY c.value('local-name(.)','NVARCHAR(MAX)')
                ORDER BY (SELECT 1)) AS Position,
        CAST( 
            SPACE(2 * p.lvl - 1) + N'|' + REPLICATE(N'-', 1)
            + c.value('local-name(.)','NVARCHAR(MAX)') AS NVARCHAR(MAX) 
        ) AS Tree, 
        CAST( c.value('text()[1]','NVARCHAR(MAX)') AS NVARCHAR(MAX) ) AS Value, 
        c.query('.') AS this,        
        c.query('*') AS t, 
        CAST( 
            p.Sort 
            + CAST( (lvl + 1) * 1024 
            + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS VARBINARY(4) 
        ) AS VARBINARY(MAX) ) AS Sort, 
        CAST( 
            (lvl + 1) * 1024 
            + (ROW_NUMBER() OVER(ORDER BY (SELECT 1)) * 2) AS INT 
        ) 
    FROM cte p 
    CROSS APPLY p.t.nodes('*') b(c)        
), cte2 AS ( 
    SELECT 
        lvl AS Depth, 
        Name AS NodeName, 
        ParentName,
        ParentPosition,
        NodeType, 
        FullPath, 
        XPath, 
        Position,
        Tree AS TreeView, 
        Value, 
        this AS XMLData, 
        Sort, ID 
    FROM cte 
    UNION ALL 
    /*------------------------------------------------------------------ 
    Attributes do not need recursive calls. So add the attributes 
    to the query output at the end. 
    ------------------------------------------------------------------*/ 
    SELECT 
        p.lvl, 
        x.value('local-name(.)','NVARCHAR(MAX)'), 
        p.Name,
        p.Position,
        CAST(N'Attribute' AS NVARCHAR(20)), 
        p.FullPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'), 
        p.XPath + N'/@' + x.value('local-name(.)','NVARCHAR(MAX)'), 
        1,
        SPACE(2 * p.lvl - 1) + N'|' + REPLICATE('-', 1) 
            + N'@' + x.value('local-name(.)','NVARCHAR(MAX)'), 
        x.value('.','NVARCHAR(MAX)'), 
        NULL, 
        p.Sort, 
        p.ID + 1 
    FROM cte p 
    CROSS APPLY this.nodes('/*/@*') a(x) 
) 
SELECT 
    ROW_NUMBER() OVER(ORDER BY Sort, ID) AS ID, 
    ParentName, ParentPosition,Depth, NodeName, Position,  
    NodeType, FullPath, XPath, TreeView, Value, XMLData
FROM cte2;
go
SELECT * FROM dbo.XMLTable(' 
<employees> 
    <emp name="jacob"/> 
    <emp name="steve"> 
        <phone>123</phone>
     some text                      
    </emp> 
</employees> 
')
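The function's kind of output can also be approximated outside SQL Server. This Python sketch uses ElementTree and is not a port of the T-SQL. It produces similar columns (depth, node name, node type, full path, XPath with positions, value) with a plain recursive walk. It returns rows in document order and counts XPath positions among same-named siblings of the same parent. The column meanings are borrowed from dbo.XMLTable, but its exact output may differ. For example, the value here is the first non-whitespace text directly under an element, trimmed.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import NamedTuple, Optional


class Node(NamedTuple):
    depth: int
    node_name: str
    node_type: str  # "Element" or "Attribute"
    full_path: str
    xpath: str
    value: Optional[str]


def local_name(tag: str) -> str:
    return tag.rsplit("}", 1)[-1]


def first_text(elem) -> Optional[str]:
    """First non-whitespace text directly under elem (trimmed), roughly text()[1]."""
    for t in [elem.text] + [child.tail for child in elem]:
        if t and t.strip():
            return t.strip()
    return None


def xml_table(xml_text: str) -> list[Node]:
    out: list[Node] = []

    def visit(elem, depth: int, full_path: str, xpath: str) -> None:
        out.append(Node(depth, local_name(elem.tag), "Element", full_path, xpath, first_text(elem)))
        for name, value in elem.attrib.items():
            step = "@" + local_name(name)
            out.append(Node(depth, local_name(name), "Attribute",
                            f"{full_path}/{step}", f"{xpath}/{step}", value))
        seen: Counter = Counter()  # position among same-named siblings of this parent
        for child in elem:
            name = local_name(child.tag)
            seen[name] += 1
            visit(child, depth + 1, f"{full_path}/{name}", f"{xpath}/{name}[{seen[name]}]")

    root = ET.fromstring(xml_text.strip())
    root_name = local_name(root.tag)
    visit(root, 1, root_name, f"{root_name}[1]")
    return out


# The same sample document as the T-SQL call above.
EMPLOYEES = """
<employees>
    <emp name="jacob"/>
    <emp name="steve">
        <phone>123</phone>
     some text
    </emp>
</employees>
"""

if __name__ == "__main__":
    for node in xml_table(EMPLOYEES):
        print(node)
```

A plain recursive walk like this has no recursion-level setting to raise, only Python's own recursion limit (1000 by default), which is far deeper than typical plans.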