Separating XML values with the same tags into diff

Posted 2019-07-04 04:31

Question:

I have an XML file that I am trying to parse. It was created in Excel using Save as XML.

Because the XML file was created from Microsoft Excel, it has this header:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">

The data I am trying to extract is set up in blocks like this:

<Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

Right now, my query looks like this:

;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss)

select * from (
select X.value('local-name(.)[1]','varchar(max)') as Name,
X.value('.[1]','varchar(max)') as Value
from @allUsers.nodes('//*') as T(X)
    ) a
where Name  = 'Data'

and gives me these results:

Name    Value
----    -----------
Data    Jane Doe
Data    JaneDoe
Data    XYZ
Data    (555) 555-5555

What I would like to do is split this into 4 columns, so that each record becomes a single row like this:

Name      UserName    Address    Phone
-----     ----------  ---------  --------------
Jane Doe  JaneDoe     XYZ        (555) 555-5555

I tried selecting a column as

X.value('.[2]','varchar(max)') as UserName

but I just get all NULL values for that.

Is there any way to do this?

The general structure of the XML file looks like:

<Workbook>
  <DocumentProperties>
  </DocumentProperties>
  <ExcelWorkbook>
  </ExcelWorkbook>
  <Styles>
    <Style>
    </Style>
  </Styles>
  <Worksheet>
    <Table>
      <Column.../>
      <Column.../>
      <Column.../>
      <Row>
        <Cell.../>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell.../>
      </Row>
      ...
    </Table>
  </Worksheet>

The information I am trying to get is in the <Data>...</Data> elements.

EDIT

From the way I phrased the question, it might seem that the header names are already defined, but they are actually read as a row of <Cell><Data>...</Data></Cell> elements. I am also not sure what purpose the part serves.

This is the beginning of the <Row> section:

<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="2685" x:FullColumns="1"
   x:FullRows="1">
   <Column ss:AutoFitWidth="0" ss:Width="26.25"/>
   <Column ss:AutoFitWidth="0" ss:Width="117" ss:Span="3"/>
   <Column ss:Index="6" ss:AutoFitWidth="0" ss:Width="29.25"/>
   <Row ss:AutoFitHeight="0" ss:Height="60"> --Contains the header names
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

   <Row ss:AutoFitHeight="0" ss:Height="30"> --First record I would like to extract
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>

Answer 1:

There were two very similar questions by the same user. The OP decided to delete one, merge it into this one, and asked me to copy my answer from there to this thread.

Note the workbook's default namespace (the xmlns="..." on <Workbook>): the query must declare it as DEFAULT, otherwise unprefixed names such as Row, Cell and Data will not match anything.

I simplified your XML, but the idea carries over:

DECLARE @allUsers XML=
'<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">
 <Worksheet>
 <Table>
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   </Table>
 </Worksheet>   
</Workbook>';

;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss
                     ,DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet')
SELECT T.X.value('Cell[1]/Data[1]','varchar(max)') AS DontKnow1  -- empty leading cell
      ,T.X.value('Cell[2]/Data[1]','varchar(max)') AS Name
      ,T.X.value('Cell[3]/Data[1]','varchar(max)') AS UserName
      ,T.X.value('Cell[4]/Data[1]','varchar(max)') AS Address
      ,T.X.value('Cell[5]/Data[1]','varchar(max)') AS Telephone
      ,T.X.value('Cell[6]/Data[1]','varchar(max)') AS DontKnow3  -- empty trailing cell
FROM @allUsers.nodes('/Workbook/Worksheet/Table/Row') as T(X)
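
The question's EDIT mentions that the header names arrive as the first <Row>, so a query over every Row would also return a row containing "Name", "UserName" and so on. One possible way to skip it is a position() predicate in the nodes() path. The following is only a sketch: it assumes the header is always the first <Row> of the <Table>, and the sample data is assembled from the rows shown in the question.

```sql
-- Sample data: one header row followed by two records (values from the question).
DECLARE @allUsers XML =
N'<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet>
  <Table>
   <Row>
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   <Row>
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
   <Row>
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
  </Table>
 </Worksheet>
</Workbook>';

-- Assumption: the header is the first <Row>; position() > 1 keeps every later row.
;WITH XMLNAMESPACES (DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet')
SELECT T.X.value('(Cell[2]/Data/text())[1]','varchar(max)') AS Name
      ,T.X.value('(Cell[3]/Data/text())[1]','varchar(max)') AS UserName
      ,T.X.value('(Cell[4]/Data/text())[1]','varchar(max)') AS Address
      ,T.X.value('(Cell[5]/Data/text())[1]','varchar(max)') AS Telephone
FROM @allUsers.nodes('/Workbook/Worksheet/Table/Row[position() > 1]') AS T(X);
```

With this sample the query should return one row each for John Smith and Jane Doe and leave the header row out. If the header is not reliably the first row, filtering on something else (for example the cell's ss:StyleID) may be safer.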


Answer 2:

Try it like this. I added a root element only to declare the ss namespace prefix; you will have to remove it (and the /root step in the XPath) for your real file. You can test this in an empty query window by copying, pasting and executing it:

DECLARE @allUsers XML=
'<root xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
</root>';

;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss)
SELECT T.X.value('Cell[1]/Data[1]','varchar(max)') AS DontKnow1  -- empty leading cell
      ,T.X.value('Cell[2]/Data[1]','varchar(max)') AS Name
      ,T.X.value('Cell[3]/Data[1]','varchar(max)') AS UserName
      ,T.X.value('Cell[4]/Data[1]','varchar(max)') AS Address
      ,T.X.value('Cell[5]/Data[1]','varchar(max)') AS Telephone
      ,T.X.value('Cell[6]/Data[1]','varchar(max)') AS DontKnow3  -- empty trailing cell
FROM @allUsers.nodes('/root/Row') as T(X)

EDIT: Your path will be something like /Workbook[1]/Worksheet[1]/Table[1]/Row[1]
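
In the real workbook the elements sit in the default namespace declared on <Workbook>, so an unprefixed path like the one above only matches if that namespace is declared as DEFAULT in the query. A possible alternative, sketched below, is the *: namespace wildcard, which matches an element name in any namespace. The trimmed sample XML is an assumption made for illustration, not the OP's actual file.

```sql
-- Trimmed sample: a single data row inside the real Workbook/Worksheet/Table nesting.
DECLARE @allUsers XML =
N'<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet>
  <Table>
   <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
    <Cell ss:StyleID="s22"/>
   </Row>
  </Table>
 </Worksheet>
</Workbook>';

-- No WITH XMLNAMESPACES here: *:Name matches the element in whatever namespace it is in.
SELECT T.X.value('(*:Cell[2]/*:Data/text())[1]','varchar(max)') AS Name
      ,T.X.value('(*:Cell[3]/*:Data/text())[1]','varchar(max)') AS UserName
      ,T.X.value('(*:Cell[4]/*:Data/text())[1]','varchar(max)') AS Address
      ,T.X.value('(*:Cell[5]/*:Data/text())[1]','varchar(max)') AS Telephone
FROM @allUsers.nodes('/*:Workbook/*:Worksheet/*:Table/*:Row') AS T(X);
```

The wildcard saves the namespace declaration but is less precise, so declaring the namespace explicitly is usually the cleaner choice when the namespace is known.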

Good luck!