I have an XML file that I am trying to parse. It was created in Excel using Save as XML.
Because the file was created by Microsoft Excel, it starts with this header:
```xml
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:o="urn:schemas-microsoft-com:office:office"
          xmlns:x="urn:schemas-microsoft-com:office:excel"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
```
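The default namespace in this header affects every query against the file: `Row`, `Cell` and `Data` all live in `urn:schemas-microsoft-com:office:spreadsheet`, which is the same URI the `ss` prefix is bound to. The following is a minimal sketch, assuming SQL Server and a hypothetical two-row sample stored in `@allUsers`. It shows that an unprefixed path finds nothing while an `ss:`-prefixed path does, which explains why a `local-name()` workaround is otherwise needed.

```sql
-- Hypothetical sample: a trimmed-down workbook with two rows.
DECLARE @allUsers xml = N'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet>
    <Table>
      <Row><Cell><Data ss:Type="String">Name</Data></Cell></Row>
      <Row><Cell><Data ss:Type="String">Jane Doe</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>';

WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' AS ss)
SELECT
    -- Unprefixed names mean "no namespace", so this matches nothing (0)
    @allUsers.value('count(//Row)', 'int')    AS UnprefixedRows,
    -- ss: is bound to the same URI as the default namespace, so both rows match (2)
    @allUsers.value('count(//ss:Row)', 'int') AS PrefixedRows;
```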
The data I am trying to extract is set up in blocks like this:
```xml
<Row ss:AutoFitHeight="0" ss:Height="30">
  <Cell ss:StyleID="s22"/>
  <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
  <Cell ss:StyleID="s22"/>
</Row>
```
Right now, my query looks like this:
```sql
;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss)
select * from (
    select X.value('local-name(.)[1]','varchar(max)') as Name,
           X.value('.[1]','varchar(max)') as Value
    from @allUsers.nodes('//*') as T(X)
) a
where Name = 'Data'
```
It gives me these results:

```
Name  Value
----  --------------
Data  Jane Doe
Data  JaneDoe
Data  XYZ
Data  (555) 555-5555
```
What I would like to do is spread these values across four columns of a single row, so I end up with something like:

```
Name      UserName  Address  Phone
--------  --------  -------  --------------
Jane Doe  JaneDoe   XYZ      (555) 555-5555
```
I tried selecting a column as

```sql
X.value('.[2]','varchar(max)') as UserName
```

but that column is NULL for every row.

Is there any way to do this?
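The NULLs have a simple cause. `.` is the single context node, so `.[2]` asks for the second item of a one-item sequence. That sequence is empty, and `value()` turns an empty result into NULL. Positional predicates only help when they are applied to a sequence of siblings, such as the `Cell` children of a `Row`. The following is a minimal sketch, assuming SQL Server and that every row carries all six `Cell` elements. Excel can omit empty cells and mark the next one with `ss:Index`, which would shift the positions.

```sql
-- Hypothetical sample: the Jane Doe row from above, as a standalone fragment.
DECLARE @row xml = N'
<Row xmlns="urn:schemas-microsoft-com:office:spreadsheet"
     xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Cell ss:StyleID="s22"/>
  <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
  <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
  <Cell ss:StyleID="s22"/>
</Row>';

WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' AS ss)
SELECT
    -- '.' is exactly one node (the Row), so '.[2]' selects nothing: NULL
    X.value('.[2]', 'varchar(100)') AS DotTwo,
    -- Index the Cell children instead; the 3rd Cell holds the user name
    X.value('(ss:Cell[3]/ss:Data)[1]', 'varchar(100)') AS UserName
FROM @row.nodes('/ss:Row') AS T(X);
```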
The general structure of the XML file looks like this:

```xml
<Workbook>
  <DocumentProperties>
  </DocumentProperties>
  <ExcelWorkbook>
  </ExcelWorkbook>
  <Styles>
    <Style>
    </Style>
  </Styles>
  <Worksheet>
    <Table>
      <Column.../>
      <Column.../>
      <Column.../>
      <Row>
        <Cell.../>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell><Data>...</Data></Cell>
        <Cell.../>
      </Row>
      ...
    </Table>
  </Worksheet>
</Workbook>
```
The information I am trying to get is in the `<Data>...</Data>` elements.
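Given that structure, one way to get one output row per spreadsheet row is to shred at the `Row` level instead of at every element and then pick each `Cell` by position. The following is a minimal sketch, assuming SQL Server, a single worksheet, and the column order Name, UserName, Address, Phone from the sample. The `varchar(100)` sizes are arbitrary.

```sql
-- Hypothetical sample following the Workbook/Worksheet/Table/Row layout above.
DECLARE @allUsers xml = N'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet>
    <Table>
      <Row>
        <Cell ss:StyleID="s22"/>
        <Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
        <Cell ss:StyleID="s22"/>
      </Row>
    </Table>
  </Worksheet>
</Workbook>';

WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' AS ss)
SELECT
    -- Cell 1 and Cell 6 are the empty s22 cells, so the data sits in cells 2 to 5
    R.value('(ss:Cell[2]/ss:Data)[1]', 'varchar(100)') AS Name,
    R.value('(ss:Cell[3]/ss:Data)[1]', 'varchar(100)') AS UserName,
    R.value('(ss:Cell[4]/ss:Data)[1]', 'varchar(100)') AS Address,
    R.value('(ss:Cell[5]/ss:Data)[1]', 'varchar(100)') AS Phone
-- One context node per Row keeps the cells of a record together
FROM @allUsers.nodes('/ss:Workbook/ss:Worksheet/ss:Table/ss:Row') AS T(R);
```

Compared with `//*` plus `local-name()`, shredding at `Row` preserves which values belong to the same record, and that grouping is what the pivot into columns depends on.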
EDIT
From the way I phrased the question, it may look as if the header names are already defined somewhere. In fact, they are stored as an ordinary row of `<Cell><Data>...</Data></Cell>` elements. I am also not sure what purpose the part serves.
This is the beginning of the `<Row>` section:
```xml
<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="2685" x:FullColumns="1"
       x:FullRows="1">
  <Column ss:AutoFitWidth="0" ss:Width="26.25"/>
  <Column ss:AutoFitWidth="0" ss:Width="117" ss:Span="3"/>
  <Column ss:Index="6" ss:AutoFitWidth="0" ss:Width="29.25"/>
  <!-- Contains the header names -->
  <Row ss:AutoFitHeight="0" ss:Height="60">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
    <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
    <Cell ss:StyleID="s22"/>
  </Row>
  <!-- First record I would like to extract -->
  <Row ss:AutoFitHeight="0" ss:Height="30">
    <Cell ss:StyleID="s22"/>
    <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
    <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
    <Cell ss:StyleID="s22"/>
  </Row>
```
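Because the header names arrive as the first ordinary `Row`, a row-level query would return `Name`, `UserName` and the other headers as if they were a record. One way to drop that row is a positional predicate on the `Row` step. The following is a minimal sketch, assuming SQL Server and that the header is always the first `Row` of the `Table`. Filtering on the header text in a `WHERE` clause would also work, but it depends on the exact wording.

```sql
-- Hypothetical sample: the header row plus the John Smith record from the excerpt.
DECLARE @allUsers xml = N'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet>
    <Table>
      <Row>
        <Cell ss:StyleID="s22"/>
        <Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
        <Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
        <Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
        <Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
        <Cell ss:StyleID="s22"/>
      </Row>
      <Row>
        <Cell ss:StyleID="s22"/>
        <Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
        <Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
        <Cell ss:StyleID="s22"/>
      </Row>
    </Table>
  </Worksheet>
</Workbook>';

WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' AS ss)
SELECT
    R.value('(ss:Cell[2]/ss:Data)[1]', 'varchar(100)') AS Name,
    R.value('(ss:Cell[3]/ss:Data)[1]', 'varchar(100)') AS UserName,
    R.value('(ss:Cell[4]/ss:Data)[1]', 'varchar(100)') AS Address,
    R.value('(ss:Cell[5]/ss:Data)[1]', 'varchar(100)') AS Phone
-- position() > 1 is evaluated per Table, so the header Row is skipped
FROM @allUsers.nodes('/ss:Workbook/ss:Worksheet/ss:Table/ss:Row[position() > 1]') AS T(R);
```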