I have an XML File which I am trying to parse. The XML was created through Excel using
Save as XML
Because the XML file was created from Microsoft Excel, it has this header:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
The data I am trying to extract is set up in blocks like this:
<Row ss:AutoFitHeight="0" ss:Height="30">
<Cell ss:StyleID="s22"/>
<Cell ss:StyleID="s24"><Data ss:Type="String">Jane Doe</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">JaneDoe</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">XYZ</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">(555) 555-5555</Data></Cell>
<Cell ss:StyleID="s22"/>
</Row>
Right now, my query looks like this:
;WITH XMLNAMESPACES ('urn:schemas-microsoft-com:office:spreadsheet' as ss)
select * from (
select X.value('local-name(.)[1]','varchar(max)') as Name,
X.value('.[1]','varchar(max)') as Value
from @allUsers.nodes('//*') as T(X)
) a
where Name = 'Data'
and gives me these results:
Name Value
---- -----------
Data Jane Doe
Data JaneDoe
Data XYZ
Data (555)555-5555
What I would like to do is separate this into 4 rows, so I have something like:
Name UserName Address Phone
----- ---------- --------- ----------
Jane Doe JaneDoe XYZ (555)-555-5555
I tried selecting a column as
X.value('.[2]','varchar(max)') as UserName
but I just get all NULL
values for that.
Is there any way to do this?
The general structure of the XML file looks like:
<Workbook>
<DocumentProperties>
</DocumentProperties>
<ExcelWorkbook>
</ExcelWorkbook>
<Styles>
<Style>
</Style>
</Styles>
<Worksheet>
<Table>
<Column.../>
<Column.../>
<Column.../>
<Row>
<Cell.../>
<Cell><Data>...</Data></Cell>
<Cell><Data>...</Data></Cell>
<Cell><Data>...</Data></Cell>
<Cell><Data>...</Data></Cell>
<Cell.../>
</Row>
...
</Table>
</Worksheet>
and the information I am trying to get is in the <Data>...</Data>
field
EDIT
From the way I phrased the question, it would seem like the header names are already programmed in, but they are actually read as rows in <Cell><Data><Data/></Cell>
. I am also not sure what purpose the part serves
This is the beginning of the <Row>
section:
<Table ss:ExpandedColumnCount="6" ss:ExpandedRowCount="2685" x:FullColumns="1"
x:FullRows="1">
<Column ss:AutoFitWidth="0" ss:Width="26.25"/>
<Column ss:AutoFitWidth="0" ss:Width="117" ss:Span="3"/>
<Column ss:Index="6" ss:AutoFitWidth="0" ss:Width="29.25"/>
<Row ss:AutoFitHeight="0" ss:Height="60"> --Contains the header names
<Cell ss:StyleID="s22"/>
<Cell ss:StyleID="s23"><Data ss:Type="String">Name</Data></Cell>
<Cell ss:StyleID="s23"><Data ss:Type="String">UserName</Data></Cell>
<Cell ss:StyleID="s23"><Data ss:Type="String">Address</Data></Cell>
<Cell ss:StyleID="s23"><Data ss:Type="String">Telephone Number</Data></Cell>
<Cell ss:StyleID="s22"/>
</Row>
<Row ss:AutoFitHeight="0" ss:Height="30"> --First record I would like to extract
<Cell ss:StyleID="s22"/>
<Cell ss:StyleID="s24"><Data ss:Type="String">John Smith</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">JSmith</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">ABC</Data></Cell>
<Cell ss:StyleID="s24"><Data ss:Type="String">(999) 999-9999</Data></Cell>
<Cell ss:StyleID="s22"/>
</Row>
Try it like this: I added a root element to imply the namespace, you'll have to take it out (also from the XPath), but you can test this in an empty query window by simple copy, paste and execute:
EDIT: Your path will be something like
/Workbook[1]/Worksheet[1]/Table[1]/Row[1]
Good luck!
There were two very similar question by the same user. The OP decided to delete one and combine this here and asked me to copy my answer from there to this thread.
Be aware of the xmlns-namespace which must be declared as "DEFAULT":
Simplified your XML, but the idea should be OK...