I am trying to convert a excel xlm file into csv (or excel) format, could anyone suggest how to do it by python? I tried to search in this web for answer, but all answer using xlst, but I am a fresh man to python, could you please demonstrate it in detail?
Thank you. Here is the file opened by textpad tools:
<?xml version="1.0" encoding="UTF-8"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
<DownloadComponents/>
<LocationOfComponents HRef="file:///\"/>
</OfficeDocumentSettings>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>12525</WindowHeight>
<WindowWidth>15195</WindowWidth>
<WindowTopX>480</WindowTopX>
<WindowTopY>120</WindowTopY>
<ActiveSheet>0</ActiveSheet>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="bold">
<Font ss:Bold="1" />
</Style>
<Style ss:ID="percent">
<NumberFormat ss:Format="Percent" />
</Style>
<Style ss:ID="currency">
<NumberFormat ss:Format="Currency" />
</Style>
<Style ss:ID="header">
<Font ss:Color="#000000" ss:Bold="1" />
<Alignment ss:WrapText="1" ss:Horizontal="Center" ss:Vertical="Center" />
</Style>
</Styles>
<Worksheet ss:Name="SABR Data"><Table ss:ExpandedColumnCount="33" ss:ExpandedRowCount="1265" x:FullColumns="1" x:FullRows="1">
<Column ss:Index="1" ss:Width="150" />
<Column ss:Index="2" ss:Width="150" />
<Column ss:Index="3" ss:Width="150" />
<Column ss:Index="4" ss:Width="150" />
<Column ss:Index="5" ss:Width="150" />
<Column ss:Index="6" ss:Width="150" />
<Column ss:Index="7" ss:Width="150" />
<Column ss:Index="8" ss:Width="150" />
<Column ss:Index="9" ss:Width="150" />
<Column ss:Index="10" ss:Width="150" />
<Column ss:Index="11" ss:Width="150" />
<Column ss:Index="12" ss:Width="150" />
<Column ss:Index="13" ss:Width="150" />
<Column ss:Index="14" ss:Width="150" />
<Column ss:Index="15" ss:Width="150" />
<Column ss:Index="16" ss:Width="150" />
<Column ss:Index="17" ss:Width="150" />
<Column ss:Index="18" ss:Width="150" />
<Column ss:Index="19" ss:Width="150" />
<Column ss:Index="20" ss:Width="150" />
<Column ss:Index="21" ss:Width="150" />
<Column ss:Index="22" ss:Width="150" />
<Column ss:Index="23" ss:Width="150" />
<Column ss:Index="24" ss:Width="150" />
<Column ss:Index="25" ss:Width="150" />
<Column ss:Index="26" ss:Width="150" />
<Column ss:Index="27" ss:Width="150" />
<Column ss:Index="28" ss:Width="150" />
<Column ss:Index="29" ss:Width="150" />
<Column ss:Index="30" ss:Width="150" />
<Column ss:Index="31" ss:Width="150" />
<Column ss:Index="32" ss:Width="150" />
<Column ss:Index="33" ss:Width="150" />
<Row>
<Cell ss:StyleID="header"><Data ss:Type="String">Site</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Segment</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Country</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Day</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Week</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Month</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Quarter</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Finance Region</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Order USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - Total</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">AOV</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Revenue Per Call</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Price Match Order %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Deal Closer Order %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Price Match USD %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">CPU Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach CPU USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">iPad Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - Queue</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Watch Hero USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach Watch USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">CPU CTO %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Personalized iPad %</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Sales Calls - RETAIL</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate CPU Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate iPad Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate iPhone Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Close Rate Watch Hero Orders</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">BTB True Attach USD</Data></Cell>
<Cell ss:StyleID="header"><Data ss:Type="String">Booked Hero Orders</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="SABR Settings"><Table ss:ExpandedColumnCount="5" ss:ExpandedRowCount="21" x:FullColumns="1" x:FullRows="1">
<Column ss:Index="1" ss:Width="150" />
<Column ss:Index="2" ss:Width="150" />
<Column ss:Index="3" ss:Width="150" />
<Column ss:Index="4" ss:Width="150" />
<Column ss:Index="5" ss:Width="150" />
<Row>
<Cell ss:StyleID="header"><Data ss:Type="String">Report URL</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
For starters, the Excel 2014 XML format is, well, XML. In order to parse that, you need to read up on how to process XML with Python.
To write out to a CSV file, you can use Python's CSV module.
I'm not going to do the whole program for you - StackOverflow is not that kind of site - but to get you started:
Load the XML data to Python using:
These are in the worksheet 'SABR Data'. There is another one called 'SABR Settings', which I'm not going to bother with. Note that you need to add the
namespace
prefix when looking for any node. You can do that more easily with a dictionary, see this example. So first, let's get the node for the worksheet we want:You can use XPATH to find it more easily with
findall
.The headers of the columns are clearly on the XML nodes
<Cell ss:StyleID="header">...</Cell>
. You can get each of these by doing something similar with the.iter
as used to find the worksheet, and then nested loops on the correct nodes. TIP: look at the tag name and use the node's.text
attribute to get the text in the<Data>
elements.Then repeat similarly for the actual data rows.
Then write out the headers and data to a CSV file.
Also, if you are this fresh to Python, I highly recommend you read a few tutorials before trying something this complex.