The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame
with pandas.read_excel(file)
command. Under the hood, it uses xlrd library which does not support ods files.
Is there an equivalent of pandas.read_excel
for ods files? If not, how can I do the same for an Open Document Formatted spreadsheet (ods file)? ODF is used by LibreOffice and OpenOffice.
Here is a quick and dirty hack which uses ezodf module:
Test:
NOTES:
header
,skiprows
,index_col
,parse_cols
are NOT implemented in this function - feel free to update this question if you want to implement themezodf
depends onlxml
make sure you have it installedYou can read ODF (Open Document Format
.ods
) documents in Python using the following modules:Using ezodf, a simple ODS-to-DataFrame converter could look like this:
P.S.
ODF spreadsheet (*.ods files) support has been requested on the
pandas
issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.ezodf
was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in thispandas
fork.It seems the answer is No! And I would characterize the tools to read in ODS still ragged. If you're on POSIX, maybe the strategy of exporting to xlsx on the fly before using Pandas' very nice importing tools for xlsx is an option:
Altogether, my code looks like:
Here fileOlderThan() is a function (see http://github.com/cpbl/cpblUtilities) which returns true if tmp.xlsx does not exist or is older than the .ods file.
If possible, save as CSV from the spreadsheet application and then use
pandas.read_csv()
. IIRC, an 'ods' spreadsheet file actually is an XML file which also contains quite some formatting information. So, if it's about tabular data, extract this raw data first to an intermediate file (CSV, in this case), which you can then parse with other programs, such as Python/pandas.There is support for reading Excel files in Pandas (both xls and xlsx), see the read_excel command. You can use OpenOffice to save the spreadsheet as xlsx. The conversion can also be done automatically on the command line, apparently, using the convert-to command line parameter.
Reading the data from xlsx avoids some of the issues (date formats, number formats, unicode) that you may run into when you convert to CSV first.
This is available natively in pandas 0.25. So long as you have odfpy installed you can do