How to convert OpenDocument spreadsheets to a pand

The Python library pandas can read Excel spreadsheets and convert them to a pandas.DataFrame with pandas.read_excel(file) command. Under the hood, it uses xlrd library which does not support ods files.

Is there an equivalent of pandas.read_excel for ods files? If not, how can I do the same for an Open Document Formatted spreadsheet (ods file)? ODF is used by LibreOffice and OpenOffice.

标签： python pandas libreoffice dataframe opendocument

10条回答

迷人小祖宗

2楼-- · 2020-02-17 00:50

Here is a quick and dirty hack which uses ezodf module:

import pandas as pd
import ezodf

def read_ods(filename, sheet_no=0, header=0):
    tab = ezodf.opendoc(filename=filename).sheets[sheet_no]
    return pd.DataFrame({col[header].value:[x.value for x in col[header+1:]]
                         for col in tab.columns()})

Test:

In [92]: df = read_ods(filename='fn.ods')

In [93]: df
Out[93]:
     a    b    c
0  1.0  2.0  3.0
1  4.0  5.0  6.0
2  7.0  8.0  9.0

NOTES:

all other useful parameters like header, skiprows, index_col, parse_cols are NOT implemented in this function - feel free to update this question if you want to implement them
ezodf depends on lxml make sure you have it installed

0人赞添加讨论(0) 举报

Viruses.

3楼-- · 2020-02-17 00:55

You can read ODF (Open Document Format .ods) documents in Python using the following modules:

Using ezodf, a simple ODS-to-DataFrame converter could look like this:

import pandas as pd
import ezodf

doc = ezodf.opendoc('some_odf_spreadsheet.ods')

print("Spreadsheet contains %d sheet(s)." % len(doc.sheets))
for sheet in doc.sheets:
    print("-"*40)
    print("   Sheet name : '%s'" % sheet.name)
    print("Size of Sheet : (rows=%d, cols=%d)" % (sheet.nrows(), sheet.ncols()) )

# convert the first sheet to a pandas.DataFrame
sheet = doc.sheets[0]
df_dict = {}
for i, row in enumerate(sheet.rows()):
    # row is a list of cells
    # assume the header is on the first row
    if i == 0:
        # columns as lists in a dictionary
        df_dict = {cell.value:[] for cell in row}
        # create index for the column headers
        col_index = {j:cell.value for j, cell in enumerate(row)}
        continue
    for j, cell in enumerate(row):
        # use header instead of column index
        df_dict[col_index[j]].append(cell.value)
# and convert to a DataFrame
df = pd.DataFrame(df_dict)

P.S.

ODF spreadsheet (*.ods files) support has been requested on the pandas issue tracker: https://github.com/pydata/pandas/issues/2311, but it is still not implemented.
ezodf was used in the unfinished PR9070 to implement ODF support in pandas. That PR is now closed (read the PR for a technical discussion), but it is still available as an experimental feature in this pandas fork.
there are also some brute force methods to read directly from the XML code (here)

0人赞添加讨论(0) 举报

Evening l夕情丶

4楼-- · 2020-02-17 01:03

It seems the answer is No! And I would characterize the tools to read in ODS still ragged. If you're on POSIX, maybe the strategy of exporting to xlsx on the fly before using Pandas' very nice importing tools for xlsx is an option:

unoconv -f xlsx -o tmp.xlsx myODSfile.ods

Altogether, my code looks like:

import pandas as pd
import os
if fileOlderThan('tmp.xlsx','myODSfile.ods'):
    os.system('unoconv -f xlsx -o tmp.xlsx myODSfile.ods ')
xl_file = pd.ExcelFile('tmp.xlsx')
dfs = {sheet_name: xl_file.parse(sheet_name) 
          for sheet_name in xl_file.sheet_names}
df=dfs['Sheet1']

Here fileOlderThan() is a function (see http://github.com/cpbl/cpblUtilities) which returns true if tmp.xlsx does not exist or is older than the .ods file.

0人赞添加讨论(0) 举报

孤傲高冷的网名

5楼-- · 2020-02-17 01:04

If possible, save as CSV from the spreadsheet application and then use pandas.read_csv(). IIRC, an 'ods' spreadsheet file actually is an XML file which also contains quite some formatting information. So, if it's about tabular data, extract this raw data first to an intermediate file (CSV, in this case), which you can then parse with other programs, such as Python/pandas.

0人赞添加讨论(0) 举报

我只想做你的唯一

6楼-- · 2020-02-17 01:06

There is support for reading Excel files in Pandas (both xls and xlsx), see the read_excel command. You can use OpenOffice to save the spreadsheet as xlsx. The conversion can also be done automatically on the command line, apparently, using the convert-to command line parameter.

Reading the data from xlsx avoids some of the issues (date formats, number formats, unicode) that you may run into when you convert to CSV first.

0人赞添加讨论(0) 举报

我命由我不由天

7楼-- · 2020-02-17 01:09

This is available natively in pandas 0.25. So long as you have odfpy installed you can do

pd.read_excel("the_document.ods", engine="odf")

0人赞添加讨论(0) 举报

1 2 下一页

How to convert OpenDocument spreadsheets to a pand

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间