Can't read Excel files using openpyxl

Posted 2019-07-28 06:00

Question:

I have a list of Excel files whose last rows share a similar layout. The last row contains private information about a client (name, surname, phone), and each Excel file corresponds to one client. I need to build a single Excel file with the data for every client. I decided to do this automatically, so I looked at the openpyxl library. I wrote the following code, but it doesn't work correctly.

import openpyxl
import os
import glob
from openpyxl import load_workbook
from openpyxl import Workbook
import openpyxl.styles
from openpyxl.cell import get_column_letter

path_kit = 'prize_input/kit'

#creating single document
prize_info = Workbook()
prize_sheet = prize_info.active

file_array_reciever = []

for file in glob.glob(os.path.join(path_kit, '*.xlsx')):
    file_array_reciever.append(file)

row_num = 1
for f in file_array_reciever:
    f1 = load_workbook(filename=f)
    sheet = f1.active
    for col_num in range (3, sheet.max_column):
        prize_sheet.cell(row=row_num, column=col_num).value = \
            sheet.cell(row=sheet.max_row, column=col_num).value

    prize_info.save("Ex.xlsx")

I get this error:

Traceback (most recent call last):
  File "/Users/zkid18/PycharmProjects/untitled/excel_test.py", line 43, in <module>
    f1 = load_workbook(filename=f)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/openpyxl/reader/excel.py", line 183, in load_workbook
    wb.active = read_workbook_settings(archive.read(ARC_WORKBOOK)) or 0
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1229, in read
    with self.open(name, "r", pwd) as fp:
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1252, in open
    zinfo = self.getinfo(name)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/zipfile.py", line 1196, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named 'xl/workbook.xml' in the archive"

It looks like a problem with reading the file.
I don't understand where the item named 'xl/workbook.xml' in the archive comes from.

Answer 1:

Depending on which version you are using, this could be a bug in openpyxl. For example, in 1.6.1 a bug was introduced exhibiting this behavior. Reverting to 1.5.8 fixed it. There was a fix according to this openpyxl ticket; though the ticket doesn't say when the fix was delivered, it was committed in early 2013. I upgraded to 1.6.2 and the error went away.
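To see whether this applies, it helps to know which openpyxl version is actually installed. A minimal sketch, assuming Python 3.8 or later (for importlib.metadata); the helper name installed_version is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed openpyxl version, or None if openpyxl is not installed.
print(installed_version("openpyxl"))
```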



Answer 2:

You can use the xlrd library.

This script lets you transform Excel data into a list of dictionaries:

import xlrd

# Note: xlrd 2.0 and later read only .xls files; opening an .xlsx file
# this way requires an older xlrd release (below 2.0).
workbook = xlrd.open_workbook('your_file.xlsx', on_demand=True)
worksheet = workbook.sheet_by_index(0)
first_row = []  # the header row, which holds the column names
for col in range(worksheet.ncols):
    first_row.append(worksheet.cell_value(0, col))
# transform the worksheet into a list of dictionaries
data = []
for row in range(1, worksheet.nrows):
    elm = {}
    for col in range(worksheet.ncols):
        elm[first_row[col]] = worksheet.cell_value(row, col)
    data.append(elm)
print(data)


Answer 3:

I guess one of your files was originally in .xls format. You can use

try:
    f1 = load_workbook(filename=f)
except Exception:
    print(f)

to find which file causes this error. Then reopen that file in Excel and save it as .xlsx.
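The same check can be sketched without openpyxl, using only the standard library. An .xlsx file is a zip archive, and the traceback shows openpyxl looking for the member 'xl/workbook.xml', so a file that is not a zip or lacks that member is a likely culprit. The helper names is_probably_xlsx and find_suspect_files are made up for illustration, and the folder is the one from the question:

```python
import glob
import os
import zipfile


def is_probably_xlsx(path):
    """Return True if path is a zip archive that contains xl/workbook.xml."""
    if not zipfile.is_zipfile(path):
        # For example, an old binary .xls file that was only renamed to .xlsx.
        return False
    with zipfile.ZipFile(path) as archive:
        return "xl/workbook.xml" in archive.namelist()


def find_suspect_files(folder):
    """List the *.xlsx files in folder that fail the check above."""
    pattern = os.path.join(folder, "*.xlsx")
    return [p for p in sorted(glob.glob(pattern)) if not is_probably_xlsx(p)]


if __name__ == "__main__":
    for path in find_suspect_files("prize_input/kit"):
        print("not a valid .xlsx package:", path)
```

This is only a heuristic: a file can pass the check and still fail to load for other reasons.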



Answer 4:

I found this post while searching for a solution to a similar issue ("There is no item named '[Content_Types].xml' in the archive").

The error message made no sense in terms of my script or the file. My script adds one sheet to an existing Excel document and updates five more. While the script was running, I realized I had an error in my code, so I canceled it mid-run.

After I canceled it, the existing Excel file started producing this error. Perhaps you corrupted your Excel file in the same way while working out bugs in your script?

To address this, I'm thinking of creating a temporary restore file with openpyxl, to fall back on in the event of an error.
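One way to sketch that restore-file idea, using only the standard library rather than openpyxl itself: copy the workbook before touching it, and copy it back if anything goes wrong. The name backed_up and the ".bak" suffix are made up for illustration.

```python
import os
import shutil
from contextlib import contextmanager


@contextmanager
def backed_up(path, suffix=".bak"):
    """Copy path aside before the block runs; restore it if the block fails."""
    backup = path + suffix
    shutil.copy2(path, backup)
    try:
        yield path
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl+C mid-run).
        shutil.copy2(backup, path)
        raise
    finally:
        os.remove(backup)


# Hypothetical usage:
# with backed_up("report.xlsx") as path:
#     wb = load_workbook(path)
#     ...  # modify sheets
#     wb.save(path)
```

If the process is killed outright, none of the cleanup runs, but the backup copy is then still on disk and can be restored by hand.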



Answer 5:

I had the same issue. Make sure the file you're trying to read isn't already open in Excel.
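A related point, offered as an assumption about the setup in the question: while a workbook is open, Excel typically leaves a small lock file next to it whose name starts with "~$" (for example "~$client.xlsx"). That file is not a real workbook, but a '*.xlsx' glob pattern still matches it. A minimal sketch that skips such files (the helper name workbook_paths is made up):

```python
import glob
import os


def workbook_paths(folder):
    """Return the *.xlsx paths in folder, skipping Excel's ~$ lock files."""
    paths = glob.glob(os.path.join(folder, "*.xlsx"))
    return sorted(p for p in paths if not os.path.basename(p).startswith("~$"))


if __name__ == "__main__":
    for path in workbook_paths("prize_input/kit"):
        print(path)
```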



Answer 6:

If openpyxl still doesn't work, you can use pandas instead.

$ pip install pandas xlrd

And this code works:

import pandas as pd

df = pd.read_excel(file_path)


Answer 7:

Option 1: I overcame this issue by adding read_only=True. Specifically, replace

f1 = load_workbook(filename=f)

with

f1 = load_workbook(filename=f, read_only=True)

Note: Depending on your code, read_only=True can make it very slow. If that is the case for you, you may want to try option 2.

Option 2: Open the problematic workbook in Excel, then re-save it as a Strict Open XML Spreadsheet (*.xlsx).