Openpyxl max_row and max_column wrongly reports a

Posted 2020-02-13 04:02

My question concerns a function in a parsing script I'm developing. I am trying to write a Python function that finds the column number corresponding to a matched value in an Excel sheet. The sheet is created on the fly with openpyxl, and its first row holds headers (from the 3rd column onward) that each span 4 merged columns. In a subsequent function, I parse content to be added to the columns under the matching headers. (Additional info: the content I'm parsing is BLAST+ output. I'm trying to create a summary spreadsheet with the hit names as column headers, each with subcolumns for hits, gaps, span and identity. The first two columns are the query contigs and their lengths.)

I had initially written a similar function for xlrd and it worked. But when I rewrite it for openpyxl, I find that the max_row and max_column attributes report more rows and columns than are actually present. For instance, this pilot input has 20 rows, but max_row reports 82. Note that I manually selected the empty rows and columns in Excel, right-clicked and deleted them, as advised elsewhere in this forum. This didn't change the result.

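def find_column_number(x):
    # Return the 1-based column number of the last cell equal to x (0 if none).
    # hrsh is the openpyxl worksheet created earlier in the script.
    col = 0
    print("maxrow =", hrsh.max_row)
    print("maxcol =", hrsh.max_column)
    # openpyxl rows and columns are 1-based, so start at 1 and include the max.
    for rowz in range(1, hrsh.max_row + 1):
        print("now the row is", rowz)
        for colz in range(1, hrsh.max_column + 1):
            print("now the column is", colz)
            name = hrsh.cell(row=rowz, column=colz).value
            if name == x:
                col = colz
    return col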

The issue with max_row and max_column has been discussed here: https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual I applied the suggestion from that thread, but max_row is still wrong.

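# hrsh.rows is a generator in recent openpyxl versions, so build a list first.
for row in reversed(list(hrsh.rows)):
    values = [cell.value for cell in row]
    if any(v is not None for v in values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row
        break  # stop at the first non-empty row seen from the bottom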

I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/ and tried to get the column values. Once again, the script takes the empty columns into account and reports more columns than are actually present.

for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)

Can you please help me resolve this error, or suggest another method to achieve my aim?

2 Answers
啃猪蹄的小仙女
#2 · 2020-02-13 04:24

As noted in the bug report you linked to, there's a difference between a sheet's reported dimensions and the cells that actually contain data: the reported dimensions can include empty rows or columns. If max_row and max_column are not reporting what you want, you will need to write your own code to find the first completely empty row. The most efficient way, of course, would be to start from max_row and work backwards, but the following is probably sufficient:

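# Stops at the first completely empty row; max_row then holds that row's
# index (or the last row's index if no row is empty).
for max_row, row in enumerate(ws, 1):
    if all(c.value is None for c in row):
        break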
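
The backwards scan mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not part of openpyxl: the helper names last_data_row and last_data_column are hypothetical, and the sample list stands in for the rows of a real worksheet so the snippet runs on its own.

```python
def last_data_row(rows):
    """Return the 1-based index of the last row with any non-None value (0 if none)."""
    rows = list(rows)
    # Walk from the bottom up and stop at the first row that holds data.
    for index in range(len(rows), 0, -1):
        if any(value is not None for value in rows[index - 1]):
            return index
    return 0


def last_data_column(rows):
    """Return the 1-based index of the rightmost column with any non-None value."""
    last = 0
    for row in rows:
        for index, value in enumerate(row, 1):
            if value is not None and index > last:
                last = index
    return last


# Hypothetical stand-in for worksheet rows: two data rows, two blank rows,
# and one blank trailing column.
sample = [
    ("contig", "length", "hitA", None),
    ("c1", 1200, 5, None),
    (None, None, None, None),
    (None, None, None, None),
]
print(last_data_row(sample))     # 2
print(last_data_column(sample))  # 3
```

With openpyxl, the rows could presumably be taken from ws.iter_rows(values_only=True), assuming a version that supports the values_only argument; on older versions, a list of [cell.value for cell in row] per row would serve the same purpose. Comparing against None rather than relying on truthiness keeps cells containing 0 or an empty string from being counted as blank.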
混吃等死
#3 · 2020-02-13 04:29

I can confirm the bug found by the OP. I also found newer posts reporting that max_row is too large. As far as I can tell, this cannot be fixed by simply clearing the cell values.

In my case, it appears when I set the value of all cells in a worksheet to None. After this operation, the worksheet still reports the old dimensions.

A call to ws.calculate_dimensions() does not change anything. Closing and restarting Excel still has openpyxl report the same wrong dimensions.

This is a problem because ws.append() starts at ws.max_row, and I found no way to override this behaviour. You end up with a worksheet that is blank and then, somewhere further down, the data you appended appears.

The only remedy I found is to delete entire rows by hand in Excel. openpyxl then shows the correct max_row.

I found that this is linked to the member ws._cells not being empty, as it should be after setting all cells to None. However, users are not meant to modify this dictionary, as it is a private member.
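
A possible workaround that stays within the public API is to delete the trailing blank rows programmatically instead of by hand. The sketch below is an assumption-laden illustration, not something confirmed in this thread: trim_trailing_empty_rows is a hypothetical helper, and it relies on the worksheet providing max_row, max_column, cell() and delete_rows(), which recent openpyxl versions do. Whether this also resets the position used by ws.append() may depend on the openpyxl version, so it is worth checking on your own install.

```python
def trim_trailing_empty_rows(ws):
    """Delete trailing rows whose cells are all None; return the last data row (0 if none)."""
    last = ws.max_row
    # Move upwards while the current row holds nothing but None values.
    while last >= 1 and all(
        ws.cell(row=last, column=col).value is None
        for col in range(1, ws.max_column + 1)
    ):
        last -= 1
    extra = ws.max_row - last
    if extra > 0:
        # Remove the blank rows so they no longer count towards the dimensions.
        ws.delete_rows(last + 1, extra)
    return last
```

The idea would be to call trim_trailing_empty_rows(ws) after clearing cells and before calling ws.append(). If appending still lands too far down, writing with ws.cell(row=..., column=..., value=...) at an explicitly computed row avoids depending on the append position at all.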
