Openpyxl max_row and max_column wrongly reports a

My query is to do with a function that is part of a parsing script Im developing. I am trying to write a python function to find the column number corresponding to a matched value in excel. The excel has been created on the fly with openpyxl, and it has the first row (from 3rd column) headers that each span 4 columns merged into one. In my subsequent function, I am parsing some content to be added to the columns corresponding to the matching headers. (Additional info: The content I'm parsing is blast+ output. I'm trying to create a summary spreadsheet with the hit names in each column with subcolumns for hits, gaps, span and identity. The first two columns are query contigs and its length. )

I had initially written a similar function for xlrd and it worked. But when I try to rewrite it for openpyxl, I find that the max_row and max_col function wrongly returns a larger number of rows and columns than actually present. For instance, I have 20 rows for this pilot input, but it reports it as 82. Note that I manually selected the empty rows & columns and right clicked and deleted them, as advised elsewhere in this forum. This didn't change the error.

def find_column_number(x):
    col = 0
    print "maxrow = ", hrsh.max_row
    print "maxcol = ", hrsh.max_column
    for rowz in range(hrsh.max_row):
        print "now the row is ", rowz
        if(rowz > 0): 
            pass
        for colz in range(hrsh.max_column):
            print "now the column is ", colz
            name = (hrsh.cell(row=rowz,column=colz).value)
            if(name == x):
                col = colz
    return col

The issue with max_row and max_col, has been discussed here https://bitbucket.org/openpyxl/openpyxl/issues/514/cell-max_row-reports-higher-than-actual I applied the suggestion here. But the max_row is still wrong.

for row in reversed(hrsh.rows):
    values = [cell.value for cell in row]
    if any(values):
        print("last row with data is {0}".format(row[0].row))
        maxrow = row[0].row

I then tried the suggestion at https://www.reddit.com/r/learnpython/comments/3prmun/openpyxl_loop_through_and_find_value_of_the/, and tried to get the column values. Once, again the script takes into account the empty columns and reports a higher number columns than actually present.

for currentRow in hrsh.rows:
    for currentCell in currentRow:
        print(currentCell.value)

Can you please help me resolve this error, or suggest another method to achieve my aim?

标签： python excel openpyxl

2条回答

啃猪蹄的小仙女

2楼-- · 2020-02-13 04:24

As noted in the bug report you linked to there's a difference between a sheet's reported dimensions and whether these include empty rows or columns. If max_row and max_column are not reporting what you want to see then you will need to write your own code to find the first completely empty. The most efficient way, of course, would be to start from max_row and work backwards but the following is probably sufficient:

for max_row, row in enumerate(ws, 1):
    if all(c.value is None for c in row):
        break

0人赞添加讨论(0) 举报

混吃等死

3楼-- · 2020-02-13 04:29

I confirm the bug found by the OP. I found newer posts reporting max_row being too large. This bug cannot be fixed.

In my case, it appears when I set the value of all cells in a worksheet to None. After this operation, the worksheet still reports the old dimensions.

A call to ws.calculate_dimensions() does not change anything. Closing and restarting excel still has openpyxl report the same wrong dimensions.

This is a problem because ws.append() starts at ws.max_row, and there is no way to override this behaviour. You end up with a worksheet that is blank and then, somewhere down, the data you appended appears.

The only way I found out that remedies this bug is to delete entire rows by hand in excel. openpyxl then shows the correct max_row.

I found out that this is linked to the member ws._cells not being empty as it should after setting all cells to None. However, the user cannot delete this dictionary as it is a private member.

0人赞添加讨论(0) 举报

Openpyxl max_row and max_column wrongly reports a

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间