I am trying to read merged cells of Excel with Python using xlrd.
My Excel: (note that the first column is merged across the three rows)
        A   B   C
      +---+---+----+
    1 | 2 | 0 | 30 |
      +   +---+----+
    2 |   | 1 | 20 |
      +   +---+----+
    3 |   | 5 | 52 |
      +---+---+----+
I would like the third row of the first column to read as 2 in this example, but it returns ''. Do you have any idea how to get the value of the merged cell?
My code:
    import xlrd

    all_data = [[]]
    excel = xlrd.open_workbook(excel_dir + excel_file)
    sheet_0 = excel.sheet_by_index(0)  # Open the first tab
    for row_index in range(sheet_0.nrows):
        row = ""
        for col_index in range(sheet_0.ncols):
            value = sheet_0.cell(rowx=row_index, colx=col_index).value
            row += "{0} ".format(value)
        # Empty cells contribute only whitespace, so split() drops them
        split_row = row.split()
        all_data.append(split_row)
What I get:
'2', '0', '30'
'1', '20'
'5', '52'
What I would like to get:
'2', '0', '30'
'2', '1', '20'
'2', '5', '52'
With this function you can get an array like ['A1:M1', 'B22:B27'], which tells you which cells are merged.
This function shows you whether a cell has been merged or not.
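As a rough illustration of how a list of ranges such as ['A1:M1', 'B22:B27'] can be used to test a single cell, here is a minimal sketch in plain Python. The helper names (col_to_index, parse_range, is_merged) are hypothetical and do not come from any library; the sketch assumes every entry is an A1-style range with a colon and uses 1-based row and column numbers.

```python
import re


def col_to_index(letters):
    """Convert column letters to a 1-based index ('A' -> 1, 'AA' -> 27)."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_range(ref):
    """Convert an A1-style range such as 'B22:B27' into 1-based bounds
    (min_col, min_row, max_col, max_row)."""
    match = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref)
    if match is None:
        raise ValueError("not an A1-style range: {0!r}".format(ref))
    col_a, row_a, col_b, row_b = match.groups()
    return col_to_index(col_a), int(row_a), col_to_index(col_b), int(row_b)


def is_merged(row, col, merged_ranges):
    """Return True if the 1-based (row, col) lies inside any merged range."""
    for ref in merged_ranges:
        min_col, min_row, max_col, max_row = parse_range(ref)
        if min_row <= row <= max_row and min_col <= col <= max_col:
            return True
    return False


merged_ranges = ["A1:M1", "B22:B27"]
print(is_merged(1, 5, merged_ranges))   # E1 lies inside A1:M1
print(is_merged(2, 1, merged_ranges))   # A2 is not part of any range
```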
Using XLRDs merged cells
I just tried this and it seems to work for your sample data:
returning
It keeps track of the values from the previous row and uses them if the corresponding value from the current row is empty.
Note that the above code does not check if a given cell is actually part of a merged set of cells, so it could possibly duplicate previous values in cases where the cell should really be empty. Still, it might be of some help.
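The idea described above (carry the previous row's value down whenever the current cell is empty) can be sketched on plain lists as follows. This is only an illustration, not the code from the answer: fill_from_previous_row is a hypothetical helper, and the sample rows stand in for what one would read from the sheet.

```python
def fill_from_previous_row(rows):
    """Replace each empty-string cell with the value from the same column
    of the previous (already filled) row, if there is one."""
    filled = []
    previous = []
    for row in rows:
        current = []
        for col_index, value in enumerate(row):
            if value == "" and col_index < len(previous):
                value = previous[col_index]
            current.append(value)
        filled.append(current)
        previous = current
    return filled


# Sample data from the question; '' marks the cells hidden by the merge.
sample = [[2, 0, 30], ["", 1, 20], ["", 5, 52]]
for row in fill_from_previous_row(sample):
    print(row)
```

As noted, this fills every empty cell, whether or not it belongs to a merged range.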
Additional information:
I subsequently found a documentation page that talks about a merged_cells attribute that one can use to determine the cells that are included in various ranges of merged cells. The documentation says that it is "New in version 0.6.1", but when I tried to use it with xlrd-0.9.3 as installed by pip, I got an error.
I'm not particularly inclined to start chasing down different versions of xlrd to test the merged_cells feature, but perhaps you might be interested in doing so if the above code is insufficient for your needs and you encounter the same error that I did with formatting_info=True.
You can also try using the fillna method available in pandas: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
This should replace the cell's value with the previous value.
I tried the previous solutions without success; nevertheless, the following worked for me:
I hope it helps someone in the future.
This is for those who want to handle merged cells the way the OP asked, without overwriting empty cells that are not merged.
Based on the OP's code and the additional information given in @gordthompson's answer and @stavinsky's comment, the following code will work for Excel files (xls, xlsx). It reads the first sheet of the Excel file as a dataframe. For each merged cell, it replicates the merged cell's content over all the cells that the merged cell covers, as asked by the original poster.
Note that the merged_cells feature of xlrd for 'xls' files only works if the formatting_info parameter is passed when opening the workbook.
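A minimal sketch of the replication step, written on plain lists so it runs without a workbook. It assumes the merged ranges come in the (rlo, rhi, clo, chi) form that xlrd's sheet.merged_cells uses, with rhi and chi exclusive; fill_merged_cells is a hypothetical helper name, and the sample rows and range are made up to match the question's sheet plus one extra row.

```python
def fill_merged_cells(rows, merged_cells):
    """Copy the top-left value of each merged range into every cell the
    range covers. Ranges are (rlo, rhi, clo, chi) with exclusive upper
    bounds. Cells outside any range are left untouched."""
    filled = [list(row) for row in rows]  # do not modify the input
    for rlo, rhi, clo, chi in merged_cells:
        top_left = filled[rlo][clo]
        for row_index in range(rlo, rhi):
            for col_index in range(clo, chi):
                filled[row_index][col_index] = top_left
    return filled


# Column A is merged across the first three rows; the fourth row has a
# genuinely empty, non-merged cell in column A.
rows = [[2, 0, 30], ["", 1, 20], ["", 5, 52], ["", 7, 8]]
merged = [(0, 3, 0, 1)]
for row in fill_merged_cells(rows, merged):
    print(row)
```

With xlrd, the rows could be built from sheet.row_values(i) and the ranges taken from sheet.merged_cells; the empty cell in the last row stays empty because it is not part of any merged range.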