I am attempting to read values from an Excel file using xlrd. It has been working great on dates, numbers, and up until now text. I have a column (category) with cells containing text (the cells are formatted as text). When I print out the cell value a float is displayed instead of the text. I also printed out the ctype of the Cell object(s) to check and it is showing as Number. I've read through the documentation and tutorial of xlrd and can't seem to find why this is occurring. Could it be that my excel file is somehow messed up? Any suggestions or pointers in the right direction?
import xlrd
import datetime
workbook = xlrd.open_workbook('training.xls')
courseSheet = workbook.sheet_by_index(0)
for row in range(courseSheet.nrows):
title = courseSheet.cell_value(row, 2)
date = courseSheet.cell_value(row, 4)
date = datetime.datetime(*xlrd.xldate_as_tuple(date, workbook.datemode))
dateTuple = date.timetuple()
category = courseSheet.cell_value(row, 7)
print category
Background: For each cell, xlrd reports the intrinsic value (if any) that is stored in the XLS file. The value types are assigned initially solely on the basis of the record type in the file (e.g. NUMBER and RK records contain floating-point numbers). It classifies formats as described here and uses that information to override the value type where it is apparent that a datetime, date, or time is intended rather that a number. xlrd does not purport to be able to render cell values according to the format ascribed to the cell.
The cells in question have evidently been entered as numbers. If they have had a text format applied to them, that does not make them "text cells".
You say """When I print out the cell value a float is displayed instead of the text""" ... please give some examples of (a) what was typed into the cell when the file was created (b) what is the evidence that "the cells are formatted as text" (c) what is repr(cell.value) (d) what is "the text" that you expected to be displayed?
You may find the following code useful:
import xlrd, sys
def dump_cell(sheet, rowx, colx):
c = sheet.cell(rowx, colx)
xf = sheet.book.xf_list[c.xf_index]
fmt_obj = sheet.book.format_map[xf.format_key]
print rowx, colx, repr(c.value), c.ctype, \
fmt_obj.type, fmt_obj.format_key, fmt_obj.format_str
book = xlrd.open_workbook(sys.argv[1], formatting_info=1)
sheet = book.sheet_by_index(0)
for rowx in xrange(sheet.nrows):
for colx in xrange(sheet.ncols):
dump_cell(sheet, rowx, colx)
I have the same problem as OP and I think I've come to the conclusion that there are cases where there is no solution on the python (xlrd) side. You are at the mercy of how the data was originally entered into the excel sheet. Specifically, if the data was entered into a cell that already had the correct 'Text' format applied to it, or if the data was entered into a cell with the default 'General' format and then the format of the cell was changed to 'Text' after the data was entered.
If you enter data into a pre-formatted cell, your numeric data will be flagged with an excel warning tick indicating that you have numeric data in a cell formatted for Text. In this case, xlrd will handle the data as you are expecting - returning the string as it appears in the excel worksheet. (e.g. the cell contents read as "1" in excel and xlrd will return "1" as the cell value)
However, if you change the format of the cell after the numeric data has been entered then you will end up in a situation where the data in excel is presented as "1", but xlrd will return a cell value of "1.0". If you check the xlrd cell.ctype for this cell you will see that the cell is still being treated as a number even though the format was changed to Text in Excel.
A possible solution may be to have your excel string data surrounded by quotes. This would prohibit excel from treating the data as a numeric value from the start.