I am using XLRD to attempt to read from and manipulate string text encapsulated within the cells of my excel document. I am posting my code, as well as the text that is returned when I choose to print a certain column.
import xlrd
data = xlrd.open_workbook('data.xls')
sheetname = data.sheet_names()
employees = data.sheet_by_index(0)
print employees.col(2)
>>>[text:u'employee_first', text:u'\u201cRichard\u201d', text:u'\u201cCatesby\u201d', text:u'\u201cBrian\u201d']
My intention is to create a dict or either reference the excel documents using strings in python. I would like to have a number of my functions in my program manipulate the data locally and then output at a later point (not within the scope of this question) to a second excel file.
How do I get rid of this extra information?
If you are only interested in the values of the cells, then you should do:
values = sheet.col_values(colx=2)
instead of:
cells = sheet.col(colx=2)
values = [c.value for c in cells]
because it's more concise and more efficient (Cell
objects are constructed on the fly as/when requested).
employees.col(2)
is a list of xlrd.sheet.Cell
instances. To get all the values from the column (instead of the Cell
objects), you can use the col_values
method:
values = employees.col_values(2)
You could also do this (my original suggestion):
values = [c.value for c in employees.col(2)]
but that is much less efficient than using col_values
.
\u201c
and \u201d
are unicode left and right double quotes, respectively. If you want to get rid of those, you can use, say, the lstrip and rstrip string methods. E.g. something like this:
values = [c.value.lstrip(u'\u201c').rstrip(u'\u201d') for c in employees.col(2)]