I'm trying to extract Zip Codes from an Excel Spreadsheet and load them into a list as Strings.
import xlrd
BIL = xlrd.open_workbook(r"C:\Temp\Stores.xls)
Worksheet = BIL.sheet_by_name("Open_Locations")
ZIPs = []
for record in Worksheet.col(17):
if record.value == "Zip":
pass
else:
ZIPs.append(record.value)
Unfortunately, this Excel workbook is managed by someone else so I cannot simply go and convert the field holding zip codes in the excel spreadsheet to text to solve my problem. In addition, believe it or not, this Excel spreadsheet also is used by some business intelligence systems. So changing that field from number to String could cause problems for other workflows leveraging this workbook, which I am not privy to.
What I'm finding is that when I print the numbers as they are without casting to integer or string first, I of course get a bunch of floats. I expected that, since Excel stores numbers as floats.
>>>Zips
[u'06405',
04650.0,
10017.0,
71055.0,
70801.0]
What I didn't expect is that when I cast these floats as int to get rid of the decimal values, then cast the result of that as string the result is that any leading or trailing zero which are part of the zip code value are truncated.
import xlrd
BIL = xlrd.open_workbook(r"C:\Temp\Stores.xls)
Worksheet = BIL.sheet_by_name("Open_Locations")
ZIPs = []
for record in Worksheet.col(17):
if record.value == "Zip":
pass
else:
ZIPs.append(str(int(record.value)))
>>>Zips
['6405',
'465',
'10017',
'71055',
'70801']
How can I convert these zip codes to string without dropping the leading or trailing zeros or determine the number of leading and trailing zeros on the value prior to truncation and append them back as appropriate?
All ZIP codes (not including the Zip+4) are 5 characters so you could just pad out to 5:
C#
ZIPs.append(str.PadLeft(5, '0');
Python:
ZIPs.append(str(int(record.value)).rjust(5, '0'))
So after some tinkering, it turned out the answer was to:
The presence of the unicode string indicator tipped me off that this might be the answer when it appeared on some values but not all when I printed the list
If anyone has a more elegant way of doing this, I would love to see it.
You can try to do this through string manipulation.
Our assumption here will be that the column will be ZIP codes, so '.0' at the end will never be necessary.
The below would go in your else statement:
Or if you want to be risque our assumption here would be reading this column will always have a '.0' otherwise it can cause unexpected behavior.