I am new to python and even newer to pandas, but relatively well versed in R. I am using Anaconda, with Python 3.5 and pandas 0.18.1. I am trying to read in an excel file as a dataframe. The file admittedly is pretty... ugly. There is a lot of empty space, missing headers, etc. (I am not sure if this is the source of any issues)
I create the file object, then find the appropriate sheet, then try to read that sheet as a dataframe:
xl = pd.ExcelFile(allFiles[i])
sName = [s for s in xl.sheet_names if 'security exposure' in s.lower()]
df = xl.parse(sName)
df
Results:
{'Security exposure - 21 day lag': Percent of Total Holdings \
0 KMNFC vs. 3 Month LIBOR AUD
1 04-OCT-16
2 Australian Dollar
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 Long/Short Net Exposure
9 Total
10 NaN
11 Long
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
(This goes on for 20-30 more rows and 5-6 more columns)
I am using Anaconda, and Spyder, which has a 'Variable Explorer'. It shows the variable df to be a dict of the DataFrame type:
However, I cannot use iloc:
df.iloc[:,1]
Traceback (most recent call last):
File "<ipython-input-77-d7b3e16ccc56>", line 1, in <module>
df.iloc[:,1]
AttributeError: 'dict' object has no attribute 'iloc'
Any thoughts? What am I missing?
EDIT:
To be clear, what I am really trying to do is reference the first column of the df. In R this would be df[,1]. Looking around it seems to be not a very popular way to do things, or not the 'correct' way. I understand why indexing by column names, or keys, is better, but in this situation, I really just need to index the dataframes by column numbers. Any working method of doing that would be greatly appreciated.
EDIT (2):
Per a suggestion, I tried 'read_excel', with the same results:
df = pd.ExcelFile(allFiles[i]).parse(sName)
df.loc[1]
Traceback (most recent call last):
File "<ipython-input-90-fc40aa59bd20>", line 2, in <module>
df.loc[1]
AttributeError: 'dict' object has no attribute 'loc'
df = pd.read_excel(allFiles[i], sheetname = sName)
df.loc[1]
Traceback (most recent call last):
File "<ipython-input-91-72b8405c6c42>", line 2, in <module>
df.loc[1]
AttributeError: 'dict' object has no attribute 'loc'
The problem was here:
which returned a single element list. I changed it to the following:
which returns a string, and the code then performs as expected.
All thanks to ayhan for pointing this out.