The following is only the beginning for an Coursera assignment on Data Science. I hope this is not to trivial for. But I am lost on this and could not find an answer. I am asked to import an Excelfile into a panda dataframe and to manipulate it afterwards. The file can be found here: http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls
What makes it difficult for me is
a) there is an 'overhead' of 17 lines and a footer b) the first two columns are empty c) the index column has no header name
After hours if seraching and reading I came up with this useless line:
energy=pd.read_excel('Energy Indicators.xls',
sheetname='Energy',
header=16,
skiprows=[17],
skipfooter=38,
skipcolumns=2
)
This seems to produce a multindex dataframe. Though the command energy.head() returns nothing.
I have two questions:
- what did I wrong. Up to this exercise I thought I understand the dataframe. But now I am totally clueless and lost :-((
- How do I have to tackle this? What do I have to do to get this Exceldata into a datafrae with the index consisting of the countries?
Thanks.
I installed
xlrd
package, withpip install xlrd
and then loaded the file successfully as follows:Notice that I am using the last version of pandas
0.20.3
make sure you have the latest version on your system.I think you need add parameters:
index_col
for convert column to indexusecols
- parse columns by positions15
I modified your code and was able to get the data into the dataframe. Instead of
skipcolumns
(which did not work), I used the argumentusecols
as followsIn order to make the countries as the index, you can do the following