I am using Python 3.4 with IPython and have the following code. I'm unable to read a csv-file from the given URL:
import pandas as pd
import requests
url="https://github.com/cs109/2014_data/blob/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(s)
I get the following error:
"Expected file path name or file-like object, got type"
How can I fix this?
The problem you're having is that what ends up in the variable 's' is not a csv but an HTML page. To get the raw csv, you have to change the URL to:
'https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv'
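If you have several such links, the rewrite can be sketched as a small helper. This is only an illustration: `to_raw_url` is a hypothetical name, and it assumes the link follows the usual `github.com/<user>/<repo>/blob/<branch>/<path>` pattern shown above.

```python
def to_raw_url(blob_url):
    """Turn a github.com '/blob/' page URL into its raw.githubusercontent.com form.

    Hypothetical helper; assumes the usual <user>/<repo>/blob/<branch>/<path> layout.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError("not a GitHub blob URL: " + blob_url)
    # Drop the host part, then remove the first '/blob/' path segment.
    path = blob_url[len(prefix):].replace("/blob/", "/", 1)
    return "https://raw.githubusercontent.com/" + path
```

Applied to the URL from the question, this should give the raw URL quoted above.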
Your second problem is that read_csv expects a file path or a file-like object, not the file's contents; we can solve this by wrapping the text in StringIO from the io module. The third problem is that requests.get(url).content returns bytes; we can solve this by using requests.get(url).text instead.
End result is this code:
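A minimal sketch of the three fixes combined (raw URL, `.text` instead of `.content`, and `io.StringIO`). The function names `read_csv_text` and `fetch_csv` are made up for illustration and are not part of pandas or requests.

```python
import io

import pandas as pd
import requests

RAW_URL = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"


def read_csv_text(text):
    """Parse CSV held in a str by wrapping it in a file-like object."""
    return pd.read_csv(io.StringIO(text))


def fetch_csv(url):
    """Download the URL as decoded text (.text, not .content) and parse it."""
    response = requests.get(url)
    response.raise_for_status()  # raise on HTTP errors such as 404
    return read_csv_text(response.text)
```

With network access, `c = fetch_csv(RAW_URL)` should then give the DataFrame.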
As I commented, you need to use a StringIO object and decode, i.e.
c=pd.read_csv(io.StringIO(s.decode("utf-8")))
If using requests, you need to decode because .content returns bytes. If you used .text, you would just need to pass s as is:
s = requests.get(url).text
c = pd.read_csv(io.StringIO(s))
A simpler approach is to pass the correct URL of the raw data directly to read_csv. You don't have to pass a file-like object; you can pass a URL, so you don't need requests at all.
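Passing the URL directly might look roughly like this. The wrapper `load_csv` is a hypothetical name, added only to show that the same call takes either a local path or a URL string.

```python
import pandas as pd

RAW_URL = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"


def load_csv(path_or_url):
    """read_csv takes a local path or a URL string, so no requests call is needed."""
    return pd.read_csv(path_or_url)


# c = load_csv(RAW_URL)  # requires network access
```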
From the docs:
filepath_or_buffer :
Just as the error suggests, pandas.read_csv needs a file path or a file-like object as the first argument. If you want to read the csv from a string, you can use io.StringIO (Python 3.x) or StringIO.StringIO (Python 2.x).
Also, for the URL - https://github.com/cs109/2014_data/blob/master/countries.csv - you are getting back an html response, not raw csv. You should use the URL given by the Raw link on the GitHub page to get the raw csv response, which is - https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv
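As a sketch of the string route on Python 3: the bytes below are made-up sample rows standing in for what `requests.get(url).content` would return, not the real file contents.

```python
import io

import pandas as pd

# Made-up sample bytes standing in for requests.get(url).content.
raw_bytes = b"Country,Region\nAlgeria,AFRICA\nAngola,AFRICA\n"

# bytes are neither a path nor a file-like object, so decode them to str
# and wrap the result in StringIO before handing it to read_csv.
text = raw_bytes.decode("utf-8")
df = pd.read_csv(io.StringIO(text))
print(df)
```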
Update
From pandas 0.19.2 you can just pass the URL directly to read_csv.