I am using seaborn for data visualization, but it fails even on the sample data used in its own documentation:
import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")
Traceback (most recent call last):
  File "databaseConnection.py", line 35, in <module>
    tips = sns.load_dataset("tips")
  File "C:\python3.7\lib\site-packages\seaborn\utils.py", line 428, in load_dataset
    urlretrieve(full_path, cache_path)
  File "C:\python3.7\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\python3.7\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\python3.7\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\python3.7\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\python3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\python3.7\lib\urllib\request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\python3.7\lib\urllib\request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
That's because I am behind a proxy. How can I tell seaborn to use the proxy?
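One way this could work (a sketch, not confirmed against your setup): seaborn fetches datasets with urllib.request.urlretrieve, and urllib honours a globally installed opener, so installing a ProxyHandler before calling load_dataset should route the download through the proxy. The proxy address below is a placeholder.

```python
import urllib.request

# Placeholder proxy address -- replace with your real proxy host and port.
proxy = urllib.request.ProxyHandler({
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
})
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)  # urlretrieve now routes through this opener

# Afterwards, the usual call should go via the proxy:
# import seaborn as sns
# tips = sns.load_dataset("tips")
```

Alternatively, urllib also picks up the HTTP_PROXY/HTTPS_PROXY environment variables, so setting those before starting Python may be enough.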
You can download the file manually. Use
import seaborn as sns
print(sns.utils.get_data_home())
to find the folder where seaborn caches its data; on Windows it might come out as, e.g., C:\Users\username\seaborn-data.
Download the file https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv
to that folder. Finally, load the dataset with the cache option enabled (True is the default):
sns.load_dataset("tips", cache=True)
Alternatively, download the file to any other folder and pass that folder's path as the data_home argument:
sns.load_dataset(name, cache=True, data_home="path/to/folder")
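To see the data_home mechanism end to end without any network access, here is a minimal sketch: it writes a one-row stand-in tips.csv (the data values are made up for illustration, but the seven column names match the real dataset) into a temporary folder, then lets load_dataset pick it up from there instead of downloading.

```python
import os
import tempfile

import seaborn as sns

# A temporary folder standing in for the manually prepared data_home.
folder = tempfile.mkdtemp()

# One made-up row with the same seven columns as the real tips dataset.
with open(os.path.join(folder, "tips.csv"), "w") as f:
    f.write("total_bill,tip,sex,smoker,day,time,size\n")
    f.write("16.99,1.01,Female,No,Sun,Dinner,2\n")

# Because the file already exists under data_home, no download is attempted.
tips = sns.load_dataset("tips", cache=True, data_home=folder)
print(tips.shape)
```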
I understand this question is a bit old, but I was looking for a similar kind of solution and somehow the one mentioned above didn't work for me. So I created a similar/duplicate question at the link below:
Not able to resolve issue(HTTP error 404) with seaborn.load_dataset function
I then found my solution through debugging. The details are below:
load_dataset() lives in seaborn's utils.py, where the download path is a hard-coded string:
path = ("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{}.csv")
So whatever file name we pass to load_dataset(), Python looks for it online under that path; there is no option to supply our own online link for the dataset.
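The consequence of that hard-coded template is that the final URL is just the dataset name substituted into the string; a quick sketch of what seaborn ends up requesting:

```python
# The URL template quoted above, reproduced here for illustration.
path = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{}.csv"

# Any name passed to load_dataset() is simply formatted into the template:
print(path.format("tips"))
# -> https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv
```

So only files that exist in the mwaskom/seaborn-data repository can be fetched this way.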
The second parameter of load_dataset() is cache, a boolean that defaults to True. With caching enabled, the function first looks for the file in a local folder and only downloads it when it is missing there. The default folder is:
<Your Drive>:\Users\<Your User name>\seaborn-data
e.g. C:\Users\user1\seaborn-data
This means the code below works even offline, as long as the dataset file is physically present in that folder:
df = sns.load_dataset('FiveYearData')
(Note: when the dataset is downloaded, then, because cache=True, it is also copied into the folder above.)
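That default folder is simply "seaborn-data" under the user's home directory; a sketch of computing the same path by hand (assuming no SEABORN_DATA environment variable override, which seaborn also consults):

```python
import os

# Default seaborn data folder: "seaborn-data" under the user's home directory.
# (seaborn also honours a SEABORN_DATA environment variable, ignored here.)
default_home = os.path.expanduser(os.path.join("~", "seaborn-data"))
print(default_home)
```

On Windows this prints something like C:\Users\user1\seaborn-data, matching the path above.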
We can also point load_dataset() at a different folder through the third parameter, data_home, as below:
import os
df = sns.load_dataset('FiveYearData', data_home=os.path.dirname(os.path.abspath("FiveYearData")))
Here os.path.abspath resolves "FiveYearData" against the current working directory, so this simply uses my project's working directory as the dataset folder.