Pass proxy to Seaborn

Published 2020-04-14 08:40

Question:

I am using seaborn for data visualization, but it fails on the sample data from its own documentation:

import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")

Traceback (most recent call last):
  File "databaseConnection.py", line 35, in <module>
    tips = sns.load_dataset("tips")
  File "C:\python3.7\lib\site-packages\seaborn\utils.py", line 428, in load_dataset
    urlretrieve(full_path, cache_path)
  File "C:\python3.7\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\python3.7\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\python3.7\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\python3.7\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\python3.7\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\python3.7\lib\urllib\request.py", line 1360, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\python3.7\lib\urllib\request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>

That's because I am behind a proxy. How can I tell seaborn to use the proxy?
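Since seaborn downloads datasets with urllib.request.urlretrieve under the hood, one approach (a sketch; the proxy URL below is a placeholder for your own proxy) is to configure urllib's proxy handling before calling load_dataset, either through environment variables or by installing a global opener:

```python
import os
import urllib.request

# Placeholder -- replace with your actual proxy address.
PROXY = "http://proxy.example.com:8080"

# Option 1: environment variables, which urllib picks up automatically.
os.environ["HTTP_PROXY"] = PROXY
os.environ["HTTPS_PROXY"] = PROXY

# Option 2: install a global opener explicitly; urlretrieve() then uses it.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY, "https": PROXY})
)
urllib.request.install_opener(opener)

# After either option, seaborn's download should go through the proxy:
# import seaborn as sns
# tips = sns.load_dataset("tips")
```

Whether this works depends on your proxy requiring (or not requiring) authentication; for an authenticated proxy the URL form http://user:password@host:port is also understood by ProxyHandler.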

Answer 1:

You can download the file manually.

Use

import seaborn as sns
print(sns.utils.get_data_home())

to find out the folder where seaborn caches its data; on Windows it might come out as C:\Users\username\seaborn-data, for example.

Download the file https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv to that folder. Finally, load the dataset with caching enabled (cache=True is the default):

sns.load_dataset("tips", cache=True)

Alternatively, download the file to any other folder and pass that folder's path as the data_home argument:

sns.load_dataset(name, cache=True, data_home="path/to/folder")
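To see exactly which file path load_dataset() will check, the cache location can be reconstructed by hand. This is a sketch mirroring seaborn's get_data_home() logic: the SEABORN_DATA environment variable and the ~/seaborn-data default match the library's behavior at the time of writing, but verify against your installed version.

```python
import os

def cached_path(name, data_home=None):
    """Sketch of where seaborn's load_dataset(name) looks for a cached CSV."""
    # Mirrors seaborn.get_data_home(): an explicit argument wins, then the
    # SEABORN_DATA environment variable, then the ~/seaborn-data default.
    if data_home is None:
        data_home = os.environ.get("SEABORN_DATA",
                                   os.path.join("~", "seaborn-data"))
    return os.path.join(os.path.expanduser(data_home), name + ".csv")

# The file that load_dataset("tips") checks before downloading anything:
print(cached_path("tips"))
```

If the file at that path exists, no network access is needed at all.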


Answer 2:

I understand the question is a bit old, but I was looking for a similar solution and somehow couldn't get the approach above working for me. So I created a similar/duplicate question here:

Not able to resolve issue(HTTP error 404) with seaborn.load_dataset function

I then found my solution through debugging. The details are below.

load_dataset() is defined in seaborn's utils.py, where the download path is hard-coded:

path = ("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/{}.csv")

So whatever file name we pass to load_dataset(), seaborn fetches it from the URL above; there is no parameter for supplying a different online location. The second parameter of load_dataset() is cache, which defaults to True. With caching enabled, the function first looks for the file in a local folder and only downloads it from the URL above if it is not already there:

<Your Drive>:\Users\<Your User name>\seaborn-data
e.g. C:\Users\user1\seaborn-data

If the dataset is already present in this folder, the code below works even without network access:

df = sns.load_dataset('FiveYearData')

(Note: when the file does have to be downloaded, cache=True also saves a copy to the folder above, so subsequent calls can work offline.)

We can also point load_dataset() at a different folder through the third parameter, data_home:

import os
df = sns.load_dataset('FiveYearData', data_home=os.path.dirname(os.path.abspath("FiveYearData")))

Here I am using my current working directory as the dataset folder.
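To make the data_home pattern concrete, here is a minimal, self-contained sketch. FiveYearData is the dataset name from this answer; the CSV columns and values below are made up purely to have a file on disk, and a temporary folder stands in for the working directory:

```python
import csv
import os
import tempfile

# Create a tiny stand-in "FiveYearData.csv" in a temporary folder; the
# columns and rows are hypothetical, only the file name matters here.
data_home = tempfile.mkdtemp()
csv_path = os.path.join(data_home, "FiveYearData.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["year", "value"])
    writer.writerow([2020, 1.5])

# With seaborn installed and the file in place, this call would read the
# local copy instead of downloading:
# import seaborn as sns
# df = sns.load_dataset("FiveYearData", data_home=data_home)
```

Note that newer seaborn releases may validate dataset names against the online repository, so for purely local files reading the CSV directly with pandas.read_csv() is the more robust route.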