I installed Dask using pip like this:
pip install dask
and when I try to run import dask.dataframe as dd
I get the following error message:
>>> import dask.dataframe as dd
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/venv/lib/python2.7/site-packages/dask/__init__.py", line 5, in <module>
    from .async import get_sync as get
  File "/path/to/venv/lib/python2.7/site-packages/dask/async.py", line 120, in <module>
    from toolz import identity
ImportError: No module named toolz
I noticed that the documentation states:
pip install dask: Install only dask, which depends only on the standard library. This is appropriate if you only want the task schedulers.
So I'm confused as to why this didn't work.
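One way to see which missing dependency is blocking the import is to catch the ImportError and look at which module it names. A minimal sketch, assuming Python 3.3 or later, where ImportError has a name attribute (on Python 2.7, as in the traceback above, it falls back to the error message); first_missing_module is a hypothetical helper, not part of Dask.

```python
import importlib


def first_missing_module(module_name):
    """Try to import module_name.

    Returns None if the import succeeds; otherwise returns the name of the
    module that could not be imported (which may be a dependency such as
    "toolz" rather than module_name itself), or the error message when no
    module name is available.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ImportError.name exists on Python 3.3+; it can still be None when a
        # library re-raises ImportError with its own message.
        missing = getattr(exc, "name", None)
        return missing if missing else str(exc)
    return None
```

You could call first_missing_module("dask.dataframe") again after each pip install to check whether another dependency is still missing.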
I had this same issue and this was what fixed it for me.
pip install "dask[complete]"
This will install everything. If you only need a given component, such as dataframe, use
pip install "dask[dataframe]"
The bottom line was that I had to be in my virtual environment when running this; that installs dask for that environment only.
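Because pip installs into whichever environment it belongs to, it can help to confirm that you really are inside the virtual environment before installing. A minimal sketch, assuming Python 3 with either the standard venv module or virtualenv; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # venv (Python 3.3+) points sys.prefix at the environment while
    # sys.base_prefix keeps pointing at the base installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases (including on Python 2.7) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtualenv())
```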
At Dask 0.13.0 and below, there was a requirement on toolz' identity function within dask/async.py. There is a closed pull request associated with GitHub issue #1849 to remove this dependency. If, for some reason, you are stuck with an older version of dask, you can work around that particular issue by simply doing pip install toolz.
But this wouldn't (completely) fix your problem with import dask.dataframe as dd anyway, because you'd still get an ImportError about pandas, or, if you had pandas installed already, you'd get
ImportError: No module named cloudpickle
So, in order to use Dask's parallelized dataframes (built on top of pandas), you have to tell pip to install some "extras" (reference), as mentioned in the Dask installation documentation:
pip install "dask[dataframe]"
Or you could just do
pip install "dask[complete]"
to get the whole bag of tricks. NB: The double quotes may or may not be required in your shell.
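If you would rather drive the install from Python, running pip as "python -m pip" through sys.executable makes it install into the same environment as the interpreter you are using, and passing the arguments as a list sidesteps the shell-quoting question for the square brackets. A minimal sketch under those assumptions; pip_install_command and install are hypothetical helper names.

```python
import subprocess
import sys


def pip_install_command(*requirements):
    """Build a pip install command for the interpreter running this code.

    Using sys.executable with "-m pip" targets the same environment
    (e.g. the active virtualenv) that this interpreter belongs to.
    """
    return [sys.executable, "-m", "pip", "install"] + list(requirements)


def install(*requirements):
    # The command is a list, so no shell is involved and extras such as
    # "dask[dataframe]" need no quoting.
    subprocess.check_call(pip_install_command(*requirements))
```

For example, install("dask[dataframe]") would pull in only the dataframe extra, while install("dask[complete]") would pull in everything.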
The justification for this is also mentioned in the Dask documentation:
requirements.txt working: