I wanted to install PySpark on my home machine. I did

pip install pyspark
pip install jupyter

Both seemed to work well. But when I try to run pyspark I get

pyspark
Could not find valid SPARK_HOME while searching ['/home/user', '/home/user/.local/bin']

What should SPARK_HOME be set to?
PySpark from PyPI (i.e. installed with pip) does not contain the full PySpark functionality; it is only intended for use with a Spark installation in an already existing cluster [EDIT: or in local mode only - see the accepted answer]. From the docs: you should download a full Spark distribution as described here.
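If you go that route, the setup usually looks roughly like the following; the version number and install location are only illustrative, so substitute whatever release you actually download from the Apache Spark downloads page:

# Illustrative version and path; adjust to the release you downloaded
tar -xzf spark-3.5.1-bin-hadoop3.tgz -C "$HOME"
export SPARK_HOME="$HOME/spark-3.5.1-bin-hadoop3"
export PATH="$SPARK_HOME/bin:$PATH"
pyspark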
I just faced the same issue, but it turned out that pip install pyspark downloads a Spark distribution that works well in local mode. Pip just doesn't set an appropriate SPARK_HOME. But when I set this manually, pyspark works like a charm (without downloading any additional packages). Hope that helps :-)
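One way to set it manually (a sketch that assumes pyspark is importable from the same Python environment you launch it from) is to point SPARK_HOME at the directory pip installed the package into:

# Derive SPARK_HOME from the pip-installed pyspark package location
export SPARK_HOME="$(python -c 'import os, pyspark; print(os.path.dirname(pyspark.__file__))')"
pyspark

Adding the export line to ~/.bashrc (or your shell's equivalent) makes the setting persist across sessions.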