When running the following in a Python 3.5 Jupyter environment, I get the error below. Any ideas on what is causing it?
import findspark
findspark.init()
Error:
IndexError Traceback (most recent call last)
<ipython-input-20-2ad2c7679ebc> in <module>()
1 import findspark
----> 2 findspark.init()
3
4 import pyspark
/.../anaconda/envs/pyspark/lib/python3.5/site-packages/findspark.py in init(spark_home, python_path, edit_rc, edit_profile)
132 # add pyspark to sys.path
133 spark_python = os.path.join(spark_home, 'python')
--> 134 py4j = glob(os.path.join(spark_python, 'lib', 'py4j-*.zip'))[0]
135 sys.path[:0] = [spark_python, py4j]
136
IndexError: list index out of range
This is most likely due to the SPARK_HOME environment variable not being set correctly on your system. Alternatively, you can specify the Spark installation directory when you're initialising findspark, by passing it as the spark_home argument to findspark.init(). After that, it should all work!
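As a minimal sketch of the explicit-path approach: the traceback shows findspark looking for a py4j-*.zip under <spark_home>/python/lib, so it can help to confirm that the directory you pass actually contains one before calling init. The helper names find_py4j and init_with_path are hypothetical, and the path in the usage comment is a placeholder for your own installation directory.

```python
import os
from glob import glob


def find_py4j(spark_home):
    """Return the py4j zip files findspark would find under spark_home."""
    # Same pattern as the line in the traceback that raised IndexError.
    return glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip"))


def init_with_path(spark_home):
    """Call findspark.init() with an explicit path, after a sanity check."""
    if not find_py4j(spark_home):
        raise ValueError(
            "No py4j-*.zip under %s; is this the Spark installation directory?"
            % os.path.join(spark_home, "python", "lib")
        )
    import findspark  # imported here so the check above works without it

    findspark.init(spark_home)


# Usage (placeholder path, replace with your own Spark directory):
# init_with_path("/path/to/spark")
```

An empty result from find_py4j is exactly the situation that produces "list index out of range", since indexing [0] on an empty list fails.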
I was getting the same error and was able to make it work by passing the exact installation directory to findspark.init().
You need to update the SPARK_HOME variable inside bash_profile. For me, the following command worked (in terminal):
export SPARK_HOME="/usr/local/Cellar/apache-spark/2.2.0/libexec/"
After this, importing findspark and calling findspark.init() should work.
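One caveat worth checking: a variable exported in bash_profile is only visible to processes started from a shell that has loaded it, so a Jupyter kernel that was launched earlier may still not see it. Here is a small sketch for checking this from inside the notebook (the function name describe_spark_home is hypothetical):

```python
import os


def describe_spark_home(env=None):
    """Report whether SPARK_HOME is visible to this Python process."""
    env = os.environ if env is None else env
    value = env.get("SPARK_HOME")
    if not value:
        return "SPARK_HOME is not set in this process"
    if not os.path.isdir(value):
        return "SPARK_HOME points to a missing directory: " + value
    return "SPARK_HOME is set to " + value


print(describe_spark_home())
```

If it reports that the variable is not set, restarting Jupyter from a fresh terminal is usually the first thing to try.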