I installed Spark, ran the sbt assembly, and can open bin/pyspark with no problem. However, I am running into problems loading the pyspark module into ipython. I'm getting the following error:
In [1]: import pyspark
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-c15ae3402d12> in <module>()
----> 1 import pyspark
/usr/local/spark/python/pyspark/__init__.py in <module>()
61
62 from pyspark.conf import SparkConf
---> 63 from pyspark.context import SparkContext
64 from pyspark.sql import SQLContext
65 from pyspark.rdd import RDD
/usr/local/spark/python/pyspark/context.py in <module>()
28 from pyspark.conf import SparkConf
29 from pyspark.files import SparkFiles
---> 30 from pyspark.java_gateway import launch_gateway
31 from pyspark.serializers import PickleSerializer, BatchedSerializer, UTF8Deserializer, \
32 PairDeserializer, CompressedSerializer
/usr/local/spark/python/pyspark/java_gateway.py in <module>()
24 from subprocess import Popen, PIPE
25 from threading import Thread
---> 26 from py4j.java_gateway import java_import, JavaGateway, GatewayClient
27
28
ImportError: No module named py4j.java_gateway
Install the py4j module with pip:
pip install py4j
I got this problem with Spark 2.1.1 and Python 2.7.x. I'm not sure whether Spark stopped bundling this package in recent distributions, but installing the
py4j
module solved the issue for me.

In my environment (using Docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that you need a few things added to your PYTHONPATH:
That worked in ipython for me.
Update: as noted in the comments, the name of the py4j zip file changes with each Spark release, so look around for the right name.
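A minimal sketch of those PYTHONPATH additions, assuming Spark is installed at /usr/local/spark; the py4j zip version shown here is an assumption, so match it to the file actually shipped in your release's python/lib directory:

```shell
# Assumed install location -- adjust to your environment.
export SPARK_HOME=/usr/local/spark

# Make the pyspark package importable.
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"

# The py4j zip name (version shown is a placeholder) changes per Spark release.
export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
```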
I solved this problem by adding some paths to my .bashrc.
After this, it never raised ImportError: No module named py4j.java_gateway again.
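A sketch of the .bashrc lines in question, under the assumption that Spark lives in /usr/local/spark; the py4j zip filename varies by Spark release, so check python/lib for the right one:

```shell
# Assumed install location -- adjust to your environment.
export SPARK_HOME=/usr/local/spark
export PATH="$SPARK_HOME/bin:$PATH"

# Expose both the pyspark package and the bundled py4j zip
# (the version in the filename is a placeholder).
export PYTHONPATH="$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH"
```

Open a new shell (or run source ~/.bashrc) for the changes to take effect.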
In PyCharm, before running the script above, make sure you have unzipped the py4j*.zip file and added its location in the script with sys.path.append("path to spark*/python/lib").
It worked for me.
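A minimal sketch of that sys.path setup; the /usr/local/spark location is an assumption, so point it at your own install:

```python
import glob
import os
import sys

# Assumed install location -- adjust to your environment.
spark_home = "/usr/local/spark"

# Make the pyspark package importable.
sys.path.append(os.path.join(spark_home, "python"))

# The py4j zip name changes with each Spark release, so glob for it
# instead of hard-coding the version.
for zip_path in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.append(zip_path)
```

With these paths in place, import pyspark should resolve py4j.java_gateway from the bundled zip.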