This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask it here, as I have the same issue. (See http://geekple.com/blogs/feeds/Xgzu7/posts/351703064084736)
I have Spark installed properly on my machine and am able to run python programs with the pyspark modules without error when using ./bin/pyspark as my python interpreter.
However, when I run the regular Python shell and try to import pyspark modules, I get this error:
from pyspark import SparkContext
and it says
"No module named pyspark".
How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?
Don't run your .py file as:
python filename.py
Instead, use:
spark-submit filename.py
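For instance, a minimal script run this way might look as follows (the file name, input path, and app name are just illustrative assumptions):
# wordcount.py - hypothetical example script
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.collect())
sc.stop()
and then submit it with:
spark-submit wordcount.py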
You can also create a Docker container with Alpine as the OS and then install Python and PySpark as packages. That will have it all containerised.
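A minimal Dockerfile sketch along those lines (the Alpine tag, package names, and JRE path are assumptions; note that Spark also needs a Java runtime, which plain Alpine does not include):
FROM alpine:3.17
# Python, pip, and a JRE for Spark's JVM backend (Alpine package names assumed)
RUN apk add --no-cache python3 py3-pip openjdk11-jre-headless
# PySpark from PyPI
RUN pip3 install --no-cache-dir pyspark
# Point Spark at the JVM; adjust if the JRE is installed elsewhere in your image
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk
ENTRYPOINT ["python3"]
Build and run it with something like docker build -t pyspark-alpine . followed by docker run -it --rm pyspark-alpine.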
I had the same problem.
Also make sure you are using the right Python version and installing pyspark with the matching pip version. In my case I had both Python 2.7 and 3.x, so I installed pyspark with
pip2.7 install pyspark
and it worked.
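One way to be sure the pip and the interpreter actually match is to invoke pip through the interpreter itself and then test the import, e.g.:
python2.7 -m pip install pyspark
python2.7 -c "import pyspark; print(pyspark.__version__)"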
I had this same problem and would add one thing to the proposed solutions above. When using Homebrew on Mac OS X to install Spark, you will need to correct the py4j path to include libexec (remembering to change the py4j version to the one you have):
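For example, the exports might look like this (a sketch only; the Cellar path, Spark version, and py4j version below are assumptions, so substitute the ones Homebrew actually installed):
export SPARK_HOME=/usr/local/Cellar/apache-spark/2.4.0/libexec
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH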
In the case of DSE (DataStax Cassandra & Spark), the following location needs to be added to PYTHONPATH.
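A sketch of that export (the path below is an assumption and depends on where DSE is installed on your machine):
export PYTHONPATH=/usr/share/dse/resources/spark/python:$PYTHONPATH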
Then use dse pyspark to get the modules on the path.