importing pyspark in python shell

Posted 2019-01-03 12:36

This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask it here, as I have the same issue. (See http://geekple.com/blogs/feeds/Xgzu7/posts/351703064084736)

I have Spark installed properly on my machine and can run Python programs that use the pyspark modules without error when using ./bin/pyspark as my Python interpreter.

However, when I run the regular Python shell and try to import pyspark modules:

from pyspark import SparkContext

it fails with

"No module named pyspark".

How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?

17 answers
时光不老,我们不散
#2 · 2019-01-03 13:05

Don't run your .py file as python filename.py; instead use spark-submit filename.py.
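
For reference, here is a minimal sketch of a script launched that way (the file name and app name are placeholders, not from the original answer):

# example_app.py -- run with: spark-submit example_app.py
from pyspark import SparkContext

sc = SparkContext(appName="example-app")
rdd = sc.parallelize(range(10))
print(rdd.sum())  # prints 45
sc.stop()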

再贱就再见
#3 · 2019-01-03 13:06

You can also create a Docker container with Alpine as the OS and then install Python and PySpark as packages. That keeps it all containerised.

Fickle 薄情
#4 · 2019-01-03 13:07

I had the same problem.

Also make sure you are using the right Python version and installing pyspark with the matching pip version. In my case I had both Python 2.7 and 3.x, so I installed pyspark with

pip2.7 install pyspark

and it worked.
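
If you are not sure which interpreter a given pip installed into, a quick sanity check from inside Python (just a sketch) is:

import sys
print(sys.executable)    # the Python binary actually being used
import pyspark
print(pyspark.__file__)  # where the pyspark module was picked up from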

The star
#5 · 2019-01-03 13:11

I had this same problem and would add one thing to the proposed solutions above. When using Homebrew on Mac OS X to install Spark, you will need to correct the py4j path to include libexec (remembering to change the py4j version to the one you have):

PYTHONPATH=$SPARK_HOME/libexec/python/lib/py4j-0.9-src.zip:$PYTHONPATH
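
Equivalently, the same paths can be added from inside Python before importing pyspark. This is only a sketch: it assumes SPARK_HOME is set and that the py4j zip name matches the one in your install.

import os
import sys

spark_home = os.environ["SPARK_HOME"]  # assumes SPARK_HOME is set
sys.path.insert(0, os.path.join(spark_home, "libexec", "python"))
sys.path.insert(0, os.path.join(spark_home, "libexec", "python", "lib", "py4j-0.9-src.zip"))

from pyspark import SparkContext  # should now import without error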
【Aperson】
#6 · 2019-01-03 13:11

In the case of DSE (DataStax Cassandra & Spark), the following location needs to be added to PYTHONPATH:

export PYTHONPATH=/usr/share/dse/resources/spark/python:$PYTHONPATH

Then use dse pyspark to get the modules on the path:

dse pyspark