PySpark: "does not exist in the JVM" error when creating SparkContext

Posted 2019-02-22 12:03

I am using Spark on EMR and writing a PySpark script. I get an error when trying to run:

from pyspark import SparkContext
sc = SparkContext()

This is the error:

File "pyex.py", line 5, in <module>
    sc = SparkContext()
  File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 118, in __init__
    conf, jsc, profiler_cls)
  File "/usr/local/lib/python3.4/site-packages/pyspark/context.py", line 195, in _do_init
    self._encryption_enabled = self._jvm.PythonUtils.getEncryptionEnabled(self._jsc)
  File "/usr/local/lib/python3.4/site-packages/py4j/java_gateway.py", line 1487, in __getattr__
    "{0}.{1} does not exist in the JVM".format(self._fqn, name))
py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

I found an answer stating that I need to import SparkContext, but that is not working either.

5 Answers
#2 · 2019-02-22 12:21

I just had a fresh pyspark installation on my Windows device and was having the exact same issue. What seems to have helped is the following:

Go to your System Environment Variables and add PYTHONPATH with the following value: %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH% (note that on Windows the path separator is ;). Check which py4j version you actually have in your spark/python/lib folder and substitute it.

The reason I think this works: when I installed pyspark using conda, it also pulled in a py4j version that may not be compatible with the specific version of Spark, whereas Spark packages its own py4j inside the distribution.
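As a sketch of the idea above, you can build the PYTHONPATH value programmatically by locating whichever py4j zip your Spark distribution actually ships, instead of hard-coding the version. (`build_pythonpath` is a hypothetical helper for illustration, not part of pyspark.)

```python
import glob
import os

def build_pythonpath(spark_home):
    """Build the PYTHONPATH entries PySpark needs, locating the py4j
    zip bundled with this Spark install instead of hard-coding its
    version. Raises if no bundled py4j zip is found."""
    python_dir = os.path.join(spark_home, "python")
    # Spark distributions ship a single py4j-<version>-src.zip under python/lib
    zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    if not zips:
        raise FileNotFoundError("no py4j-*-src.zip found under " + python_dir)
    # os.pathsep is ';' on Windows and ':' on Unix, so this works on both
    return os.pathsep.join([python_dir, zips[0]])
```

Using `os.pathsep` also sidesteps the easy mistake of mixing `;` and `:` separators when copying the value between platforms.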

Ridiculous、
#3 · 2019-02-22 12:23

The following steps solved my issue:

- Downgrade pyspark to 2.3.2.
- Add PYTHONPATH as a System Environment Variable with the value %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%

Note: use the proper py4j version in the value above; don't copy it exactly.

Anthone
#4 · 2019-02-22 12:27

PySpark recently released 2.4.0, but there is no stable Spark release coinciding with this new version. Try downgrading to pyspark 2.3.2; this fixed it for me.

Edit: to be clearer, your PySpark version needs to be the same as the Apache Spark version that is downloaded, or you may run into compatibility issues.

Check the installed version of pyspark by using:

pip freeze
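The compatibility check described above boils down to comparing the major.minor parts of the two version strings. A minimal sketch (`versions_compatible` is an illustrative helper, not an official pyspark function):

```python
def versions_compatible(pyspark_version, spark_version):
    """Return True when the pip-installed pyspark package and the Spark
    install share the same major.minor version (patch level may differ),
    e.g. pyspark 2.3.2 against Spark 2.3.1."""
    major_minor = lambda v: v.split(".")[:2]
    return major_minor(pyspark_version) == major_minor(spark_version)
```

So `versions_compatible("2.4.0", "2.3.2")` is False, which is exactly the mismatch this answer is describing.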

男人必须洒脱
#5 · 2019-02-22 12:29

Instead of editing the Environment Variables, you can just ensure that the Python environment (the one with pyspark) has the same py4j version as the zip file in the \python\lib\ directory within your Spark folder, e.g. d:\Programs\Spark\python\lib\py4j-0.10.7-src.zip on my system, for Spark 2.3.2. That is the py4j version shipped as part of the Spark archive.
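To see which py4j version your Spark folder ships, you can read it straight out of the zip's filename. A small sketch (`bundled_py4j_version` is a hypothetical helper for illustration):

```python
import os
import re

def bundled_py4j_version(zip_path):
    """Extract the py4j version from a py4j-<version>-src.zip filename,
    e.g. '0.10.7' from 'py4j-0.10.7-src.zip'."""
    name = os.path.basename(zip_path)
    match = re.fullmatch(r"py4j-(.+)-src\.zip", name)
    if match is None:
        raise ValueError("unexpected py4j zip name: " + name)
    return match.group(1)
```

You can then compare the result against the installed package's `py4j.__version__` and `pip install py4j==<that version>` if they differ.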

唯我独甜
#6 · 2019-02-22 12:36

Call sc.stop() on your SparkContext at the end of the program so it shuts down cleanly, which avoids this situation on subsequent runs.
