I'm trying to run PySpark on my MacBook Air. When I try starting it up, I get the error:
Exception: Java gateway process exited before sending the driver its port number
when sc = SparkContext() is called at startup. I have tried running the following commands:
./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
to no avail. I have also looked here:
Spark + Python - Java gateway process exited before sending the driver its port number?
but the question has never been answered. Please help! Thanks.
There are many possible reasons for this error. In my case, the version of pyspark was incompatible with the version of Spark: pyspark was 2.4.0, but Spark was 2.2.0. This mismatch makes Python fail when starting the Spark process, so Spark cannot report its ports back to Python, and the error is "Pyspark: Exception: Java gateway process exited before sending the driver its port number".
I suggest diving into the source code to find out the real reason when this error happens.
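As a first check, you can compare the two versions directly; a minimal sketch, assuming pyspark was installed via pip and a separate Spark installation provides spark-submit on your PATH:
# Version of the pip-installed pyspark package
pip show pyspark | grep Version
# Version of the Spark installation itself
spark-submit --version
The major.minor versions of the two should match.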
One possible reason is that JAVA_HOME is not set because Java is not installed.
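Both conditions are easy to check from a terminal; a quick sketch:
# Prints a version banner if Java is installed and on the PATH
java -version
# Prints the install location if the variable is set, nothing otherwise
echo $JAVA_HOME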
I encountered the same issue. The error is raised at
sc = pyspark.SparkConf()
I solved it by installing Java with the commands from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
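The linked tutorial installs Java through apt-get; the exact packages below are an assumption based on that tutorial (which targets Ubuntu 16.04):
# Refresh the package index
sudo apt-get update
# Install the default Java runtime and development kit
sudo apt-get install default-jre
sudo apt-get install default-jdk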
Had the same issue with my IPython notebook (IPython 3.2.1) on Linux (Ubuntu).
What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment variable (assuming you use bash), e.g. as in the sketch below.
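A minimal sketch of that export; the master URL is a placeholder here and should be taken from your own master log, and the trailing pyspark-shell token follows the form shown in the question above:
export PYSPARK_SUBMIT_ARGS="--master spark://<your-master-host>:7077 pyspark-shell"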
You can put this into your .bashrc file. You get the correct URL in the log for the Spark master (the location of this log is reported when you start the master with sbin/start-master.sh).
I got the same
Exception: Java gateway process exited before sending the driver its port number
in the Cloudera VM when trying to start IPython with CSV support, due to a syntax error:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0
will throw the error, while:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0
will not.
The difference is the last colon in the second (working) example, separating the Scala version number from the package version number.
In my case it was because I wrote
SPARK_DRIVER_MEMORY=10
instead of
SPARK_DRIVER_MEMORY=10g
in spark-env.sh
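For reference, Spark memory settings are JVM-style memory strings and need a unit suffix; the corrected line in conf/spark-env.sh:
# A bare number is invalid; use a unit suffix such as k, m, g or t
SPARK_DRIVER_MEMORY=10g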
For Linux (Ubuntu 18.04) with a JAVA_HOME issue, the key is to point it to the master folder:
1. Find (or set) the active Java installation: sudo update-alternatives --config java. If Java 8 is not installed, install it with: sudo apt install openjdk-8-jdk
2. Set the JAVA_HOME environment variable to the master java 8 folder. The location is given by the first command above after removing jre/bin/java, namely: export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/". If done on the command line, this will be relevant only for the current session (ref: export command on Linux). To verify: echo $JAVA_HOME
3. To have it set permanently, add the export line above to your .bashrc file. This file loads whenever a bash shell is started interactively (ref: .bashrc).
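Put together, a sketch of the whole sequence; the JAVA_HOME path is the one from this answer and may differ on your machine:
# Install Java 8 if it is missing
sudo apt install openjdk-8-jdk
# List the installed Java alternatives and note the Java 8 path
sudo update-alternatives --config java
# Point JAVA_HOME at the folder obtained by removing jre/bin/java (current session only)
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"
# Verify
echo $JAVA_HOME
# Make it permanent for future interactive shells
echo 'export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"' >> ~/.bashrc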