Pyspark: Exception: Java gateway process exited before sending the driver its port number


Question:

I'm trying to run PySpark on my MacBook Air. When I try starting it up I get the error:

Exception: Java gateway process exited before sending the driver its port number

when sc = SparkContext() is called at startup. I have tried running the following commands:

./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

to no avail. I have also looked here:

Spark + Python - Java gateway process exited before sending the driver its port number?

but the question has never been answered. Please help! Thanks.

Answer 1:

This should help you.

One solution is to add pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

There is a change in python/pyspark/java_gateway.py which requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by the user.
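A minimal sketch of doing the same from inside Python, for anyone launching pyspark from a plain interpreter rather than a shell (local[2] is just the example value from above; the variable must be set before the SparkContext is created):

# Sketch only: set PYSPARK_SUBMIT_ARGS before any SparkContext is created.
# The trailing "pyspark-shell" token is what the change described above requires.
import os

os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

from pyspark import SparkContext

sc = SparkContext()
print(sc.version)
sc.stop()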



Answer 2:

One possible reason is that JAVA_HOME is not set because Java is not installed.

I encountered the same issue. It says:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/spark/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

at sc = pyspark.SparkConf(). I solved it by running:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
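If you want to confirm which Java the gateway will launch before retrying, a small check from Python might look like this (a sketch, not part of the original answer):

# Sketch: print the Java that PySpark's gateway would pick up. A missing or
# too-old Java (as in the UnsupportedClassVersionError above) means JAVA_HOME
# or PATH needs fixing.
import os
import shutil
import subprocess

print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
java = shutil.which("java")
if java is None:
    print("No 'java' executable on PATH - install a JDK (e.g. Java 8) first")
else:
    subprocess.run([java, "-version"])  # the version banner is printed to stderr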



Answer 3:

Had the same issue with my IPython notebook (IPython 3.2.1) on Linux (Ubuntu).

What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment like this (assuming you use bash):

export PYSPARK_SUBMIT_ARGS="--master spark://<host>:<port>"

e.g.

export PYSPARK_SUBMIT_ARGS="--master spark://192.168.2.40:7077"

You can put this into your .bashrc file. You get the correct URL from the log of the Spark master (the location of this log is reported when you start the master with sbin/start-master.sh).



Answer 4:

After spending hours and hours trying many different solutions, I can confirm that the Java 10 JDK causes this error. On a Mac, navigate to /Library/Java/JavaVirtualMachines, then run this command to uninstall JDK 10 completely:

sudo rm -rf jdk-10.jdk/

After that, download JDK 8 and the problem will be solved.



Answer 5:

Had this error message running pyspark on Ubuntu; I got rid of it by installing the openjdk-8-jdk package.

from pyspark import SparkConf, SparkContext
sc = SparkContext(conf=SparkConf().setAppName("MyApp").setMaster("local"))
^^^ error

Install OpenJDK 8:

apt-get install openjdk-8-jdk-headless -qq    


Answer 6:

In my case this error appeared for a script that had been running fine before, so I figured it might be due to my Java update. I had been using Java 1.8 but accidentally updated to Java 1.9. When I switched back to Java 1.8 the error disappeared and everything ran fine. For those who get this error for the same reason but do not know how to switch back to an older Java version on Ubuntu, run

sudo update-alternatives --config java 

and select the Java version you want.



Answer 7:

I got the same Java gateway process exited......port number exception even though I set PYSPARK_SUBMIT_ARGS properly. I was running Spark 1.6 and trying to get pyspark to work with IPython4/Jupyter (OS: Ubuntu as a VM guest).

While I got this exception, I noticed an hs_err_*.log was generated and it started with:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.

So I increased the memory allocated to my Ubuntu guest in the VirtualBox settings and restarted it. Then the Java gateway exception went away and everything worked fine.
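If you cannot give the VM more memory, capping the driver heap may also let the JVM start; a sketch (the 512m value is an illustrative assumption, not from the original answer):

# Sketch: request a smaller driver heap before pyspark launches the JVM,
# so the initial allocation fits in a low-memory VM.
import os

os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] --driver-memory 512m pyspark-shell"

from pyspark import SparkContext

sc = SparkContext()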



Answer 8:

I got the same Exception: Java gateway process exited before sending the driver its port number in the Cloudera VM when trying to start IPython with CSV support, due to a syntax error:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0

will throw the error, while:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0

will not.

The difference is the last colon in the second (working) example, separating the Scala version number from the package version number; the broken example uses a period there instead.



Answer 9:

Had the same issue; installing Java using the lines below solved it:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer


Answer 10:

I figured out the problem on my Windows system. The installation directory for Java must not have spaces in the path, such as C:\Program Files. I re-installed Java in C:\Java, set JAVA_HOME to C:\Java, and the problem went away.
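A quick sanity check before starting Spark on Windows (a sketch, not from the original answer) is to look for spaces in the relevant paths:

# Sketch: warn if JAVA_HOME or the Python interpreter lives in a path with
# spaces, which this answer (and Answer 21 below) identify as the root cause.
import os
import sys

java_home = os.environ.get("JAVA_HOME", "")
if " " in java_home:
    print("JAVA_HOME contains spaces:", java_home, "- reinstall Java under e.g. C:\\Java")
if " " in sys.executable:
    print("Python path contains spaces:", sys.executable)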



Answer 11:

I had the same error running pyspark in PyCharm. I solved the problem by adding JAVA_HOME to PyCharm's environment variables.



Answer 12:

If you are trying to run Spark without the Hadoop binaries, you might encounter the above-mentioned error. One solution is to:

1) download Hadoop separately.
2) add Hadoop to your PATH.
3) add the Hadoop classpath to your Spark install.

The first two steps are trivial; the last step is best done by adding the following to $SPARK_HOME/conf/spark-env.sh on each Spark node (master and workers):

### in conf/spark-env.sh ###

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

for more info also check: https://spark.apache.org/docs/latest/hadoop-provided.html



Answer 13:

I use Mac OS. I fixed the problem!

Below is how I fixed it.

JDK 8 seems to work fine. (https://github.com/jupyter/jupyter/issues/248)

So I checked /Library/Java/JavaVirtualMachines; I only had jdk-11.jdk in this path.

I downloaded JDK 8 (following the link above), which is:

brew tap caskroom/versions
brew cask install java8

After this, I added

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_202.jdk/Contents/Home
export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"

to my ~/.bash_profile file. (You should check your jdk1.8 folder name.)

It works now! Hope this helps :)



Answer 14:

I had the same exception and tried everything, setting and resetting all environment variables. In the end the issue drilled down to a space in the appName property of the Spark session, that is, in SparkSession.builder.appName("StreamingDemo").getOrCreate(). Immediately after removing the space from the string given to appName, it was resolved. I was using pyspark 2.7 with Eclipse on Windows 10. It worked for me.
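A minimal sketch of the before/after (the exact offending name is an assumption, since the screenshots from the original answer are not reproduced here):

# Sketch: an appName containing a space reportedly triggered the gateway
# failure for this answer's author; a plain name avoids it.
from pyspark.sql import SparkSession

# spark = SparkSession.builder.appName("Streaming Demo").getOrCreate()  # name with a space (problematic per this answer)
spark = SparkSession.builder.appName("StreamingDemo").getOrCreate()     # no space
print(spark.version)
spark.stop()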



Answer 15:

Spark is very picky about the Java version you use. It is highly recommended that you use Java 1.8 (the open-source AdoptOpenJDK 8 works well too). After installing it, set JAVA_HOME in your bash variables if you use Mac/Linux:

export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

export PATH=$JAVA_HOME/bin:$PATH



Answer 16:

I got this error because I was running low on disk space.



Answer 17:

Worked hours on this. My problem was with a Java 10 installation. I uninstalled it and installed Java 8, and now PySpark works.



Answer 18:

I have the same error.

My troubleshooting procedure was:

  1. Check out Spark source code.
  2. Follow the error message. In my case: pyspark/java_gateway.py, line 93, in launch_gateway.
  3. Check the code logic to find the root cause then you will resolve it.

In my case the issue was that PySpark had no permission to create a temporary directory, so I just ran my IDE with sudo.
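An alternative to running the whole IDE as root (a sketch, assuming the failing directory is Spark's scratch space) is to point Spark at a directory you can write to:

# Sketch: use a user-writable directory for Spark's local scratch space
# instead of launching the IDE with sudo. The path is an illustrative choice.
import os
from pyspark import SparkConf, SparkContext

scratch = os.path.expanduser("~/spark-tmp")
os.makedirs(scratch, exist_ok=True)

conf = SparkConf().setMaster("local[2]").set("spark.local.dir", scratch)
sc = SparkContext(conf=conf)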



Answer 19:

For me, the answer was to add two 'Content Roots' in 'File' -> 'Project Structure' -> 'Modules' (in IntelliJ):

  1. YourPath\spark-2.2.1-bin-hadoop2.7\python
  2. YourPath\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip
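If you prefer not to configure the IDE, roughly the same effect can be had in code (a sketch; the YourPath placeholders and the py4j version depend on your Spark download, exactly as in the two entries above):

# Sketch: make the bundled pyspark and py4j sources importable without IDE
# configuration. Replace YourPath and the py4j version with your own install.
import sys

sys.path.insert(0, r"YourPath\spark-2.2.1-bin-hadoop2.7\python")
sys.path.insert(0, r"YourPath\spark-2.2.1-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip")

import pyspark
print(pyspark.__version__)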


Answer 20:

This is an old thread, but I'm adding my solution for those who use a Mac.

The issue was with JAVA_HOME. You have to include it in your .bash_profile.

Check your java -version. If you downloaded the latest Java but it doesn't show up as the latest version, then you know the path is wrong. Normally, the default path is export JAVA_HOME=/usr/bin/java.

So try changing the path to: /Library/Internet\ Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java

Alternatively, you could download the latest JDK (https://www.oracle.com/technetwork/java/javase/downloads/index.html), and this will automatically point /usr/bin/java at the latest version. You can confirm this by running java -version again.

Then that should work.



Answer 21:

Make sure that both your Java directory (as found in your path) AND your Python interpreter reside in directories with no spaces in them. These were the cause of my problem.



Answer 22:

In my case it was because I wrote SPARK_DRIVER_MEMORY=10 instead of SPARK_DRIVER_MEMORY=10g in spark-env.sh.



Answer 23:

For Linux (Ubuntu 18.04) with a JAVA_HOME issue, the key is to point it at the master (root) Java folder:

  1. Set Java 8 as the default with: sudo update-alternatives --config java. If Java 8 is not installed, install it with: sudo apt install openjdk-8-jdk.
  2. Set the JAVA_HOME environment variable to the master Java 8 folder. The location is given by the first command above with jre/bin/java removed, namely: export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/". If done on the command line, this applies only to the current session. To verify: echo $JAVA_HOME.
  3. To set this permanently, add the export line above to a file that runs before you start your IDE/Jupyter/Python interpreter, for example .bashrc, which loads when bash is started interactively.


Answer 24:

There are many possible reasons for this error. Mine was that the version of pyspark was incompatible with Spark: pyspark was 2.4.0 but Spark was 2.2.0. This makes Python fail whenever it starts the Spark process, so Spark cannot tell Python its port, and the error is "Pyspark: Exception: Java gateway process exited before sending the driver its port number".

I suggest you dive into the source code to find out the real reason when this error happens.
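A quick way to compare the two versions before digging into the source (a sketch; it assumes SPARK_HOME points at the Spark distribution you are launching):

# Sketch: compare the pip-installed pyspark version with the Spark distribution
# it will launch. A mismatch such as 2.4.0 vs 2.2.0 reproduces this problem.
import os
import subprocess

import pyspark

print("pyspark (pip) version:", pyspark.__version__)

spark_home = os.environ.get("SPARK_HOME")
if spark_home:
    subprocess.run([os.path.join(spark_home, "bin", "spark-submit"), "--version"])
else:
    print("SPARK_HOME is not set")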



Answer 25:

I got this error fixed by using the code below. I had set up SPARK_HOME, though. You may follow these simple steps from the eproblems website.

import os

spark_home = os.environ.get('SPARK_HOME', None)
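The fragment above only reads the variable; a common way to actually wire SPARK_HOME into a plain Python interpreter is the third-party findspark package (a sketch, assuming findspark is installed):

# Sketch: findspark locates the Spark install (via SPARK_HOME) and adds its
# python/ and py4j paths to sys.path before pyspark is imported.
import findspark

findspark.init()  # or findspark.init(spark_home)

from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="gateway-check")
print(sc.version)
sc.stop()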