I'm trying to run PySpark on my MacBook Air. When I try starting it up, I get the error:
Exception: Java gateway process exited before sending the driver its port number
when sc = SparkContext() is called at startup. I have tried running the following commands:
./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"
to no avail. I have also looked here:
Spark + Python - Java gateway process exited before sending the driver its port number?
but the question has never been answered. Please help! Thanks.
There are many possible reasons for this error. In my case, the version of pyspark was incompatible with the version of Spark: pyspark was 2.4.0, but Spark was 2.2.0. This mismatch makes Python fail when starting the Spark process, so Spark cannot report its ports back to Python, and the error is "Pyspark: Exception: Java gateway process exited before sending the driver its port number".
I suggest diving into the source code to find out the real reason when this error happens.
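As a first check, you can compare the two versions directly; a minimal sketch, assuming pyspark was installed via pip and a separate Spark installation provides spark-submit on your PATH:
# Version of the pip-installed pyspark package
pip show pyspark | grep Version
# Version of the Spark installation itself
spark-submit --version
The major.minor versions of the two should match.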
One possible reason is that JAVA_HOME is not set because Java is not installed.
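Both conditions are easy to check from a terminal; a quick sketch:
# Prints a version banner if Java is installed and on the PATH
java -version
# Prints the install location if the variable is set, nothing otherwise
echo $JAVA_HOME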
I encountered the same issue. The error is raised at
sc = pyspark.SparkConf()
I solved it by installing Java with the commands from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
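The linked tutorial installs Java through apt-get; the exact packages below are an assumption based on that tutorial (which targets Ubuntu 16.04):
# Refresh the package index
sudo apt-get update
# Install the default Java runtime and development kit
sudo apt-get install default-jre
sudo apt-get install default-jdk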
Had the same issue with my IPython notebook (IPython 3.2.1) on Linux (Ubuntu).
What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment variable (assuming you use bash), e.g. as in the sketch below.
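A minimal sketch of that export; the master URL is a placeholder here and should be taken from your own master log, and the trailing pyspark-shell token follows the form shown in the question above:
export PYSPARK_SUBMIT_ARGS="--master spark://<your-master-host>:7077 pyspark-shell"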
You can put this into your .bashrc file. You get the correct URL in the log for the Spark master (the location of this log is reported when you start the master with sbin/start-master.sh).
I got the same
Exception: Java gateway process exited before sending the driver its port number
in the Cloudera VM when trying to start IPython with CSV support, due to a syntax error:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0
will throw the error, while:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0
will not.
The difference is the last colon in the second (working) example, separating the Scala version number from the package version number.
In my case it was because I wrote
SPARK_DRIVER_MEMORY=10
instead of
SPARK_DRIVER_MEMORY=10g
in spark-env.sh
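For reference, Spark memory settings are JVM-style memory strings and need a unit suffix; the corrected line in conf/spark-env.sh:
# A bare number is invalid; use a unit suffix such as k, m, g or t
SPARK_DRIVER_MEMORY=10g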
For Linux (Ubuntu 18.04) with a JAVA_HOME issue, the key is to point it to the master folder:
1. Find (or set) the active Java installation: sudo update-alternatives --config java. If Java 8 is not installed, install it with: sudo apt install openjdk-8-jdk
2. Set the JAVA_HOME environment variable to the master java 8 folder. The location is given by the first command above after removing jre/bin/java, namely: export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/". If done on the command line, this will be relevant only for the current session (ref: export command on Linux). To verify: echo $JAVA_HOME
3. To have it set permanently, add the export line above to your .bashrc file. This file loads whenever a bash shell is started interactively (ref: .bashrc).
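Put together, a sketch of the whole sequence; the JAVA_HOME path is the one from this answer and may differ on your machine:
# Install Java 8 if it is missing
sudo apt install openjdk-8-jdk
# List the installed Java alternatives and note the Java 8 path
sudo update-alternatives --config java
# Point JAVA_HOME at the folder obtained by removing jre/bin/java (current session only)
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"
# Verify
echo $JAVA_HOME
# Make it permanent for future interactive shells
echo 'export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/"' >> ~/.bashrc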