Spark Installation and Configuration on macOS: ImportError: No module named pyspark

Posted: 2019-09-16 08:07

Question:

I'm trying to configure apache-spark on macOS. All the online guides say to either download the Spark tarball and set some environment variables, or to run brew install apache-spark and then set some environment variables.

I installed apache-spark using brew install apache-spark. Running pyspark in the terminal gives me a Python prompt, which suggests the installation was successful.

But when I try import pyspark in my Python file, I get an error: ImportError: No module named pyspark.

The strangest thing, which I can't understand, is how the pyspark REPL is able to start while the same module cannot be imported in Python code.
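Comparing sys.path in the two interpreters makes the difference visible; the pyspark launcher script adds Spark's Python directories to the path before starting the interpreter, which a plain python session never sees. A quick diagnostic sketch:

import sys

# Run this in both the pyspark REPL and a plain `python` shell and compare:
# the REPL's path includes $SPARK_HOME/python and the bundled py4j archive.
print('\n'.join(sys.path))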

I also tried pip install pyspark, but that doesn't make the module recognizable either.

In addition to installing apache-spark with Homebrew, I've set up the following environment variables:

if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi

if which pyspark > /dev/null; then
  export SPARK_HOME="/usr/local/Cellar/apache-spark/2.1.0/libexec/"
  export PYSPARK_SUBMIT_ARGS="--master local[2]"
fi

Please suggest what exactly is missing from my setup to run pyspark code on my local machine.

Answer 1:

Sorry, I don't use a Mac, but on Linux there is another way besides the answer above:

sudo ln -s $SPARK_HOME/python/pyspark /usr/local/lib/python2.7/site-packages

Python searches /path/to/your/python/site-packages last when resolving imports, so the symlink makes the pyspark package importable.
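To confirm the symlink took effect, a minimal check (a sketch; note that pyspark also needs py4j on the path, which the symlink alone does not provide):

import sys

print(sys.path)           # site-packages should appear in this list

import pyspark            # succeeds only if the symlinked package (and py4j) resolve
print(pyspark.__file__)   # shows which copy of the module was imported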



Answer 2:

The pyspark module is not on your Python interpreter's search path.

Try this instead:

import os
import sys

# Point Spark at the Homebrew installation.
os.environ['SPARK_HOME'] = "/usr/local/Cellar/apache-spark/2.1.0/libexec/"

# Put Spark's Python package and its bundled py4j archive on the module search path.
sys.path.append("/usr/local/Cellar/apache-spark/2.1.0/libexec/python")
sys.path.append("/usr/local/Cellar/apache-spark/2.1.0/libexec/python/lib/py4j-0.10.4-src.zip")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print("error importing spark modules", e)
    sys.exit(1)

# Run locally, using all available cores.
sc = SparkContext('local[*]', 'PySpark')

If you don't want to set the paths in code, export them in your shell environment instead, and don't forget to include the Python path:

export SPARK_HOME=/usr/local/Cellar/apache-spark/2.1.0/libexec/
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/python:$PATH
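
With those variables exported, a minimal end-to-end check (a sketch, assuming the paths and versions above) is to run a trivial local job:

from pyspark import SparkContext

# 'PySparkCheck' is an arbitrary application name used for this sketch.
sc = SparkContext('local[*]', 'PySparkCheck')
print(sc.parallelize(range(10)).sum())   # expected output: 45
sc.stop()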