No module named graphframes Jupyter Notebook

Posted 2019-06-25 13:13

Question:

I'm following this installation guide, but I run into the following problem when trying to use graphframes:

from pyspark import SparkContext
sc =SparkContext()
!pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
from graphframes import *

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
in ()
----> 1 from graphframes import *

ImportError: No module named graphframes

I'm not sure whether it is possible to install a package this way, but I'd appreciate your advice and help.

Answer 1:

Good question!

Open your ~/.bashrc file and add the line export SPARK_OPTS="--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11". Save and close the file, then run source ~/.bashrc so the variable takes effect in the current shell.

Finally, open up your notebook and type:

from pyspark import SparkContext
sc = SparkContext()
sc.addPyFile('/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar')

After that, you should be able to import graphframes.
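The addPyFile step can only help if the JAR actually bundles the Python package, since a JAR is a zip archive and Python imports packages from zip files placed on sys.path. Below is a minimal sketch for checking that before adding the file. The helper name jar_has_python_package is hypothetical (it is not part of Spark or graphframes), and the path in the usage comment is the one from the answer above.

```python
import zipfile


def jar_has_python_package(jar_path, package="graphframes"):
    """Return True if the archive at jar_path contains <package>/__init__.py.

    Hypothetical helper: it only inspects the archive listing and does not
    import anything or touch Spark.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return f"{package}/__init__.py" in jar.namelist()


# Usage (path taken from the answer above; adjust to your installation):
# jar_has_python_package(
#     "/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar"
# )
```

If this returns False for your JAR, adding it with addPyFile will not make the import work, and the Python package has to come from somewhere else.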



Answer 2:

I'm using Jupyter Notebook in Docker and trying to get graphframes working. First, I used the method from https://stackoverflow.com/a/35762809/2202107:

import findspark
findspark.init()
import pyspark
import os

SUBMIT_ARGS = "--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = pyspark.SparkConf()
sc = pyspark.SparkContext(conf=conf)
print(sc._conf.getAll())
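Two details in the snippet above are easy to get wrong: the value ends with the pyspark-shell token, and the environment variable is set before the SparkContext is created, because it is read when PySpark launches the JVM. Here is a small sketch of building that string from a list of package coordinates. The function name build_submit_args is hypothetical, not a PySpark API.

```python
import os


def build_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value from a list of Maven coordinates.

    Hypothetical helper: joins the coordinates with commas after --packages
    and always ends with the "pyspark-shell" token, as in the snippet above.
    """
    if not packages:
        return "pyspark-shell"
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must run before the first SparkContext is created in this kernel.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args(
    ["graphframes:graphframes:0.7.0-spark2.4-s_2.11"]
)
```

If a SparkContext already exists in the notebook kernel, changing the variable afterwards has no effect until the kernel is restarted.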

Then, following the workaround in this issue, I was finally able to import graphframes: https://github.com/graphframes/graphframes/issues/172

import sys
pyfiles = str(sc.getConf().get(u'spark.submit.pyFiles')).split(',')
sys.path.extend(pyfiles)
from graphframes import *
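One caveat with the workaround above: if spark.submit.pyFiles is not set, getConf().get(...) returns None, and str(None).split(',') puts the literal string 'None' on sys.path. A slightly more defensive sketch of the same idea follows. The helper name extend_sys_path is hypothetical, and sc is assumed to be the SparkContext created earlier.

```python
import sys


def extend_sys_path(py_files_value, path=None):
    """Append comma-separated entries of py_files_value to path.

    Hypothetical helper: skips empty entries and entries already present,
    treats None or "" as "nothing to add", and returns the entries it added.
    path defaults to sys.path.
    """
    if path is None:
        path = sys.path
    added = []
    if not py_files_value:
        return added
    for entry in py_files_value.split(","):
        entry = entry.strip()
        if entry and entry not in path:
            path.append(entry)
            added.append(entry)
    return added


# Usage in the notebook, with `sc` being the existing SparkContext:
# extend_sys_path(sc.getConf().get("spark.submit.pyFiles"))
# from graphframes import *
```

The return value makes it easy to print which JAR paths were added when the import still fails.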