I configured Eclipse to develop with Spark and Python. I configured: 1. PyDev with the Python interpreter, 2. PyDev with the Spark Python sources, 3. PyDev with the Spark environment variables.
This is my Libraries configuration:
And this is my Environment configuration:
I created a project named CompensationStudy, and I want to run a small example to be sure that everything goes smoothly.
This is my code:
from pyspark import SparkConf, SparkContext
import os
sparkConf = SparkConf().setAppName("WordCounts").setMaster("local")
sc = SparkContext(conf = sparkConf)
textFile = sc.textFile(os.environ["SPARK_HOME"] + "/README.md")
wordCounts = textFile.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
for wc in wordCounts.collect():
    print(wc)
But I got this error: ImportError: No module named py4j.protocol
Logically, all of PySpark's library dependencies, including Py4J, should be picked up automatically once I configure PyDev with the Spark Python sources. So what's wrong here? Is it just a problem with my log4j.properties file? Help please!
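For context on what I have already checked: Py4J is not installed separately via pip in my setup; Spark vendors it as a zip under $SPARK_HOME/python/lib, so it only imports if that zip is on the interpreter's path. This is a minimal sketch (the function name and the fallback behavior are my own, and it assumes a standard Spark layout) of the path entries I believe the PyDev configuration is supposed to provide:

```python
import glob
import os
import sys

def add_pyspark_to_path(spark_home):
    """Return and register the sys.path entries PySpark needs:
    the python/ directory plus the bundled py4j-*.zip
    (Py4J is vendored inside Spark, not installed separately)."""
    python_dir = os.path.join(spark_home, "python")
    # The zip name carries the Py4J version, so match it with a glob
    # rather than hard-coding it.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip"))
    entries = [python_dir] + py4j_zips
    for entry in entries:
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return entries

# Usage (before importing pyspark):
# add_pyspark_to_path(os.environ["SPARK_HOME"])
```

If I run something like this in a plain Python console before `from pyspark import SparkConf, SparkContext`, the import succeeds, which is why I suspect my PyDev library/environment configuration rather than the Spark installation itself.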