I am trying to write a very simple code using Spark in Pycharm and my os is Windows 8. I have been dealing with several problems which somehow managed to fix except for one. When I run the code using pyspark.cmd everything works smoothly but I have had no luck with the same code in pycharm. There was a problem with SPARK_HOME variable which I fixed using the following code:
import sys
import os
os.environ['SPARK_HOME'] = "C:/Spark/spark-1.4.1-bin-hadoop2.6"
sys.path.append("C:/Spark/spark-1.4.1-bin-hadoop2.6/python")
sys.path.append('C:/Spark/spark-1.4.1-bin-hadoop2.6/python/pyspark')
So now when I import the pyspark and everything is fine:
from pyspark import SparkContext
The problem rises when I want to run the rest of my code:
logFile = "C:/Spark/spark-1.4.1-bin-hadoop2.6/README.md"
sc = SparkContext()
logData = sc.textFile(logFile).cache()
logData.count()
When I receive the following error:
15/08/27 12:04:15 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
I have added the python path as an environment variable and it's working properly using the command line but I could not figure out what my problem is with this code. Any help or comment is much appreciated.
Thanks