My problem here is that when I enter the value of p, nothing happens; the script does not resume execution. Is there a way to fix this?
import sys
from pyspark import SparkContext

sc = SparkContext("local", "simple App")
p = input("Enter the word")
rdd1 = sc.textFile("monfichier")
rdd2 = rdd1.map(lambda l: l.split("\t"))
rdd3 = rdd2.map(lambda l: l[1])
print(rdd3.take(6))
rdd5 = rdd3.filter(lambda l: p in l)
sc.stop()
You have to distinguish between two different cases:
Script submitted with $SPARK_HOME/bin/spark-submit script.py
In this case you execute a Scala application which in turn starts the Python interpreter. Since the Scala application doesn't expect any interaction on standard input, let alone pass it on to the Python interpreter, your Python script will simply hang waiting for data that will never come.
Script executed directly using the Python interpreter (python script.py).
You should be able to use input directly, but at the cost of handling all the configuration details, normally handled by spark-submit / org.apache.spark.deploy.SparkSubmit, manually in your code.
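A minimal sketch of that second case might look like this (assuming a local master and the tab-separated file monfichier from the question; run it with python script.py, not spark-submit):

from pyspark import SparkConf, SparkContext

# Configuration that spark-submit would normally provide is set up by hand.
conf = SparkConf().setMaster("local").setAppName("simple App")
sc = SparkContext(conf=conf)

# Works here because the Python process owns the terminal's standard input.
p = input("Enter the word")

matches = (sc.textFile("monfichier")
           .map(lambda l: l.split("\t"))
           .map(lambda l: l[1])
           .filter(lambda l: p in l))
print(matches.take(6))
sc.stop()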
In general, all required arguments for your script can be passed on the command line:
$SPARK_HOME/bin/spark-submit script.py some_app_arg another_app_arg
and accessed using standard methods like sys.argv or argparse; using input is neither necessary nor useful.
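For example, a minimal sketch that reads the word from sys.argv instead of input (the file name and filtering logic are taken from the question):

import sys
from pyspark import SparkContext

if len(sys.argv) != 2:
    sys.exit("Usage: spark-submit script.py <word>")
p = sys.argv[1]  # the word to filter on, passed as an application argument

sc = SparkContext("local", "simple App")
matches = (sc.textFile("monfichier")
           .map(lambda l: l.split("\t"))
           .map(lambda l: l[1])
           .filter(lambda l: p in l))
print(matches.take(6))
sc.stop()

This would be submitted as $SPARK_HOME/bin/spark-submit script.py some_word.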
You can use py4j to get input via Java, like this:
# Get java.util.Scanner and java.lang.System.in through the Py4J gateway;
# getattr is needed because "in" is a reserved word in Python.
scanner = sc._gateway.jvm.java.util.Scanner
sys_in = getattr(sc._gateway.jvm.java.lang.System, 'in')
result = scanner(sys_in).nextLine()
print(result)
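Because the Scanner reads from the JVM's standard input rather than Python's, this workaround can apply even in the spark-submit case described above, where the JVM is the process holding the terminal's input.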