Function input() in pyspark

Published 2019-07-09 06:27

Question:

My problem is that when I enter the value of p, nothing happens and execution does not continue. Is there a way to fix this?

import sys
from pyspark import SparkContext
sc = SparkContext("local", "simple App") 

p =input("Enter the word")
rdd1 = sc.textFile("monfichier") 
rdd2= rdd1.map(lambda l : l.split("\t")) 
rdd3=rdd2.map(lambda l: l[1])  
print rdd3.take(6)
rdd5=rdd3.filter(lambda l : p in l)

sc.stop()

Answer 1:

You have to distinguish between two different cases:

  • Script submitted with $SPARK_HOME/bin/spark-submit script.py

    In this case you execute a Scala application which in turn starts a Python interpreter. Since the Scala application doesn't expect any interaction on standard input, let alone pass it on to the Python interpreter, your Python script will simply hang waiting for data that will never come.

  • Script executed directly using Python interpreter (python script.py).

    You should be able to use input directly, but at the cost of handling all the configuration details, normally taken care of by spark-submit / org.apache.spark.deploy.SparkSubmit, manually in your code (see the sketch after this list).

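A minimal sketch of that second case, assuming the script is started with plain python script.py; the master, app name and file path simply mirror the question and are not a definitive setup:

from pyspark import SparkConf, SparkContext

# Configuration that spark-submit would normally supply is created by hand.
conf = SparkConf().setMaster("local").setAppName("simple App")
sc = SparkContext(conf=conf)

# stdin belongs to the Python process here, so input() prompts as expected.
p = input("Enter the word")

# ... the rest of the job proceeds exactly as in the question ...
sc.stop()
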
In general, all required arguments for your script can be passed on the command line

$SPARK_HOME/bin/spark-submit script.py some_app_arg another_app_arg

and accessed using standard methods such as sys.argv or argparse, so using input is neither necessary nor useful.
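
For example, a minimal sketch of the question's script rewritten to read the word from the command line (sys.argv[1] is the first argument after the script name; the file path is the placeholder from the question):

import sys
from pyspark import SparkContext

sc = SparkContext("local", "simple App")

# The search word comes from the spark-submit command line instead of input().
p = sys.argv[1]

rdd1 = sc.textFile("monfichier")
rdd2 = rdd1.map(lambda l: l.split("\t"))
rdd3 = rdd2.map(lambda l: l[1])
print(rdd3.take(6))
rdd5 = rdd3.filter(lambda l: p in l)

sc.stop()

It would then be submitted as $SPARK_HOME/bin/spark-submit script.py some_word.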



Answer 2:

You can use py4j to read input via Java, like this:

# Access the java.util.Scanner class through the Py4J gateway.
scanner = sc._gateway.jvm.java.util.Scanner
# 'in' is a reserved word in Python, so fetch java.lang.System.in with getattr.
sys_in = getattr(sc._gateway.jvm.java.lang.System, 'in')
# Read one line from the JVM's standard input.
result = scanner(sys_in).nextLine()
print(result)