Can't find Spark application output

Posted 2019-07-27 20:46

Question:

I have a cluster that I can launch successfully; at least, that's what the web UI shows, where I see this information:

URL: spark://Name25:7077
REST URL: spark://Name25:6066 (cluster mode)
Alive Workers: 10
Cores in use: 192 Total, 0 Used
Memory in use: 364.0 GB Total, 0.0 B Used
Applications: 0 Running, 5 Completed
Drivers: 0 Running, 5 Completed
Status: ALIVE

I used the submit command to run my application. If I use it this way:

./bin/spark-submit --class myapp.Main --master spark://Name25:7077 --deploy-mode cluster /home/lookupjar/myapp-0.0.1-SNAPSHOT.jar /home/etud500.csv  /home/

I get this message:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/31 15:55:16 INFO RestSubmissionClient: Submitting a request to launch an application in spark://Name25:7077.
16/08/31 15:55:27 WARN RestSubmissionClient: Unable to connect to server spark://Name25:7077.
Warning: Master endpoint spark://Name25:7077 was not a REST server. Falling back to legacy submission gateway instead.
16/08/31 15:55:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

and if I use it this way:

./bin/spark-submit --class myapp.Main --master spark://Name25:6066 --deploy-mode cluster /home/lookupjar/myapp-0.0.1-SNAPSHOT.jar /home/etud500.csv /home/result

I get this message:

Running Spark using the REST application submission protocol.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/31 16:59:06 INFO RestSubmissionClient: Submitting a request to launch an application in spark://Name25:6066.
16/08/31 16:59:06 INFO RestSubmissionClient: Submission successfully created as driver-20160831165906-0004. Polling submission state...
16/08/31 16:59:06 INFO RestSubmissionClient: Submitting a request for the status of submission driver-20160831165906-0004 in spark://Name25:6066.
16/08/31 16:59:06 INFO RestSubmissionClient: State of driver driver-20160831165906-0004 is now RUNNING.
16/08/31 16:59:06 INFO RestSubmissionClient: Driver is running on worker worker-20160831143117-10.0.10.48-38917 at 10.0.10.48:38917.
16/08/31 16:59:06 INFO RestSubmissionClient: Server responded with CreateSubmissionResponse:
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20160831165906-0004",
  "serverSparkVersion" : "2.0.0",
  "submissionId" : "driver-20160831165906-0004",
  "success" : true
}

I think it succeeded, but my application should have written 3 outputs under the given path (/home/result), because in my code I used:

path = args[1];
rdd1.saveAsTextFile(path + "/rdd1");
rdd2.saveAsTextFile(path + "/rdd2");
rdd3.saveAsTextFile(path + "/rdd3");

Question 1: Why does it ask me to use "spark://Name25:6066" rather than "spark://Name25:7077"? According to the Spark website, we use :7077.

Question 2: If it indicates that the submission succeeded and the applications completed, why don't I find the 3 output folders?

Answer 1:

Submitting via port 6066 does NOT indicate that your job completed successfully. It just sends the request; the job runs in the background. You have to check the Spark UI for the job's completion status (two ways to poll it from the command line are sketched below).
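For example, since the job went through the REST gateway on port 6066, you can ask the standalone master for the driver's state with spark-submit's --status flag, or query the REST endpoint directly over HTTP. Both lines below are a sketch that assumes the submission ID driver-20160831165906-0004 printed in your log:

./bin/spark-submit --master spark://Name25:6066 --status driver-20160831165906-0004

curl http://Name25:6066/v1/submissions/status/driver-20160831165906-0004

A state of FINISHED means the driver process exited; it does not by itself guarantee that your job wrote its output.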

If the job completed and generated output files on HDFS, you can check them using:

hdfs dfs -ls <path>/rdd1
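One likely reason you don't see the three folders under /home/result on the machine you submitted from: in cluster mode the driver runs on a worker (10.0.10.48 in your log), and each executor writes its output partitions on its own node, so an unqualified path like /home/result resolves against each worker's local filesystem. A minimal sketch of making the destination unambiguous; the namenode address hdfs://Name25:9000 is an assumption, substitute your own:

// Hypothetical sketch: an explicit hdfs:// scheme makes every node write to the
// same shared location instead of each worker's local /home/result.
String path = "hdfs://Name25:9000/user/spark/result";  // placeholder namenode address
rdd1.saveAsTextFile(path + "/rdd1");
rdd2.saveAsTextFile(path + "/rdd2");
rdd3.saveAsTextFile(path + "/rdd3");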