Which of these is used to submit a job for execution to the JobTracker? It would be great if someone could explain how both of these classes are used in different use cases.
Question 1: JobClient
Job control is done through the Job class in the new API, rather than through the old JobClient class.

Job is the job submitter's view of the job. It allows the user to configure the job, submit it, control its execution, and query its state. The set methods only work until the job is submitted; afterwards they throw an IllegalStateException.

Normally the user creates the application, describes the various facets of the job via Job, then submits the job and monitors its progress.
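For concreteness, here is a minimal driver sketch using the new org.apache.hadoop.mapreduce.Job API (assuming Hadoop 2.x; the identity Mapper and Reducer base classes are used so the example is self-contained, and the class name is just a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IdentityDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Job is the submitter's view of the job. All set* calls must happen
        // before submission; afterwards they throw IllegalStateException.
        Job job = Job.getInstance(conf, "identity-job");
        job.setJarByClass(IdentityDriver.class);
        job.setMapperClass(Mapper.class);            // identity mapper
        job.setReducerClass(Reducer.class);          // identity reducer
        job.setOutputKeyClass(LongWritable.class);   // key type from the default TextInputFormat
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submits the job, polls its progress, and reports it to the console.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```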
Question 2: JobSubmitter
The submit() method on Job creates an internal JobSubmitter instance and calls submitJobInternal() on it. Having submitted the job, waitForCompletion() polls the job's progress once per second and reports the progress to the console. When the job completes successfully, the job counters are displayed; otherwise, the error that caused the job to fail is logged to the console.
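To make the two entry points concrete, here is a small sketch (job configuration elided; you would call one of the two, not both):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitVsWait {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "submit-demo");
        // ... mapper/reducer/input/output configuration elided ...

        // Option 1: asynchronous. submit() hands the job to an internal
        // JobSubmitter (submitJobInternal()) and returns immediately.
        job.submit();

        // Option 2 (use instead of submit(), not after it): block until done.
        // waitForCompletion(true) submits the job itself, then polls progress
        // about once per second and mirrors it to the console; on success it
        // prints the job counters.
        // boolean ok = job.waitForCompletion(true);
    }
}
```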
The job submission process implemented by JobSubmitter does the following:

- Asks the resource manager for a new application ID, used for the MapReduce job ID.
- Checks the output specification of the job. For example, if the output directory has not been specified or it already exists, the job is not submitted and an error is thrown to the MapReduce program (a client-side sketch of this check follows the list).
- Computes the input splits for the job. If the splits cannot be computed (because the input paths don't exist, for example), the job is not submitted and an error is thrown to the MapReduce program.
- Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the shared filesystem in a directory named after the job ID. The job JAR is copied with a high replication factor (controlled by the mapreduce.client.submit.file.replication property, which defaults to 10) so that there are lots of copies across the cluster for the node managers to access when they run tasks for the job.
- Submits the job by calling submitApplication() on the resource manager.
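As an illustration of the output-specification step, a driver can perform the same directory check itself before submitting, and can tune the replication property named above. This is a hypothetical pre-check, not something the API requires:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreSubmitCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Replication factor used when the client copies the job JAR,
        // configuration, and split metadata to the shared filesystem.
        conf.setInt("mapreduce.client.submit.file.replication", 10);

        // Mirror the check JobSubmitter performs: an existing output
        // directory fails the job at submission time.
        Path output = new Path(args[0]);
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            System.err.println("Output directory already exists: " + output);
            System.exit(1);
        }
        // ... configure and submit the job with this conf ...
    }
}
```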
Hadoop: The Definitive Guide, fourth edition, is one of the best books for understanding these concepts. From the code end, you can refer to the source code on grepcode:
- Job, API to check: waitForCompletion() => submit() => JobSubmitter.submitJobInternal() (in the older Hadoop 1.x sources, submit() delegates to jobClient.submitJobInternal() instead)
- JobClient (old API): submitJobInternal()
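Roughly what that chain amounts to, sketched as a manual equivalent of waitForCompletion() (assuming the job has already been configured):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ManualPoll {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "poll-demo");
        // ... job configuration elided ...

        job.submit();  // internally: JobSubmitter.submitJobInternal()

        // Approximately what waitForCompletion() does after submitting:
        // poll the job state about once per second and report progress.
        while (!job.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(1000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
    }
}
```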