I use hadoop2.7.1's rest apis to run a mapreduce job outside the cluster. This example "http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api" really helped me. But when I submit a post response, some strange things happen:
I look at "http://master:8088/cluster/apps" and a post response produce two jobs as following picture: strange things: a response produces two jobs
After wait a long time, the job which I defined in the http response body fail because of FileAlreadyExistsException. The reason is another job creates the output directory, so Output directory hdfs://master:9000/output/output16 already exists.
This is my response body:
{
"application-id": "application_1445825741228_0011",
"application-name": "wordcount-demo",
"am-container-spec": {
"commands": {
"command": "{{HADOOP_HOME}}/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /data/ /output/output16"
},
"environment": {
"entry": [{
"key": "CLASSPATH",
"value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
}]
}
},
"unmanaged-AM": false,
"max-app-attempts": 2,
"resource": {
"memory": 1024,
"vCores": 1
},
"application-type": "MAPREDUCE",
"keep-containers-across-application-attempts": false
}
and this is my command:
curl -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' http://master:8088/ws/v1/cluster/apps?user.name=hadoop -d @post-json.txt
Can anybody help me? Thanks a lot.