Running a Spark application on YARN, without spark-submit

Posted 2019-05-06 00:10

I know that Spark applications can be executed on YARN using spark-submit --master yarn.

The question is: is it possible to run a Spark application on YARN using the yarn command?

If so, the YARN REST API could be used as interface for running spark and MapReduce applications in a uniform way.

4 Answers
SAY GOODBYE
2019-05-06 00:25

Thanks for the question. As suggested above, writing your own ApplicationMaster is a viable route to submitting an application without invoking spark-submit. That said, the community has built up the spark-submit command for YARN, adding flags that make it easy to ship the jars and/or configs an application needs to execute successfully. See Submitting Applications.
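For comparison, a typical spark-submit invocation against YARN looks like the sketch below; the class name, jar paths, and flag values are placeholders, not anything prescribed in this thread:

```shell
# Submit to YARN in cluster mode; --jars ships extra dependencies,
# --conf passes Spark configuration without editing spark-defaults.conf.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --jars /path/to/extra-lib.jar \
  --conf spark.executor.memory=2g \
  /path/to/my-app.jar arg1 arg2
```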

An alternative you could try: run the Spark job as an action in an Oozie workflow (see the Oozie Spark Extension). Depending on what you wish to achieve, either route looks reasonable. Hope it helps.
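For the Oozie route, the workflow definition would look roughly like the sketch below. This assumes the spark-action:0.1 schema; the application name, class, jar path, and the `${jobTracker}`/`${nameNode}` properties (normally supplied via a job.properties file) are illustrative placeholders:

```xml
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>MySparkJob</name>
            <class>com.example.MyApp</class>
            <jar>${nameNode}/user/me/lib/my-app.jar</jar>
            <spark-opts>--executor-memory 2G</spark-opts>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark action failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```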

男人必须洒脱
2019-05-06 00:26

Like all YARN applications, Spark implements a Client and an ApplicationMaster when deploying on YARN. If you look at the implementation in the Spark repository, you'll get a clue as to how to create your own Client/ApplicationMaster: https://github.com/apache/spark/tree/master/yarn/src/main/scala/org/apache/spark/deploy/yarn . But out of the box it does not seem possible.

Summer. ? 凉城
2019-05-06 00:34

I have not looked at the latest release, but a few months back such a thing was not possible "out of the box" (this is info straight from Cloudera support). I know it's not what you were hoping for, but that's what I know.

beautiful°
2019-05-06 00:35

I see this question is a year old, but for anyone else who stumbles across it, this looks like it should be possible now. I've been trying to do something similar, following the Starting Spark jobs directly via YARN REST API tutorial from Hortonworks.

Essentially what you need to do is upload your jar to HDFS, create a Spark job JSON file per the YARN REST API documentation, and then use a curl command to start the application. An example of that command is:

curl -s -i -X POST -H "Content-Type: application/json" \
     --data-binary @spark-yarn.json \
     ${HADOOP_RM}/ws/v1/cluster/apps

(Note the @ before the filename: without it, curl sends the literal string spark-yarn.json rather than the file's contents.)
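The submission body can also be assembled programmatically and POSTed with any HTTP client. The sketch below builds a minimal payload using the field names from the Hadoop ResourceManager REST API; the application id, jar path, AM command, and resource sizes are placeholders you would fill in from your own cluster (the id comes from first POSTing to /ws/v1/cluster/apps/new-application):

```python
import json

def build_spark_app_payload(app_id, jar_hdfs_path, am_command):
    """Minimal YARN application-submission body.

    Field names follow the ResourceManager REST API; the values here
    (app name, memory/vCores, visibility) are illustrative only.
    """
    return {
        "application-id": app_id,  # from /ws/v1/cluster/apps/new-application
        "application-name": "spark-via-rest",
        "application-type": "SPARK",
        "am-container-spec": {
            "local-resources": {
                "entry": [{
                    "key": "app.jar",
                    "value": {
                        "resource": jar_hdfs_path,
                        "type": "FILE",
                        "visibility": "APPLICATION",
                        # size/timestamp must match the file as it sits in HDFS
                        "size": 0,
                        "timestamp": 0,
                    },
                }]
            },
            # Command that launches the ApplicationMaster in its container
            "commands": {"command": am_command},
        },
        "resource": {"memory": 1024, "vCores": 1},
    }

payload = build_spark_app_payload(
    "application_1234567890_0001",
    "hdfs:///user/me/app.jar",
    "java org.apache.spark.deploy.yarn.ApplicationMaster --jar app.jar",
)
print(json.dumps(payload, indent=2))
```

The resulting JSON is what the spark-yarn.json file above contains; you would save it and POST it with the curl command shown.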