I'm running a Spark job on YARN and would like to get the YARN container ID (as part of a requirement to generate unique IDs across a set of Spark jobs). I can see the Container.getId() method to get the ContainerId, but I have no idea how to get a reference to the currently running container from YARN. Is this even possible? How does a YARN container get its own information?
Answer 1:
The only way I could get anything was to use the log directory. The following works in a spark-shell.
import org.apache.hadoop.yarn.api.records.ContainerId

def f(): String = {
  // The executor's log directory ends with the container ID string,
  // e.g. .../container_1486380507945_0001_01_000002
  val localLogDir: String = System.getProperty("spark.yarn.app.container.log.dir")
  val containerIdString: String = localLogDir.split("/").last
  val containerIdLong: Long = ContainerId.fromString(containerIdString).getContainerId
  containerIdLong.toHexString
}
val rdd1 = sc.parallelize((1 to 10)).map{ _ => f() }
rdd1.distinct.collect().foreach(println)
Answer 2:
Below is a description of how Spark stores the container ID. Spark hides the container ID and exposes only the executor ID per application/job. So if you plan to maintain a unique ID per Spark job, my suggestion is to use the application ID that Spark gives you; you can then append a string of your own to make it unique for your purposes.
Below is the relevant Spark code from YarnAllocator.scala:
private[yarn] val executorIdToContainer = new HashMap[String, Container]
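The suggestion above can be sketched as follows: combine the application ID (available as sc.applicationId) with the executor ID and a per-partition counter to build IDs that are unique across a set of jobs. The helper name makeUniqueId is hypothetical, not part of Spark's API:

```scala
// Sketch: build a unique ID from the application ID plus executor-local parts.
// makeUniqueId is a hypothetical helper, not a Spark API.
def makeUniqueId(appId: String, executorId: String, partitionId: Int, seq: Long): String =
  s"$appId-$executorId-$partitionId-$seq"

// Inside a Spark job the pieces would come from standard APIs (assumption):
//   val appId = sc.applicationId            // e.g. "application_1486380507945_0001"
//   rdd.mapPartitionsWithIndex { (pid, it) =>
//     val executorId = org.apache.spark.SparkEnv.get.executorId
//     it.zipWithIndex.map { case (x, i) => (makeUniqueId(appId, executorId, pid, i), x) }
//   }
```

Since the application ID is unique per YARN application and executor IDs are unique within it, the (appId, executorId, partition, sequence) tuple never collides across jobs.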
Answer 3:
YARN exports all of the environment variables listed here: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/ApplicationConstants.java#L117
So you should be able to access it like this:
import org.apache.hadoop.yarn.api.ApplicationConstants

sys.env.get(ApplicationConstants.Environment.CONTAINER_ID.toString)
// or, equivalently
sys.env.get("CONTAINER_ID")
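If you only need a numeric suffix from that variable, a minimal sketch is to take the last underscore-separated segment of the container ID string. parseContainerSeq is a hypothetical helper; on a real executor you would prefer org.apache.hadoop.yarn.api.records.ContainerId.fromString (as in the first answer), which also accounts for the attempt number:

```scala
// Sketch (assumption): container ID strings look like
// container_1486380507945_0001_01_000001, where the last segment is the
// container's sequence number within the application.
def parseContainerSeq(containerIdString: String): Long =
  containerIdString.split("_").last.toLong

// Hypothetical usage on an executor:
//   val seq: Option[Long] = sys.env.get("CONTAINER_ID").map(parseContainerSeq)
```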