I have a Scala class that I define like so:
import org.apache.spark.{SparkConf, SparkContext}
object TestObject extends App {
  val FAMILY = "data".toUpperCase

  override def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf())
    sc.parallelize(1 to 10)
      .map(getData)
      .saveAsTextFile("my_output")
  }

  def getData(i: Int) = {
    (i, FAMILY, "data".toUpperCase)
  }
}
I submit it to a YARN cluster like so:
HADOOP_CONF_DIR=/etc/hadoop/conf spark-submit \
    --conf spark.hadoop.validateOutputSpecs=false \
    --conf spark.yarn.jar=hdfs:/apps/local/spark-assembly-1.2.1-hadoop2.4.0.jar \
    --deploy-mode=cluster \
    --master=yarn \
    --class=TestObject \
    target/scala-2.11/myjar-assembly-1.1.jar
Unexpectedly, the output looks like the following, indicating that the getData method can't see the value of FAMILY:
(1,null,DATA)
(2,null,DATA)
(3,null,DATA)
(4,null,DATA)
(5,null,DATA)
(6,null,DATA)
(7,null,DATA)
(8,null,DATA)
(9,null,DATA)
(10,null,DATA)
What do I need to understand about fields, scoping, visibility, Spark submission, objects, and singletons to understand why this is happening? And what should I be doing instead, if I basically want variables defined as "constants" to be visible to the getData method?
Figured it out. It's the App trait causing trouble. It manifests even in this simple class:
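(A sketch along those lines, assuming a val plus an overridden main that prints it; the MinimalApp name is just for illustration.)

object MinimalApp extends App {
  val FAMILY = "data".toUpperCase

  // With App, the object body (including the FAMILY initializer) is wrapped
  // by DelayedInit and only runs from App's own main. Overriding main means
  // that body never executes, so FAMILY is still null here.
  override def main(args: Array[String]) {
    println(FAMILY)   // prints: null
  }
}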
Apparently App inherits from DelayedInit, which means that when main() runs, FAMILY hasn't been initialized. Exactly what I don't want, so I'm going to stop using App.
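For the Spark job, that change might look roughly like this (a sketch: the same body as the question's code, just a plain object with an ordinary main instead of App):

import org.apache.spark.{SparkConf, SparkContext}

object TestObject {
  // A plain object has no DelayedInit: this initializer runs when the
  // singleton is first loaded, on the driver and again on each executor,
  // so getData sees "DATA" everywhere.
  val FAMILY = "data".toUpperCase

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf())
    sc.parallelize(1 to 10)
      .map(getData)
      .saveAsTextFile("my_output")
  }

  def getData(i: Int) = {
    (i, FAMILY, "data".toUpperCase)
  }
}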
I might be missing something, but I don't think you should be defining a main method. When you extend App, you inherit a main, and you should not override it, since that is what actually invokes the code in your App. For example, the simple class in your answer should be written like this:
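(A sketch, reusing the MinimalApp name from the earlier example; the println is only there to show that FAMILY is initialized by the time App's inherited main runs the body.)

object MinimalApp extends App {
  // No main override: App's inherited main executes this body,
  // so FAMILY is initialized before the println runs.
  val FAMILY = "data".toUpperCase
  println(FAMILY)   // prints: DATA
}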