I am building a metrics system for a Spark Streaming job. In this system, metrics are collected on each executor, so a metrics source (a class used to collect metrics) needs to be initialized on each executor.
The metrics source is packaged in a jar. When submitting the job, the jar is sent from the local machine to each executor with the '--jars' parameter. However, the executor starts to initialize the metrics source class before the jar arrives, and as a result it throws a ClassNotFoundException.
It seems the issue would be resolved if the executor could wait until all resources are ready, but I do not know how to do that.
Has anyone faced the same issue?
PS: I tried using HDFS (copying the jar to HDFS, then submitting the job so the executor loads the class from an HDFS path), but that failed. I checked the source code, and it seems the class loader can only resolve local paths.
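One workaround I am considering (an untested sketch): pre-copy the jar to the same local path on every worker node, then put that path on the executor classpath at JVM startup with the standard spark.executor.extraClassPath setting, so the class is available before the metrics system initializes. The path /opt/spark-extra and the worker hostnames below are hypothetical:

```shell
# Untested sketch: first copy the jar to an identical local path on every
# worker node (hostnames and /opt/spark-extra are placeholders):
for host in worker1 worker2; do
  scp /tmp/datainsights-metrics-source-assembly-1.0.jar "$host":/opt/spark-extra/
done

# Then put that path on the executor classpath at JVM startup,
# instead of shipping the jar with --jars after the JVM is already up:
spark-submit --verbose \
  --conf "spark.executor.extraClassPath=/opt/spark-extra/datainsights-metrics-source-assembly-1.0.jar" \
  --conf "spark.metrics.conf=metrics.properties" \
  --class org.microsoft.ofe.datainsights.StartServiceSignalPipeline \
  ./target/datainsights-1.0-jar-with-dependencies.jar
```

I have not verified this end to end, but since extraClassPath entries are on the classpath from JVM start, it should avoid the race with --jars distribution.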
Here is the log; you can see that the jar is added to the classpath at 2016-01-15 18:08:07, but the initialization starts at 2016-01-15 18:07:26:
INFO 2016-01-15 18:08:07 org.apache.spark.executor.Executor: Adding file:/var/lib/spark/worker/worker-0/app-20160115180722-0041/0/./datainsights-metrics-source-assembly-1.0.jar to class loader
ERROR 2016-01-15 18:07:26 Logging.scala:96 - org.apache.spark.metrics.MetricsSystem: Source class org.apache.spark.metrics.PerfCounterSource cannot be instantiated
Here is the command I use:
spark-submit --verbose \
--jars /tmp/datainsights-metrics-source-assembly-1.0.jar \
--conf "spark.metrics.conf=metrics.properties" \
--class org.microsoft.ofe.datainsights.StartServiceSignalPipeline \
./target/datainsights-1.0-jar-with-dependencies.jar
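For reference, the source is registered in metrics.properties roughly as below (assuming Spark's standard `[instance].source.[name].class` syntax; the source name "perfCounter" is my own placeholder, and the class name is the one from the log). This registration is presumably why the executor's MetricsSystem tries to instantiate the class at startup, before the --jars file has been fetched:

```
# Register the custom source on executors
# ("perfCounter" is an arbitrary name I chose for illustration):
executor.source.perfCounter.class=org.apache.spark.metrics.PerfCounterSource
```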