I'm using HDP 2.6, which comes with Oozie 4.2 and Spark2 installed.
I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html to add the Spark2 libraries to the 4.2 version of Oozie.
Then I submitted the job with this property added:
oozie.action.sharelib.for.spark=spark2
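For context, the property sits alongside the rest of the job configuration. A minimal job.properties sketch (host names and the workflow path are placeholders, not from my actual setup):

# job.properties -- host names and paths below are placeholders
nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/${user.name}/workflows/spark2-test
# point the Spark action at the spark2 sharelib directory instead of the default spark one
oozie.action.sharelib.for.spark=spark2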
The error I'm getting is this:
2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
2017-07-19 12:36:53,275 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2] JOB[0000012-170717153234639-oozie-oozi-W] ACTION[0000012-170717153234639-oozie-oozi-W@spark_1] Launcher exception: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
java.lang.IllegalArgumentException: Attempt to add (hdfs://:8020/user/oozie/share/lib/lib_20170613110051/oozie/aws-java-sdk-core-1.10.6.jar) multiple times to the distributed cache.
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:629)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13$$anonfun$apply$8.apply(Client.scala:620)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:620)
at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$13.apply(Client.scala:619)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:619)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:892)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1228)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1287)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:745)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:311)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:232)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:58)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:62)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:239)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
I have read that Spark 2.1 will not work via Oozie due to a change in how Spark handles duplicate files found in the distributed cache, as mentioned here.
Keep in mind that I'm using Ambari and HDP 2.6. How can I deal with this?
You need to check the contents of the oozie and spark2 directories in the Oozie sharelib. If any jars are present in both, remove them from one of the two directories and try again. Then run the oozie admin -sharelibupdate command so Oozie picks up the change. Hope this helps.
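A sketch of those steps from the command line (the sharelib path and timestamp are taken from your error message; the Oozie URL and the jar removed at the end are placeholders/examples):

# List the jars shipped by each sharelib directory
hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/oozie
hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/spark2

# Show jar names that appear in BOTH directories (compare basenames only; requires bash)
comm -12 \
  <(hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/oozie  | awk '{print $NF}' | grep '\.jar$' | xargs -n1 basename | sort) \
  <(hdfs dfs -ls /user/oozie/share/lib/lib_20170613110051/spark2 | awk '{print $NF}' | grep '\.jar$' | xargs -n1 basename | sort)

# Remove each duplicate from ONE of the two directories, e.g.:
hdfs dfs -rm /user/oozie/share/lib/lib_20170613110051/spark2/aws-java-sdk-core-1.10.6.jar

# Refresh the sharelib so the running Oozie server picks up the change
oozie admin -oozie http://<oozie-host>:11000/oozie -sharelibupdate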