I have a Hadoop 3.0 cluster configured with Kerberos. Everything works fine and YARN is running as well.
Now I wish to add Spark on top of it and make full use of Hadoop and its security. To do so I use a binary distribution of Spark 2.3 and modified the following.
In spark-env.sh:
YARN_CONF_DIR, set to the folder where my Hadoop configuration files core-site.xml, hdfs-site.xml and yarn-site.xml are located.
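For reference, the entry looks roughly like this (the path is a placeholder for my actual Hadoop configuration directory):

```shell
# spark-env.sh -- point Spark at the Hadoop/YARN client configuration.
# /etc/hadoop/conf is a placeholder; it is the directory containing
# core-site.xml, hdfs-site.xml and yarn-site.xml.
export YARN_CONF_DIR=/etc/hadoop/conf
```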
In spark-defaults.conf:
spark.master yarn
spark.submit.deployMode cluster
spark.authenticate true
spark.yarn.principal mysparkprincipal
spark.yarn.keytab mykeytabfile
If I understood correctly, when using YARN the secret key will be generated automatically and I don't need to set spark.authenticate.secret manually.
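For completeness, this is roughly how I submit a job (the example jar and class come with the Spark distribution; the principal and keytab values are placeholders for my actual Kerberos identity):

```shell
# Submit the bundled SparkPi example to YARN in cluster mode.
# --principal/--keytab are placeholders for my actual Kerberos credentials.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal mysparkprincipal \
  --keytab mykeytabfile \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME"/examples/jars/spark-examples_2.11-2.3.0.jar 100
```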
The problem I have is that the worker is complaining about the key:
java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
I also don't have any indication in the logs that Spark is using YARN or trying to do anything with my HDFS volume. It's almost as if the Hadoop configuration files are ignored completely. I've read the documentation on YARN and security for Spark but it's not very clear to me.
My questions are:
- How can I be sure that Spark is using YARN?
- Do I need to set spark.yarn.access.hadoopFileSystems if I only use the server set in YARN_CONF_DIR?
- Is LOCAL_DIRS best to be set to HDFS and if yes, what is the syntax?
- Do I need both HADOOP_CONF_DIR and YARN_CONF_DIR?
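Regarding the first question, this is how I have been trying to check (assuming the standard yarn CLI from my Hadoop installation is on the PATH):

```shell
# A Spark job actually running on YARN should show up in the ResourceManager:
yarn application -list -appStates RUNNING

# It should also be visible in the ResourceManager web UI
# (default port 8088 on the ResourceManager host).
```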
Edit/Add:
After looking at the source code, the exception comes from SASL, which is not enabled for Spark, so I don't understand.
My Hadoop cluster has SSL enabled (Data Confidentiality), and since I give Spark my server configuration, maybe Spark requires SSL too because the Hadoop configuration has it enabled.
So far I am really confused about everything.
- It says that environment variables need to be set using spark.yarn.appMasterEnv. But which ones? All of them?
- It also says that it is the Hadoop CLIENT files that I need to have on the classpath, but what properties should be present in a CLIENT file? I guess I can replace the XML files using spark.hadoop.* properties, but what are the properties required for Spark to know where my YARN cluster is?
- Setting spark.authenticate.enableSaslEncryption to false seems to have no effect, as the exception is still about SparkSaslClient.
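On the spark.hadoop.* idea above, this is what I imagine the replacement for the XML files would look like in spark-defaults.conf (the host names are made up, and I don't know whether this set of properties is sufficient):

```
spark.hadoop.fs.defaultFS                    hdfs://namenode.example.com:8020
spark.hadoop.yarn.resourcemanager.hostname   resourcemanager.example.com
spark.hadoop.hadoop.security.authentication  kerberos
```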
The exception is:
java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
at org.apache.spark.network.sasl.SparkSaslClient$ClientCallbackHandler.handle(SparkSaslClient.java:137)
at com.sun.security.sasl.digest.DigestMD5Client.processChallenge(DigestMD5Client.java:337)
at com.sun.security.sasl.digest.DigestMD5Client.evaluateChallenge(DigestMD5Client.java:220)
at org.apache.spark.network.sasl.SparkSaslClient.response(SparkSaslClient.java:98)
at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:71)
at org.apache.spark.network.crypto.AuthClientBootstrap.doSaslAuth(AuthClientBootstrap.java:115)
at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:74)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:257)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)