Enabling Spark security on a secure (Kerberized) YARN Hadoop cluster

Published 2019-07-14 04:09

Question:

I have a Hadoop 3.0 cluster configured with Kerberos. Everything works fine and YARN starts as well.

Now I wish to add Spark on top of it and make full use of Hadoop and its security features. To do so I use a Spark 2.3 binary distribution and modified the following.

In spark-env.sh:

YARN_CONF_DIR, set to the directory containing my Hadoop configuration files (core-site.xml, hdfs-site.xml and yarn-site.xml).
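For reference, a minimal spark-env.sh sketch; the /etc/hadoop/conf path is an assumption, point it at wherever your XML files actually live:

```shell
# spark-env.sh -- the path below is a placeholder; use the directory
# that contains your core-site.xml, hdfs-site.xml and yarn-site.xml
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
```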

In spark-defaults.conf:

spark.master                yarn
spark.submit.deployMode     cluster
spark.authenticate          true
spark.yarn.principal        mysparkprincipal
spark.yarn.keytab           mykeytabfile

If I understood correctly, when using YARN the secret key is generated automatically, so I don't need to set spark.authenticate.secret manually.

The problem I have is that the worker complains about the key:

java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config

I also don't see any indication in the logs that Spark is using YARN or trying to do anything with my HDFS volume. It's almost as if the Hadoop configuration files are ignored completely. I've read the Spark documentation on YARN and security, but it's not very clear to me.

My questions are:

  • How can I be sure that Spark is using YARN?
  • Do I need to set spark.yarn.access.hadoopFileSystems if I only use the server set in YARN_CONF_DIR?
  • Is it best to set LOCAL_DIRS to HDFS, and if so, what is the syntax?
  • Do I need both HADOOP_CONF_DIR and YARN_CONF_DIR?
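Regarding the first question, one way to confirm that Spark really goes through YARN is to submit a job and then look for it with the YARN CLI. This is only a sketch against your own cluster; the keytab, principal, and example-jar path are assumptions to be adapted:

```shell
# Obtain a Kerberos ticket first (keytab and principal are placeholders)
kinit -kt mykeytabfile mysparkprincipal

# Submit the bundled SparkPi example; --master yarn on the command line
# overrides whatever spark-defaults.conf says, removing one variable
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.3.0.jar 100

# If Spark went through YARN, the application appears in this list
yarn application -list -appStates ALL
```

If the application never shows up in `yarn application -list`, Spark is not talking to your ResourceManager, which usually means the configuration directory is not being picked up.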

Edit/Add:

After looking at the source code, the exception comes from SASL, which I haven't enabled for Spark, so I don't understand why it's triggered.

My Hadoop cluster has SSL enabled (data confidentiality), and since I point Spark at my server configuration, perhaps Spark requires SSL too when the Hadoop configuration has it enabled.

So far I am really confused about everything.

  • The documentation says that environment variables need to be set using the spark.yarn.appMasterEnv. prefix. But which ones? All of them?
  • It also says that the Hadoop CLIENT configuration files need to be on the classpath, but which properties should a client configuration contain? I guess I can replace the XML files with spark.hadoop.* properties, but which properties does Spark need in order to locate my YARN cluster?
  • Setting spark.authenticate.enableSaslEncryption to false seems to have no effect; the exception is still about SparkSaslClient.
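To illustrate the second point, here is a hedged sketch of spark-defaults.conf entries using those prefixes. The spark.yarn.appMasterEnv.* and spark.hadoop.* prefixes are real Spark conventions, and fs.defaultFS / yarn.resourcemanager.hostname are standard Hadoop client keys, but all values below are placeholders:

```
# Forward an environment variable to the YARN Application Master
# (JAVA_HOME is just an example; any variable name can follow the prefix)
spark.yarn.appMasterEnv.JAVA_HOME          /usr/lib/jvm/java-8

# Inline Hadoop client properties instead of shipping XML files
# (hostnames and ports are placeholders for your cluster)
spark.hadoop.fs.defaultFS                  hdfs://namenode.example.com:8020
spark.hadoop.yarn.resourcemanager.hostname rm.example.com
```

In practice, though, pointing HADOOP_CONF_DIR/YARN_CONF_DIR at the client-side XML files is the more common approach than inlining everything via spark.hadoop.*.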

The exception is:

java.lang.IllegalArgumentException: A secret key must be specified via the spark.authenticate.secret config
    at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
    at org.apache.spark.SecurityManager$$anonfun$getSecretKey$4.apply(SecurityManager.scala:510)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:509)
    at org.apache.spark.SecurityManager.getSecretKey(SecurityManager.scala:551)
    at org.apache.spark.network.sasl.SparkSaslClient$ClientCallbackHandler.handle(SparkSaslClient.java:137)
    at com.sun.security.sasl.digest.DigestMD5Client.processChallenge(DigestMD5Client.java:337)
    at com.sun.security.sasl.digest.DigestMD5Client.evaluateChallenge(DigestMD5Client.java:220)
    at org.apache.spark.network.sasl.SparkSaslClient.response(SparkSaslClient.java:98)
    at org.apache.spark.network.sasl.SaslClientBootstrap.doBootstrap(SaslClientBootstrap.java:71)
    at org.apache.spark.network.crypto.AuthClientBootstrap.doSaslAuth(AuthClientBootstrap.java:115)
    at org.apache.spark.network.crypto.AuthClientBootstrap.doBootstrap(AuthClientBootstrap.java:74)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:257)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)