I'm trying to connect to my HDFS instance running on Cloudera. My first step was enabling Kerberos and creating keytabs (as shown here).
In the next step I would like to authenticate with a keytab.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://cloudera:8020");
conf.set("hadoop.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("hdfs@CLOUDERA", "/etc/hadoop/conf/hdfs.keytab");
FileSystem fs = FileSystem.get(conf);
FileStatus[] fsStatus = fs.listStatus(new Path("/"));
for (int i = 0; i < fsStatus.length; i++) {
    System.out.println(fsStatus[i].getPath().toString());
}
It fails with the following error:
java.io.IOException: Login failure for hdfs@CLOUDERA from keytab /etc/hadoop/conf/hdfs.keytab: javax.security.auth.login.LoginException: Unable to obtain password from user
The question is: how do I correctly handle the keytab? Do I have to copy it to my local machine?
When running a Hadoop client on Windows to reach a kerberized cluster, you need a specific "native library" (i.e. DLL).
As far as I can tell there is no good reason for that, because that lib is not actually used outside of some automated regression tests (!?!) so it's a pain inflicted on Hadoop users by Hadoop committers.
To add extra pain, there is no official build of that DLL (and of the Windows "stub" that enables its use from Java). You must either (a) build it yourself from source code -- good luck -- or (b) search the internet for a downloadable Hadoop-for-Windows runtime, and pray that it does not contain any malware.
The best option (for 64-bit Windows) is here: https://github.com/steveloughran/winutils
...and the ReadMe explains why you can reasonably trust that run-time. But if you are stuck with an older 32-bit Windows, then you are on your own.
Now let's assume you deployed that run-time on your Windows box under
C:\Some Dir\hadoop\bin\
(the final bin is required; the embedded space is just extra fun)
You must point the Hadoop client to that run-time with a couple of Java properties:
"-Dhadoop.home.dir=C:/Some Dir/hadoop" "-Djava.library.path=C:/Some Dir/hadoop/bin"
(note the double quotes around each Windows arg as a whole, to protect the embedded spaces; the paths are written Java style, with forward slashes, for extra fun)
(in Eclipse, just stuff these props under "VM Arguments", quotes included)
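A wrong hadoop.home.dir is easy to miss, so here is a minimal pre-flight sketch that checks the layout described above before any Hadoop class is touched. It assumes the run-time provides the usual winutils.exe and hadoop.dll under bin; the class and method names (HadoopHomeCheck, missingFiles) are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class HadoopHomeCheck {

    // Files the Windows run-time is assumed to provide under <hadoop.home.dir>/bin
    static final String[] EXPECTED = { "winutils.exe", "hadoop.dll" };

    /** Returns the names of the expected files that are missing under homeDir/bin. */
    static List<String> missingFiles(File homeDir) {
        List<String> missing = new ArrayList<>();
        File bin = new File(homeDir, "bin");
        for (String name : EXPECTED) {
            if (!new File(bin, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            System.err.println("hadoop.home.dir is not set (pass it with -D)");
            return;
        }
        List<String> missing = missingFiles(new File(home));
        System.out.println(missing.isEmpty()
                ? "Run-time looks complete under " + home
                : "Missing under " + home + "/bin: " + missing);
    }
}
```

Run it with the same VM arguments as the real client; if it reports missing files, the properties point at the wrong directory or the download is incomplete.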
Now, there's the Kerberos config. If your KDC is your corporate Active Directory server, then Java should find the config parameters automatically. But if your KDC is a standalone "MIT Kerberos" install on Linux, then you have to find a valid /etc/krb5.conf file on the cluster, copy it to your Windows box, and have Java use it with an additional property...
"-Djava.security.krb5.conf=C:/Some Other Dir/krb5.conf"
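If you cannot change the VM arguments, the same property can also be set from code. The sketch below is one way to do it, assuming it runs before the first Kerberos call (as far as I know the JDK reads the file once and caches it). Krb5ConfSetup and useKrb5Conf are hypothetical names, and the default path is just the example path used above.

```java
import java.io.File;

class Krb5ConfSetup {

    /**
     * Points the JVM at the given krb5.conf and fails fast if it is unreadable.
     * Call this before the first Kerberos login, because the JDK caches the config.
     */
    static void useKrb5Conf(String path) {
        File f = new File(path);
        if (!f.isFile() || !f.canRead()) {
            throw new IllegalArgumentException("krb5.conf not readable: " + f.getAbsolutePath());
        }
        System.setProperty("java.security.krb5.conf", f.getAbsolutePath());
    }

    public static void main(String[] args) {
        // Hypothetical location, matching the example path in the text
        useKrb5Conf(args.length > 0 ? args[0] : "C:/Some Other Dir/krb5.conf");
        System.out.println("Using " + System.getProperty("java.security.krb5.conf"));
    }
}
```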
Then let's assume you have created your keytab file on a Linux box, using ktutil (or an Active Directory admin created it for you with some AD command) and you dropped the file under
C:\Some Other Dir\foo.keytab
Before anything else, if the keytab is for a real Windows account -- i.e. your own account -- or a Prod service account, then make sure that keytab is secure!! Use the Windows Security dialog box to restrict access to your account only (and maybe System, for backups), because that file could enable anyone, on any machine, to authenticate on the cluster (and any Kerberos-enabled system, including Windows).
Now you can try to authenticate using
UserGroupInformation.loginUserFromKeytab("foo@BAR.ORG", "C:/Some Other Dir/foo.keytab");
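In my experience the "Unable to obtain password from user" error from the question often just means that the JVM could not read the keytab at the given local path (or that the principal is not in it). Before calling loginUserFromKeytab, a quick local check can rule out the first cause. This is only a sketch: KeytabCheck and problem are hypothetical names, and it relies on keytab files starting with the byte 0x05 followed by a format version of 1 or 2. It does not verify that the principal is actually in the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class KeytabCheck {

    /** Returns null if the file looks like a keytab, otherwise a short reason. */
    static String problem(Path keytab) throws IOException {
        if (!Files.isRegularFile(keytab)) {
            return "not a file: " + keytab;
        }
        if (!Files.isReadable(keytab)) {
            return "not readable by this user: " + keytab;
        }
        try (InputStream in = Files.newInputStream(keytab)) {
            int first = in.read();
            int version = in.read();
            // Keytab files start with 0x05 followed by a format version (1 or 2)
            if (first != 0x05 || (version != 0x01 && version != 0x02)) {
                return "does not start with a keytab header: " + keytab;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: KeytabCheck <path-to-keytab>");
            return;
        }
        String p = problem(Paths.get(args[0]));
        System.out.println(p == null ? "Keytab header looks fine" : p);
    }
}
```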
If it does not work, enable the Kerberos debug traces with both an environment variable
set HADOOP_JAAS_DEBUG=true
...and a Java property
-Dsun.security.krb5.debug=true
(in Eclipse, set these in "Environment" and "VM Arguments" respectively)
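To tell a Hadoop problem from a plain Kerberos problem, it can also help to try the keytab login with nothing but the JDK. The sketch below drives the JDK's Krb5LoginModule directly with its debug option switched on. The class name PlainJaasKeytabLogin, the application name "keytab-test" and the principal and path in main are placeholders taken from the example above, not something Hadoop itself uses.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

class PlainJaasKeytabLogin {

    /** Builds Krb5LoginModule options for a keytab login with debug output. */
    static Map<String, String> options(String principal, String keytab) {
        Map<String, String> o = new HashMap<>();
        o.put("useKeyTab", "true");
        o.put("keyTab", keytab);
        o.put("principal", principal);
        o.put("storeKey", "true");
        o.put("doNotPrompt", "true"); // fail instead of asking for a password
        o.put("debug", "true");       // trace from the login module itself
        return o;
    }

    /** Wraps the options in an in-memory JAAS configuration (no jaas.conf file needed). */
    static Configuration config(String principal, String keytab) {
        AppConfigurationEntry entry = new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options(principal, keytab));
        return new Configuration() {
            @Override
            public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
                return new AppConfigurationEntry[] { entry };
            }
        };
    }

    public static void main(String[] args) throws LoginException {
        // Same switch as -Dsun.security.krb5.debug=true, set before any Kerberos call
        System.setProperty("sun.security.krb5.debug", "true");
        // Placeholder principal and path, taken from the example in the text
        LoginContext lc = new LoginContext("keytab-test", null, null,
                config("foo@BAR.ORG", "C:/Some Other Dir/foo.keytab"));
        lc.login(); // throws LoginException on failure
        System.out.println("Logged in as " + lc.getSubject().getPrincipals());
    }
}
```

If this fails in the same way, the problem lies in the keytab, the principal or krb5.conf rather than in the Hadoop client setup.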
Have you set the proper permissions?
chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab
chmod 440 /etc/hadoop/conf/hdfs.keytab