Kerberos: check sum failed issue

2019-09-07 21:16发布

问题:

I am seeing the" KrbException: Checksum failed" Exception. Looks like kerberos issue but I am not able to figure out.

Any pointers on how to resolve will be great! Thanks in advance.

Machine details:

lsb_release -a

No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 12.04.4 LTS Release: 12.04

java -version

java version "1.7.0_55" OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1~0.12.04.2) OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

2014-06-17 22:19:24,475 ERROR [pool-6-thread-198]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: GSS initiate failed
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:676)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:673)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1574)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge20S.java:673)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
        at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:297)
        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
        ... 10 more
2014-06-17 22:19:25,481 ERROR [pool-6-thread-198]: transport.TSaslTransport (TSaslTransport.java:open(296)) - SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
        at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177)
        at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:509)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:264)
        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:676)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:673)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1574)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge20S.java:673)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)
        at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
        at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
        at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
        at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:155)
        ... 14 more
Caused by: KrbException: Checksum failed
        at sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType.decrypt(Des3CbcHmacSha1KdEType.java:96)
        at sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType.decrypt(Des3CbcHmacSha1KdEType.java:88)
        at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:177)
        at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:278)
        at sun.security.krb5.KrbApReq.<init>(KrbApReq.java:144)
        at sun.security.jgss.krb5.InitSecContextToken.<init>(InitSecContextToken.java:108)
        at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
        ... 17 more
Caused by: java.security.GeneralSecurityException: Checksum failed
        at sun.security.krb5.internal.crypto.dk.DkCrypto.decrypt(DkCrypto.java:362)
        at sun.security.krb5.internal.crypto.Des3.decrypt(Des3.java:79)
        at sun.security.krb5.internal.crypto.Des3CbcHmacSha1KdEType.decrypt(Des3CbcHmacSha1KdEType.java:94)
        ... 23 more
2014-06-17 22:19:25,482 ERROR [pool-6-thread-198]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: GSS initiate failed
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:676)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:673)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1574)
        at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge20S.java:673)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
        at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:297)
        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
        ... 10 more

回答1:

I meet this problem server times when I deploy Hadoop Secure Mode with Kerberos, they are caused by the same reason: addresses are not set in FQDN (fully qualified domain name).

Suppose the machine's hostname is ts01.test.com

Wrong example:

<property>
    <name>dfs.namenode.rpc-address.hdfs1</name>
    <value>192.168.1.101:8020</value>
</property>

Wrong example:

<property>
    <name>dfs.namenode.rpc-address.hdfs1</name>
    <value>ts01:8020</value>
</property>

Right example:

<property>
    <name>dfs.namenode.rpc-address.hdfs1</name>
    <value>ts01.test.com:8020</value>
</property>

You should keep all your address is in FQDN, not just dfs.namenode.rpc-address.