-->

Zookeeper/SASL Checksum failed

Posted 2019-07-15 17:48

Question:

How do I fix the problem that generates this error:

WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@1040] - Client failed to SASL authenticate: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Checksum failed)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:199)
    at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:50)

I have set up Zookeeper on an AWS EC2 instance. I have outlined the steps I followed to set up Kerberos and Zookeeper here. Zookeeper seems to be working:

zookeeper@zookeeper-server-01:~/zk/zookeeper-3.4.11$ JVMFLAGS="-Djava.security.auth.login.config=/home/zookeeper/jaas/jaas.conf -Dsun.security.krb5.debug=true" bin/zkServer.sh start-foreground
...
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zookeeper/zookeeper-server-01
2017-12-22 00:21:52,308 [myid:] - INFO  [main:Login@297] - Server successfully logged in.
2017-12-22 00:21:52,312 [myid:] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login$1@130] - TGT refresh thread started.
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@305] - TGT valid starting at:        Fri Dec 22 00:21:52 UTC 2017
2017-12-22 00:21:52,313 [myid:] - INFO  [Thread-1:Login@306] - TGT expires:                  Fri Dec 22 10:21:52 UTC 2017
2017-12-22 00:21:52,314 [myid:] - INFO  [Thread-1:Login$1@185] - TGT refresh sleeping until: Fri Dec 22 08:25:59 UTC 2017

However, when I try to connect zkCli.sh (running on a different EC2 instance) to it, the server closes the connection and outputs the checksum error above.

The Zookeeper client appears to be able to connect to the Zookeeper server:

JVMFLAGS="-Djava.security.auth.login.config=/home/admin/Downloads/zookeeper-3.4.11/conf/zookeeper-test-client-jaas.conf -Dsun.security.krb5.debug=true" bin/zkCli.sh -server zookeeper-server-01.eigenroute.com:2181
Connecting to zookeeper-server-01.eigenroute.com:2181
2017-12-22 00:27:12,779 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
...
2017-12-22 00:27:12,788 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/admin/Downloads/zookeeper-3.4.11
2017-12-22 00:27:12,789 [myid:] - INFO  [main:ZooKeeper@441] - Initiating client connection, connectString=zookeeper-server-01.eigenroute.com:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1de0aca6
Welcome to ZooKeeper!
JLine support is enabled
...
>>> KrbAsReq creating message
[zk: zookeeper-server-01.eigenroute.com:2181(CONNECTING) 0] >>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=166
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=166
>>> KrbKdcReq send: #bytes read=310
>>>Pre-Authentication Data:
...

The client first receives a KRBError saying that pre-authentication is needed, but then appears to be successfully logged in. Does "logged in" mean successfully authenticated, and if so, to the ZooKeeper server or to Kerberos? The output is:

...
KRBError received: NEEDED_PREAUTH
KrbAsReqBuilder: PREAUTH FAILED/REQ, re-send AS-REQ
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
Using builtin default etypes for default_tkt_enctypes
default etypes for default_tkt_enctypes: 18 17 16 23.
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsReq creating message
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=253
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=253
>>> KrbKdcReq send: #bytes read=742
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
Looking for keys for: zktestclient/eigenroute.com@EIGENROUTE.COM
Added key: 17version: 3
Added key: 18version: 3
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbAsRep cons in KrbAsReq.getReply zktestclient/eigenroute.com
2017-12-22 00:27:13,286 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):Login@297] - Client successfully logged in.
...

The client then opens a socket connection to the Zookeeper server, and attempts to SASL authenticate to it:

...
2017-12-22 00:27:13,312 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1035] - Opening socket connection to server 35.169.37.216/35.169.37.216:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
2017-12-22 00:27:13,317 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@877] - Socket connection established to 35.169.37.216/35.169.37.216:2181, initiating session
2017-12-22 00:27:13,359 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1302] - Session establishment complete on server 35.169.37.216/35.169.37.216:2181, sessionid = 0x1000436873a0001, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
Found ticket for zktestclient/eigenroute.com@EIGENROUTE.COM to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Entered Krb5Context.initSecContext with state=STATE_NEW
Found ticket for zktestclient/eigenroute.com@EIGENROUTE.COM to go to krbtgt/EIGENROUTE.COM@EIGENROUTE.COM expiring on Fri Dec 22 10:27:13 UTC 2017
Service ticket not found in the subject
>>> Credentials acquireServiceCreds: same realm
Using builtin default etypes for default_tgs_enctypes
default etypes for default_tgs_enctypes: 18 17 16 23.
>>> CksumType: sun.security.krb5.internal.crypto.RsaMd5CksumType
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbKdcReq send: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000, number of retries =3, #bytes=712
>>> KDCCommunication: kdc=kerberos-server-01.eigenroute.com UDP:88, timeout=30000,Attempt =1, #bytes=712
>>> KrbKdcReq send: #bytes read=678
>>> KdcAccessibility: remove kerberos-server-01.eigenroute.com
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
>>> KrbApReq: APOptions are 00000000 00000000 00000000 00000000
>>> EType: sun.security.krb5.internal.crypto.Aes256CtsHmacSha1EType
Krb5Context setting mySeqNumber to: 50687702
Krb5Context setting peerSeqNumber to: 0
Created InitSecContextToken:
0000: 01 00 6E 82 02 6B 30 82   02 67 A0 03 02 01 05 A1  ..n..k0..g......
...
0260: 33 25 94 1F 60 93 E9 CF   7E EF 15 82 F8 6D ED 06  3%..`........m..
0270: 43                                                 C

2017-12-22 00:27:13,405 [myid:] - INFO  [main-SendThread(35.169.37.216:2181):ClientCnxn$SendThread@1161] - Unable to read additional data from server sessionid 0x1000436873a0001, likely server has closed socket, closing socket connection and attempting reconnect

WATCHER::

WatchedEvent state:Disconnected type:None path:null

So SASL authentication is not a complete failure, but the Zookeeper server closes the connection (on account of a checksum failure).

UPDATE #1. In response to T-Heron's comment, the result of nslookup zookeeper-server-01.eigenroute.com on the client machine is:

Server:     172.31.0.2
Address:    172.31.0.2#53

Non-authoritative answer:
Name:   zookeeper-server-01.eigenroute.com
Address: 35.169.37.216

The DNS entry for zookeeper-server-01.eigenroute.com is:

zookeeper-server-01.eigenroute.com  30 minutes  A  35.169.37.216

On the client machine, /etc/hosts contains:

127.0.1.1 ip-172-31-95-211.ec2.internal ip-172-31-95-211
127.0.0.1 localhost
34.239.197.36 kerberos-server-02

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(kerberos-server-02 is misnamed; it is not a KDC, and commenting this line out does not change the result.) On the ZooKeeper server, zookeeper-server-01.eigenroute.com, /etc/hosts contains:

127.0.1.1 ip-172-31-88-14.ec2.internal ip-172-31-88-14
127.0.0.1 localhost
34.225.180.212 kerberos-server-01

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

(The entry for kerberos-server-01 doesn't need to be there; removing it does not change the result.)

Can someone explain how to solve the checksum failure? Thanks!

Answer 1:

My KDC had the following principals:

zookeeper/35.169.37.216@EIGENROUTE.COM
zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM

In the JAAS configuration for the ZooKeeper server, whose host name is zookeeper-server-01.eigenroute.com, I used a keytab that I created for zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM.

When I instead created a keytab for zookeeper/35.169.37.216@EIGENROUTE.COM and used this keytab in the JAAS configuration for the ZooKeeper server, everything worked - SASL authentication from the client succeeded.
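A plausible reading of why the keytab swap helped: the KDC protects a service ticket with the key of whichever principal the client asked for, and the server can only validate it with the key in its own keytab. The toy model below is a sketch of that idea only. It uses a plain HMAC in place of real Kerberos encryption, and the key values, `issue_ticket` and `accept_ticket` are hypothetical names invented for illustration.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical long-term keys standing in for the KDC database entries.
KDC_KEYS = {
    "zookeeper/35.169.37.216@EIGENROUTE.COM": b"key-for-ip-principal",
    "zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM": b"key-for-fqdn-principal",
}


def issue_ticket(service_principal: str, payload: bytes) -> Tuple[bytes, bytes]:
    """KDC side: protect the payload with the requested principal's key."""
    key = KDC_KEYS[service_principal]
    return payload, hmac.new(key, payload, hashlib.sha1).digest()


def accept_ticket(keytab_key: bytes, ticket: Tuple[bytes, bytes]) -> bool:
    """Server side: recompute the checksum with the key from the keytab."""
    payload, checksum = ticket
    expected = hmac.new(keytab_key, payload, hashlib.sha1).digest()
    return hmac.compare_digest(expected, checksum)


if __name__ == "__main__":
    # The client asked for a ticket for the IP-based principal.
    ticket = issue_ticket("zookeeper/35.169.37.216@EIGENROUTE.COM", b"session data")
    fqdn_key = KDC_KEYS["zookeeper/zookeeper-server-01.eigenroute.com@EIGENROUTE.COM"]
    ip_key = KDC_KEYS["zookeeper/35.169.37.216@EIGENROUTE.COM"]
    print(accept_ticket(fqdn_key, ticket))  # keytab for the FQDN principal
    print(accept_ticket(ip_key, ticket))    # keytab for the IP principal
```

In this model the server holding the FQDN principal's key rejects a ticket issued for the IP principal, which is the kind of mismatch that would surface as "Checksum failed".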

I would prefer to use the fully qualified domain name (zookeeper-server-01.eigenroute.com) in the name of the Kerberos principal rather than the IP address. If anyone can tell me how to get that working, I'll accept that as the answer. Until then, this will suffice.

UPDATE: I figured it out. The ZooKeeper client takes the FQDN from the -server argument, looks up the IP address of this FQDN, and creates an InetSocketAddress object from that IP address (org.apache.zookeeper.client.StaticHostProvider). Then, to get the host name, it calls .getHostName (org.apache.zookeeper.ClientCnxn.SendThread.startConnect), which performs a reverse DNS lookup on the IP address. On my local machine, this returns:

ec2-35-169-37-216.compute-1.amazonaws.com

and on my client AWS EC2 instance, this returns:

35.169.37.216

when instead I expected it to return the FQDN. This is why on my AWS EC2 client machine, the ZooKeeper client tries to get a ticket for:

zookeeper/35.169.37.216@EIGENROUTE.COM

and on my local machine, the ZooKeeper client tries to get a ticket for:

zookeeper/ec2-35-169-37-216.compute-1.amazonaws.com@EIGENROUTE.COM

So I need AWS to make sure that a reverse DNS lookup on 35.169.37.216 yields zookeeper-server-01.eigenroute.com. The solution I found so far is to ask AWS to set up the mapping for the reverse DNS.
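The lookup chain described above can be sketched in Python as a rough analog of the Java behaviour; this is not ZooKeeper's actual code. The function names and the injectable `forward`/`reverse` parameters are assumptions made so the logic can be tried without touching real DNS. The fallback to the IP address mirrors what was observed on the EC2 client, where no reverse record was returned.

```python
import socket
from typing import Callable


def forward_lookup(host: str) -> str:
    """Resolve a host name to an IPv4 address (forward DNS)."""
    return socket.gethostbyname(host)


def reverse_lookup(ip: str) -> str:
    """Return the reverse-DNS name for an IP, or the IP itself if none exists."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return ip


def expected_server_principal(
    connect_host: str,
    realm: str,
    service: str = "zookeeper",
    forward: Callable[[str], str] = forward_lookup,
    reverse: Callable[[str], str] = reverse_lookup,
) -> str:
    """Build the principal a client would request after forward + reverse lookup."""
    ip = forward(connect_host)   # FQDN from -server becomes an IP address
    host_part = reverse(ip)      # the IP becomes whatever reverse DNS says
    return f"{service}/{host_part}@{realm}"


if __name__ == "__main__":
    # Stubbed resolvers reproducing the two cases from the answer.
    to_ip = lambda host: "35.169.37.216"
    no_ptr = lambda ip: ip
    aws_ptr = lambda ip: "ec2-35-169-37-216.compute-1.amazonaws.com"
    fqdn = "zookeeper-server-01.eigenroute.com"
    print(expected_server_principal(fqdn, "EIGENROUTE.COM", forward=to_ip, reverse=no_ptr))
    print(expected_server_principal(fqdn, "EIGENROUTE.COM", forward=to_ip, reverse=aws_ptr))
```

Calling `expected_server_principal` with the default resolvers on the client machine shows which host part the local DNS setup would produce, which can then be compared against the principal in the server's keytab.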

Ideally, ZooKeeper would have an option to skip this reverse DNS lookup and just use the FQDN as the host name (maybe it does and I haven't found it).