Cloudera/CDH v6.1.x + Python HappyBase v1.1.0: TTr

2019-04-16 20:10发布

问题:

EDIT: This question and answer applies to anyone who is experiencing the exception stated in the subject line: TTransportException(type=4, message='TSocket read 0 bytes'); whether or not Cloudera and/or HappyBase is involved.

The root issue (as it turned out) stems from mismatching protocol and/or transport formats on the client-side with what the server-side is implementing, and this can happen with any client/server paring. Mine just happened to be Cloudera and HappyBase, but yours needn't be and you can run into this same issue.

Has anyone recently tried using the happybase v1.1.0 (latest) Python package to interact with Hbase on Cloudera CDH v6.1.x?

I'm trying various options with it, but keep getting the exception:

thriftpy.transport.TTransportException:
TTransportException(type=4, message='TSocket read 0 bytes')

Here is how I start a session and submit a simple call to get a listing of tables (using Python v3.6.7:

import happybase

CDH6_HBASE_THRIFT_VER='0.92'

hbase_cnxn = happybase.Connection(
    host='vps00', port=9090,
    table_prefix=None,
    compat=CDH6_HBASE_THRIFT_VER,
    table_prefix_separator=b'_',
    timeout=None,
    autoconnect=True,
    transport='buffered',
    protocol='binary'
)

print('tables:', hbase_cnxn.tables()) # Exception happens here.

And here is how Cloudera CDH v6.1.x starts the Hbase Thrift server (truncated for brevity):

/usr/java/jdk1.8.0_141-cloudera/bin/java [... snip ... ] \
    org.apache.hadoop.hbase.thrift.ThriftServer start \
    --port 9090 -threadpool --bind 0.0.0.0 --framed --compact

I've tried several variations to options, but getting nowhere.

Has anyone ever got this to work?

EDIT: I next compiled Hbase.thrift (from the Hbase source files -- same HBase version as used by CDH v6.1.x) and used the Python thrift bindings package (in other words, I removed happybase from the equation) and got the same exception.

(._.);

Thank you!

回答1:

After a days worth of working on this, the answer to my question is the following:

import happybase

CDH6_HBASE_THRIFT_VER='0.92'

hbase_cnxn = happybase.Connection(
    host='vps00', port=9090,
    table_prefix=None,
    compat=CDH6_HBASE_THRIFT_VER,
    table_prefix_separator=b'_',
    timeout=None,
    autoconnect=True,
    transport='framed',  # Default: 'buffered'  <---- Changed.
    protocol='compact'   # Default: 'binary'    <---- Changed.
)

print('tables:', hbase_cnxn.tables()) # Works. Output: [b'ns1:mytable', ]

Note that although this Q&A was framed in the context of Cloudera, it turns out (as you'll see) that this was Thrift versions and Thrift Server-Side configurations related, and so it applies to Hortonworks and MapR users, too.

Explanation:

On Cloudera CDH v6.1.x (and probably future versions, too) if you visit the Hbase Thrift Server Configuration section of its management U.I., you'll find -- among many other settings -- these:


Notice that compact protocol and framed transport are both enabled; so they correspondingly needed to be changed in happybase from its defaults (which I show above).

As mentioned in EDIT follow-up to my initial question, I also investigated a pure Thrift (non happybase) solution. And with analogous changes to Python code for that case, I got that to work, too. Here is the code you should use for the pure Thrift solution (taking care to read my commented annotations below):

from thrift.protocol import TCompactProtocol             # Notice the import: TCompactProtocol [!]
from thrift.transport.TTransport import TFramedTransport # Notice the import: TFramedTransport [!]
from thrift.transport import TSocket
from hbase import Hbase
   # -- This hbase module is compiled using the thrift(1) command (version >= 0.10 [!])
   #    and a Hbase.thrift file (obtained from http://archive.apache.org/dist/hbase/
   # -- Also, your "pip freeze | grep '^thrift='" should show a version of >= 0.10 [!]
   #    if you want Python3 support.

(host,port) = ("vps00","9090")
transport = TFramedTransport(TSocket.TSocket(host, port))
protocol  = TCompactProtocol.TCompactProtocol(transport)
client = Hbase.Client(protocol)

transport.open()

# Do stuff here ...
print(client.getTableNames()) # Works. Output: [b'ns1:mytable', ]

transport.close()

I hope this spares people the pain I went through. =:)

CREDITS:

  • Here (MapR) and
  • Here (Blog from China)