This is a follow-up to this question, where I ask what the HiveServer2 Thrift Java client API is. This question should be able to stand alone if you don't need any more context.
Unable to find any documentation on how to use the HiveServer2 Thrift API, I put this together. The best reference I could find was the Apache JDBC implementation.
TSocket transport = new TSocket("hive.example.com", 10002);
transport.setTimeout(999999999);
TBinaryProtocol protocol = new TBinaryProtocol(transport);
TCLIService.Client client = new TCLIService.Client(protocol);
transport.open();
TOpenSessionReq openReq = new TOpenSessionReq();
TOpenSessionResp openResp = client.OpenSession(openReq);
TSessionHandle sessHandle = openResp.getSessionHandle();
TExecuteStatementReq execReq = new TExecuteStatementReq(sessHandle, "SHOW TABLES");
TExecuteStatementResp execResp = client.ExecuteStatement(execReq);
TOperationHandle stmtHandle = execResp.getOperationHandle();
TFetchResultsReq fetchReq = new TFetchResultsReq(stmtHandle, TFetchOrientation.FETCH_FIRST, 1);
TFetchResultsResp resultsResp = client.FetchResults(fetchReq);
TRowSet resultsSet = resultsResp.getResults();
List<TRow> resultRows = resultsSet.getRows();
for (TRow resultRow : resultRows) {
    System.out.println(resultRow.toString());
}
TCloseOperationReq closeReq = new TCloseOperationReq();
closeReq.setOperationHandle(stmtHandle);
client.CloseOperation(closeReq);
TCloseSessionReq closeConnectionReq = new TCloseSessionReq(sessHandle);
client.CloseSession(closeConnectionReq);
transport.close();
I run this code against a HiveServer2 instance created with:
export HIVE_SERVER2_THRIFT_PORT=10002;hive --service hiveserver2
When debugging, I never get past the line
TOpenSessionResp openResp = client.OpenSession(openReq);
The client simply hangs until the timeout is reached and the server doesn't write anything to stdout or the logs. Using Wireshark, I can see the TCP segment for OpenSession() is sent and ACK'd. Once I kill the client or the timeout is reached, the server gives me the following:
13/03/14 11:15:33 ERROR server.TThreadPoolServer: Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 10 more
I find it interesting that this is the exact same error I received when I was mistakenly using a HiveServer (1) client against HiveServer2, which suggests that, as far as HiveServer2 is concerned, my client is sending it garbage.
I see three possibilities for where I might be going wrong.
1) My use of the client API is wrong. I saw that in the JDBC implementation there was some stuff going on with authentication and connection parameters which I'm not using in my example code. I played around with that, but I was shooting in the dark and didn't get any further.
2) I got some setup step wrong. I wasn't able to find TCLIService in the hive-service-0.10.0 jar, but I was able to find it in the hive-service-0.10.0.21 jar released by Hortonworks in HDP 1.2, so maybe digging around in that will reveal the issue. Or maybe there is something I need to configure server side, which would explain why I can connect to Hive using ODBC but not with my Thrift client.
3) It could be that at this point it is impossible to write against the HiveServer2 client API. This is plausible given the lack of documentation and the apparent lack of successful examples on the internet, but the JDBC driver seems to do it. I find this the least likely option.
Even if you don't know a fix, knowing if the fix falls under 1, 2, or 3 would help narrow my search.
Not sure if you're still experiencing this issue, but since I've confronted the same problem and resolved it (perhaps "bypassed" is a more accurate description), I'll post a solution here in case someone else needs it.
The hang happens because the Thrift server expects to authenticate via SASL when you open your transport connection. HiveServer2 defaults to using SASL. Unfortunately, PHP lacks a version of TSaslClientTransport (which is used as a wrapper around another TTransport object and handles the SASL negotiation when you open your transport connection).
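To make the mismatch concrete, here is a rough sketch in plain Java (no Thrift dependency) of what the two sides put on the wire. It rests on my reading of Thrift's TSaslTransport and TBinaryProtocol, so treat the details as assumptions rather than a spec: a SASL negotiation message is framed as one status byte (START is assumed to be 0x01) followed by a 4-byte big-endian payload length, and a strict TBinaryProtocol call starts with the 32-bit word 0x80010001 followed by the method name length. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SaslFrameSketch {
    // Assumed value of the START negotiation status in Thrift's SASL transport.
    static final byte START = 0x01;

    // Frames a payload the way the SASL transport is assumed to:
    // 1 status byte, 4-byte big-endian payload length, then the payload.
    static byte[] frame(byte status, byte[] payload) {
        return ByteBuffer.allocate(5 + payload.length)
                .put(status)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // First bytes of a strict TBinaryProtocol call (assumed layout):
    // version-and-type word 0x80010001, name length, method name.
    static byte[] binaryProtocolCallPrefix(String method) {
        byte[] name = method.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + name.length)
                .putInt(0x80010001)
                .putInt(name.length)
                .put(name)
                .array();
    }

    // Interprets bytes 1..4 as the payload length of a SASL message header,
    // which is what a SASL-expecting server would do with whatever arrives first.
    static int payloadLengthFromHeader(byte[] bytes) {
        return ByteBuffer.wrap(bytes, 1, 4).getInt();
    }

    public static void main(String[] args) {
        // What a SASL-aware client would send first: the mechanism name.
        byte[] start = frame(START, "PLAIN".getBytes(StandardCharsets.UTF_8));
        System.out.println("SASL START frame is " + start.length + " bytes");

        // What the plain TSocket + TBinaryProtocol client sends instead.
        byte[] call = binaryProtocolCallPrefix("OpenSession");
        System.out.println("Server would read status 0x"
                + Integer.toHexString(call[0] & 0xFF)
                + " and wait for " + payloadLengthFromHeader(call) + " payload bytes");
    }
}
```

If those assumptions hold, the server reads the start of the OpenSession call as a SASL header announcing a payload of roughly 16 MB and blocks waiting for the rest. That would be consistent with the client hanging and with the server failing inside receiveSaslMessage once the connection is reset.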
The easiest solution for now is to set the following property in your hive-site.xml