How to access remote hive using pyhive

2019-04-16 08:01发布

问题:

Used this link to try to connect to a remote hive. Below is the code used. The error msg received is also given below

How to Access Hive via Python?

Code

   from pyhive import hive
    conn = hive.Connection(host="10.111.22.11", port=10000, username="user1" ,database="default")

Error msg

Could not connect to any of [('10.111.22.11', 10000)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.6/site-packages/pyhive/hive.py", line 131, in __init__
    self._transport.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 61, in open
    self._trans.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift/transport/TSocket.py",line 113, in open
    raise TTransportException(TTransportException.NOT_OPEN, msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.111.22.11', 10000)]

What are the other requirements for successful connection? I am able to connect to the server directly (using putty) and run hive. But when tried from another server X i get this error. Also i can ping the hive server from server X.

Could the port number be the problem? How do i check the correct port number?

As discussed in the below answer i tried to start hiveserver2 . But the command doesnt seem to work. Any help is really appreciated.

Also the port i see in the log when i execute a query from hive shell is 8088. wonder if this should be the port instead of 10000(both did not work anyway)

回答1:

HiveServer2 process must be started in your remote Hive host. 10000 is the default port number.

Use this command to start HiveServer2.

$HIVE_HOME/bin/hiveserver2 


回答2:

Could not make it work using pyhive. Had to use paramiko insted below is the sample code

import os
import paramiko
import time 

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
ssh.connect('1.1.1.1', username='uid', password='pwd')
sshin, sshout, ssherr= ssh.exec_command('hive -e "create table test(key varchar(10),keyval varchar(200))"')


回答3:

Please try below code to access remote hive table using pyhive:

from pyhive import hive
import pandas as pd

#Create Hive connection 
conn = hive.Connection(host="10.111.22.11", port=10000, username="user1")

# Read Hive table and Create pandas dataframe
df = pd.read_sql("SELECT * FROM db_Name.table_Name limit 10", conn)
print(df.head())