Access Hive Data Using Python

Posted 2020-02-05 07:48

I have some data in HDFS, and I need to access it using Python. Can anyone tell me how to access data in Hive from Python?

Tags: python hive
3 answers
仙女界的扛把子
#2 · 2020-02-05 08:04

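You can use the hive Thrift client library to access Hive from Python. Start by importing the ThriftHive class with "from hive import ThriftHive". Note that this client talks to the original HiveServer (HiveServer1), which has been removed from current Hive releases; for HiveServer2, use a client such as PyHive instead.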

Below is an example:

from hive import ThriftHive
from hive.ttypes import HiveServerException

from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

# Connect to a HiveServer1 instance on localhost:10000 over a buffered binary Thrift transport
transport = TTransport.TBufferedTransport(TSocket.TSocket('localhost', 10000))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)

try:
  transport.open()
  client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
  client.execute("LOAD DATA LOCAL INPATH '/path' INTO TABLE r")

  # Fetch rows one at a time; an empty result means there are no more rows
  client.execute("SELECT * FROM r")
  while True:
    row = client.fetchOne()
    if not row:
      break
    print(row)

  # Or fetch the whole result set at once
  client.execute("SELECT * FROM r")
  print(client.fetchAll())
except HiveServerException as hx:
  # Errors reported by the Hive server itself (bad query, missing table, ...)
  print('Hive error: %s' % hx.message)
except Thrift.TException as tx:
  # Transport or protocol errors
  print('%s' % tx.message)
finally:
  transport.close()
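With this Thrift interface, fetchOne() and fetchAll() return each row as a single string rather than a tuple of values. A minimal sketch for turning those strings into typed values, assuming (this is not confirmed by the answer above) that columns are tab-separated and NULLs appear as the literal text NULL. The converters match the example table r(a STRING, b INT, c DOUBLE); parse_row and R_CONVERTERS are hypothetical names.

```python
from typing import Callable, List, Optional, Sequence

# A converter turns one raw field (a string) into a Python value.
Converter = Callable[[str], object]

# Hypothetical column converters for the example table r(a STRING, b INT, c DOUBLE).
R_CONVERTERS: Sequence[Converter] = (str, int, float)


def parse_row(line: str, converters: Sequence[Converter],
              sep: str = "\t", null_marker: str = "NULL") -> List[Optional[object]]:
    """Split one row string (assumed tab-separated) and convert each field.

    Fields equal to null_marker become None. This cannot tell a real NULL
    apart from a STRING value that is literally "NULL", and it breaks if a
    string value itself contains the separator.
    """
    fields = line.split(sep)
    if len(fields) != len(converters):
        raise ValueError(
            f"expected {len(converters)} fields, got {len(fields)}: {line!r}")
    return [None if field == null_marker else convert(field)
            for field, convert in zip(fields, converters)]


# Made-up rows in the assumed format, as fetchAll() might return them.
raw_rows = ["x\t1\t2.5", "y\tNULL\t0.0"]
print([parse_row(row, R_CONVERTERS) for row in raw_rows])
# [['x', 1, 2.5], ['y', None, 0.0]]
```

If your table uses a different delimiter or NULL representation, pass sep and null_marker accordingly.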
成全新的幸福
#3 · 2020-02-05 08:17

A much simpler solution, if you're on Windows, is to use pyodbc:

  import pyodbc
  import pandas as pd

  # connect odbc to data source name
  conn = pyodbc.connect("DSN=<your_dsn>", autocommit=True)

  # read data into dataframe
  hive_df = pd.read_sql("SELECT * FROM <table_name>", conn)

As long as you have a Hive ODBC driver installed and a DSN configured for it, that's all you need.

forever°为你锁心
#4 · 2020-02-05 08:22

To install, you'll need these libraries:

pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive

If you're on Linux, you may need to install the SASL development library separately before running the above: the package is libsasl2-dev on Debian/Ubuntu (apt-get) and cyrus-sasl-devel on RHEL/CentOS (yum). For Windows, there are some options on GNU.org. On a Mac, SASL should be available if you've installed the Xcode developer tools (xcode-select --install).

After installation, you can open a connection to HiveServer2 like this:

from pyhive import hive

# 10000 is the default HiveServer2 port; replace host, port and username with your own values
conn = hive.Connection(host="YOUR_HIVE_HOST", port=10000, username="YOU")

Now that you have the Hive connection, you have options for how to use it. You can query it directly:

cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
  # Each result is a sequence of column values; replace print with your own processing
  print(result)
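PyHive's cursor follows the Python DB-API 2.0 interface, so for large Hive tables you can stream results in batches with fetchmany() instead of loading everything into memory with fetchall(). A minimal sketch: iter_rows is a hypothetical helper, the default batch size of 1000 is an arbitrary assumption, and the standard-library sqlite3 module stands in for the Hive connection so the example runs without a cluster. With PyHive you would pass the hive.Connection object instead.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def iter_rows(conn: Any, sql: str, batch_size: int = 1000) -> Iterator[Sequence[Any]]:
    """Yield the rows of sql from a DB-API 2.0 connection, batch_size rows per fetch."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:  # an empty batch means the result set is exhausted
                break
            yield from batch
    finally:
        cursor.close()


# sqlite3 stands in for hive.Connection(...) so the sketch runs without a Hive server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hive_table (cool_stuff TEXT)")
conn.executemany("INSERT INTO hive_table VALUES (?)", [("a",), ("b",), ("c",)])
for row in iter_rows(conn, "SELECT cool_stuff FROM hive_table", batch_size=2):
    print(row)
conn.close()
```

Only one batch is held in memory at a time, so a larger batch_size trades memory for fewer round trips to the server.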

...or use the connection to build a pandas DataFrame:

import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)
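If you want rows keyed by column name without depending on pandas, the DB-API cursor.description attribute lists the result columns, and the first item of each entry is the column name. A minimal sketch: fetch_dicts is a hypothetical helper, and the standard-library sqlite3 module stands in for the Hive connection so it runs locally. Depending on server settings, Hive may report names in table.column form (for example for SELECT *).

```python
import sqlite3
from typing import Any, Dict, List


def fetch_dicts(conn: Any, sql: str) -> List[Dict[str, Any]]:
    """Run sql on a DB-API 2.0 connection and return each row as a dict keyed by column name."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one 7-item sequence per column; item 0 is the column name.
        names = [column[0] for column in cursor.description]
        return [dict(zip(names, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


# sqlite3 stands in for hive.Connection(...) so the sketch runs without a Hive server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hive_table (cool_stuff TEXT, score INTEGER)")
conn.executemany("INSERT INTO hive_table VALUES (?, ?)", [("a", 1), ("b", 2)])
print(fetch_dicts(conn, "SELECT cool_stuff, score FROM hive_table ORDER BY score"))
# [{'cool_stuff': 'a', 'score': 1}, {'cool_stuff': 'b', 'score': 2}]
conn.close()
```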