Setup Standalone Hive Metastore Service For Presto

I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to setup Presto to be able to query the data in S3 and I know I need the define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as minimal as possible. What components from Hive do I need to be able to just run the Metastore service? I don't really actually care about running Hive, just the Metastore. Can I trim down what's needed, or is there already a pre-configured package just for that? I haven't been able to find anything online that doesn't include downloading all of Hadoop and Hive. Is what I'm trying to do possible?

标签： hive Presto hive-metastore

3条回答

地球回转人心会变

2楼-- · 2020-05-27 11:53

needing to set up hive just for the metastore seems cumbersome indeed. Have you considered using the AWS glue data catalog instead? This way you won’t have to manage anything. You can find detailed informations here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html

0人赞添加讨论(0) 举报

看我几分像从前

3楼-- · 2020-05-27 12:01

It's now available standalone /hive-standalone-metastore-3.0.0/ in the Apache Hive distribution.

Beginning in Hive 3.0, the Metastore is released as a separate package and can be run without the rest of Hive. This is referred to as standalone mode.

By default the Metastore is configured for use with Hive, so a few configuration parameters have to be changed in this configuration.

metastore.task.threads.always -> org.apache.hadoop.hive.metastore.events.EventCleanerTask,org.apache.hadoop.hive.metastore.MaterializationsCacheCleanerTask
metastore.expression.proxy -> org.apache.hadoop.hive.metastore.DefaultPartitionExpressionProxy

Link to Docs

0人赞添加讨论(0) 举报

The star\"

4楼-- · 2020-05-27 12:14

There is a workaround, that you do not need hive to run presto. However I haven't tried that with any distributed file system like s3, but code suggest it should work (at least with HDFS). In my opinion it is worth trying, because you do not need any new docker image for hive at all.

The idea is to use a builtin FileHiveMetastore. It is neither documented nor advised to be used in production but you could play with it. Schema information is stored next to the data in the file system. Obviously, it has its prons and cons. I do not know the details of your use case, so I don't know if it fits your needs.

Configuration:

connector.name=hive-hadoop2
hive.metastore=file
hive.metastore.catalog.dir=file:///tmp/hive_catalog
hive.metastore.user=cox

Demo:

presto:tiny> create schema hive.default;
CREATE SCHEMA
presto:tiny> use hive.default;
USE
presto:default> create table t (t bigint);
CREATE TABLE
presto:default> show tables;
 Table
-------
 t
(1 row)

Query 20180223_202609_00009_iuchi, FINISHED, 1 node
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 18B] [11 rows/s, 201B/s]

presto:default> insert into t (values 1);
INSERT: 1 row

Query 20180223_202616_00010_iuchi, FINISHED, 1 node
Splits: 51 total, 51 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

presto:default> select * from t;
 t
---
 1
(1 row)

After the above I was able to find the following on my machine:

/tmp/hive_catalog/
/tmp/hive_catalog/default
/tmp/hive_catalog/default/t
/tmp/hive_catalog/default/t/.prestoPermissions
/tmp/hive_catalog/default/t/.prestoPermissions/user_cox
/tmp/hive_catalog/default/t/.prestoPermissions/.user_cox.crc
/tmp/hive_catalog/default/t/.20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278.crc
/tmp/hive_catalog/default/t/.prestoSchema
/tmp/hive_catalog/default/t/20180223_202616_00010_iuchi_79dee041-58a3-45ce-b86c-9f14e6260278
/tmp/hive_catalog/default/t/..prestoSchema.crc
/tmp/hive_catalog/default/.prestoSchema
/tmp/hive_catalog/default/..prestoSchema.crc

0人赞添加讨论(0) 举报

Setup Standalone Hive Metastore Service For Presto

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间