I'm working in an environment where I have an S3 service being used as a data lake, but not AWS Athena. I'm trying to setup Presto to be able to query the data in S3 and I know I need the define the data structure as Hive tables through the Hive Metastore service. I'm deploying each component in Docker, so I'd like to keep the container size as minimal as possible. What components from Hive do I need to be able to just run the Metastore service? I don't really actually care about running Hive, just the Metastore. Can I trim down what's needed, or is there already a pre-configured package just for that? I haven't been able to find anything online that doesn't include downloading all of Hadoop and Hive. Is what I'm trying to do possible?
相关问题
-
hive: cast array
> into map - Find function in HIVE
- Hive Tez reducers are running super slow
- Is there a way to use Facebook Presto 0.131 with C
- Presto - hex string to int
相关文章
- 在hive sql里怎么把"2020-10-26T08:41:19.000Z"这个字符串转换成年月日
- SQL query Frequency Distribution matrix for produc
- Cloudera 5.6: Parquet does not support date. See H
- converting to timestamp with time zone failed on A
- Hive error: parseexception missing EOF
- ClassNotFoundException: org.apache.spark.SparkConf
- How to get previous day date in Hive
- Hive's hour() function returns 12 hour clock v
needing to set up hive just for the metastore seems cumbersome indeed. Have you considered using the AWS glue data catalog instead? This way you won’t have to manage anything. You can find detailed informations here: https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html
It's now available standalone
/hive-standalone-metastore-3.0.0/
in the Apache Hive distribution.Link to Docs
There is a workaround, that you do not need hive to run presto. However I haven't tried that with any distributed file system like s3, but code suggest it should work (at least with HDFS). In my opinion it is worth trying, because you do not need any new docker image for hive at all.
The idea is to use a builtin FileHiveMetastore. It is neither documented nor advised to be used in production but you could play with it. Schema information is stored next to the data in the file system. Obviously, it has its prons and cons. I do not know the details of your use case, so I don't know if it fits your needs.
Configuration:
Demo:
After the above I was able to find the following on my machine: