Why does Hive generate mapreduce job on select column from tablename Vs not generating mapreduce for select * from tablename?
相关问题
-
hive: cast array
> into map - Find function in HIVE
- Hive Tez reducers are running super slow
- Set parquet snappy output file size is hive?
- Hive 'cannot alter table' error
相关文章
- 在hive sql里怎么把"2020-10-26T08:41:19.000Z"这个字符串转换成年月日
- SQL query Frequency Distribution matrix for produc
- Cloudera 5.6: Parquet does not support date. See H
- converting to timestamp with time zone failed on A
- Hive error: parseexception missing EOF
- ClassNotFoundException: org.apache.spark.SparkConf
- How to get previous day date in Hive
- Hive's hour() function returns 12 hour clock v
Whenever you run a normal 'select *', a fetch task is created rather than a mapreduce task which just dumps the data as it is without doing anything on it. Whereas whenever you do a 'select column', a map job internally picks that particular column and gives the output.
There was also a bug filed for this to make 'select column' query run without mapreduce. Check the details here: https://issues.apache.org/jira/browse/HIVE-887
When a simple statement like this is executed
select * from tablename
, what hive does is simply to fetch the data from the file stored in hdfs and bring it out in a columnar output format. Basically it generates a statement likeOr in whichever format your table's file is stored.
If you try selecting a column or adding a where clause to the query or using any aggregate on the table, MR comes into picture for obvious reasons.