HIVE : Why does Hive generate mapreduce job on sel

Posted 2019-08-28 22:39

Why does Hive generate a MapReduce job for 'select column from tablename' but not for 'select * from tablename'?

Tags: hive
2 answers
等我变得足够好
Reply #2 · 2019-08-28 23:18

Whenever you run a plain 'select *', Hive creates a fetch task rather than a MapReduce task. The fetch task just dumps the data as it is, without doing any processing on it. When you run 'select column', on the other hand, a map job has to pick out that particular column from each row and emit it as the output.

A JIRA issue was also filed to let 'select column' queries run without MapReduce. See the details here: https://issues.apache.org/jira/browse/HIVE-887
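As far as I know, newer Hive releases (0.10 and later) expose this behavior through the hive.fetch.task.conversion property, which accepts none, minimal, or more. The snippet below is only a sketch of how you might toggle it in a session. The table name comes from the question, and col1 is a hypothetical column name used for illustration. Whether a given query is actually converted still depends on your Hive version and other settings.

```sql
-- Sketch only: assumes a Hive release that supports hive.fetch.task.conversion.
-- 'col1' is a hypothetical column; 'tablename' is the placeholder from the question.

-- 'more' allows simple SELECT / FILTER / LIMIT queries to run as a fetch task.
SET hive.fetch.task.conversion=more;
SELECT col1 FROM tablename LIMIT 10;

-- 'none' disables the conversion, so even 'select *' is planned as a job.
SET hive.fetch.task.conversion=none;
SELECT * FROM tablename LIMIT 10;
```

If the first query still launches a job on your cluster, check which Hive version you are running before assuming the setting is broken.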

Anthone
Reply #3 · 2019-08-28 23:28

When a simple statement like 'select * from tablename' is executed, Hive simply fetches the data from the files stored in HDFS and prints it out as rows and columns. Conceptually, it is doing something like

hadoop fs -cat hdfs://schemaname/tablename.txt
hadoop fs -cat hdfs://schemaname/tablename.rc
hadoop fs -cat hdfs://schemaname/tablename.orc

The same goes for whichever file format your table is stored in.

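If you select specific columns, add a where clause, or use an aggregate on the table, Hive can no longer just stream the files back. It has to process every row (project, filter, or aggregate it), and that is why a MapReduce job comes into the picture.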
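One way to check which path Hive takes for a given query is to prefix it with EXPLAIN and look at the stages in the plan. This is a minimal sketch using the placeholder table from the question and a hypothetical column col1. The exact plan text differs between Hive versions and configurations, so treat the comments as what to look for rather than guaranteed output.

```sql
-- Sketch only: 'col1' is a hypothetical column; 'tablename' is from the question.

-- A fetch-only plan typically contains just a Fetch Operator stage.
EXPLAIN SELECT * FROM tablename;

-- Projection plus a filter: depending on version and settings, this is
-- planned either as a fetch task or as a Map Reduce stage.
EXPLAIN SELECT col1 FROM tablename WHERE col1 IS NOT NULL;

-- An aggregate needs a job: look for a Map Reduce stage in the plan.
EXPLAIN SELECT count(*) FROM tablename;
```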
