HIVE : Why does Hive generate mapreduce job on sel

Posted 2019-08-28 22:39

Why does Hive generate a MapReduce job for select column from tablename, but not for select * from tablename?

Tags: hive

2 answers

等我变得足够好

#2 · 2019-08-28 23:18

When you run a plain 'select *', Hive creates a fetch task rather than a MapReduce task. The fetch task just dumps the data as it is, without doing any processing on it. When you run 'select column', a map job is launched internally to pick out that particular column and produce the output.

A JIRA issue was also filed to make 'select column' queries run without MapReduce. Check the details here: https://issues.apache.org/jira/browse/HIVE-887
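In later Hive releases, whether a simple query runs as a fetch task is controlled by the hive.fetch.task.conversion property. Here is a minimal sketch, assuming your Hive version supports that property; tablename is the table from the question and col is a placeholder column name:

```sql
-- "more": simple projections, filters and LIMIT can run as a fetch task,
-- so this query should not need a MapReduce job.
SET hive.fetch.task.conversion=more;
SELECT col FROM tablename LIMIT 10;

-- "minimal": only SELECT *, filters on partition columns and LIMIT
-- are converted, so selecting a single column launches a job again.
SET hive.fetch.task.conversion=minimal;
SELECT col FROM tablename LIMIT 10;
```

The default value differs between Hive versions, so check what your installation uses before relying on either behavior.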


Anthone

#3 · 2019-08-28 23:28

When a simple statement like select * from tablename is executed, Hive simply fetches the data from the file stored in HDFS and prints it as rows and columns. It is roughly equivalent to running something like

hadoop fs -cat hdfs://schemaname/tablename.txt
hadoop fs -cat hdfs://schemaname/tablename.rc
hadoop fs -cat hdfs://schemaname/tablename.orc

Or in whichever format your table's file is stored.

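If you select a specific column, add a where clause to the query, or use any aggregate on the table, MR comes into the picture, because Hive now has to process each row (project, filter, or aggregate it) instead of just streaming the file contents.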
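One way to check which path a given query takes is to prefix it with EXPLAIN and look at the stages in the plan. This is only a sketch: tablename is the table from the question, col is a placeholder column name, and the exact plan text depends on your Hive version and settings.

```sql
-- A plan containing only a Fetch Operator stage means no MapReduce job is launched.
EXPLAIN SELECT * FROM tablename;

-- Compare with a projection and an aggregate; a Map Reduce stage in the
-- plan means Hive will launch a job for that query.
EXPLAIN SELECT col FROM tablename;
EXPLAIN SELECT COUNT(*) FROM tablename;
```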
