As a simple example,
select * from tablename;
DOES NOT kick off MapReduce, while
select count(*) from tablename;
DOES. What is the general principle Hive uses to decide when to run a MapReduce job?
It is an optimisation technique. The hive.fetch.task.conversion property can minimize the latency of MapReduce overhead by using a FETCH task instead: for simple SELECT, FILTER and LIMIT queries, Hive skips MapReduce and uses the FETCH task.
This property can have three values: none, minimal and more (minimal was the default in older releases; newer releases default to more).
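For example, a minimal sketch from a Hive CLI or Beeline session (tablename is just a placeholder for an existing table):
-- force MapReduce even for a plain select
set hive.fetch.task.conversion=none;
select * from tablename;
-- allow fetch-task conversion for simple selects, filters and limits
set hive.fetch.task.conversion=more;
select * from tablename limit 10;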
In general, any sort of aggregation, such as min/max/count, is going to require a MapReduce job. This isn't going to explain everything for you, probably.
Hive, in the style of many RDBMSs, has an EXPLAIN keyword that will outline how your Hive query gets translated into MapReduce jobs. Try running EXPLAIN on both of your example queries and see what Hive is doing behind the scenes.
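As a rough sketch (again assuming an existing table named tablename), compare the two plans:
-- typically planned as a single fetch stage, no MapReduce
explain select * from tablename;
-- typically includes a map-reduce stage for the aggregation
explain select count(*) from tablename;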
Whenever we fire a query like select * from tablename, Hive reads the data files and fetches the entire data without doing any aggregation (min/max/count etc.), so it calls a FetchTask rather than a MapReduce task.
This is also an optimization technique in Hive: the hive.fetch.task.conversion property (i.e. the FETCH task) minimizes the latency of MapReduce overhead.
This is like reading a Hadoop file directly: hadoop fs -cat filename
But if we use select colNames from tablename, it requires a map-reduce job, as Hive needs to extract those columns from each row by parsing the file it loads (unless hive.fetch.task.conversion is set to more, which also converts simple column projections to a fetch). A plain select * just reads raw data from files in HDFS, so it is much faster without MapReduce.
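A quick way to check this yourself; a sketch, assuming a Hive session and an existing table with a column named col1 (a hypothetical column name):
-- minimal only converts select *, partition-column filters and limits, so this projection runs MapReduce
set hive.fetch.task.conversion=minimal;
select col1 from tablename;
-- with the broader setting, the same projection is served by a fetch task
set hive.fetch.task.conversion=more;
select col1 from tablename;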