Skewed tables in Hive

I am learning Hive and came across skewed tables. Please help me understand them.

What are skewed tables in Hive?

How do we create skewed tables?

How do they affect performance?

Tags: hadoop hive bigdata

2 answers

做自己的国王

Reply #2 · 2019-03-28 13:33

In a skewed table, a separate partition (in practice, a separate file or directory) is created for each column value that has many records, and the rest of the data goes to another one. This reduces the number of partitions, mappers and intermediate files. For example, suppose that out of 100 patients, 90 have high BP and the other 10 have fever, cold, cancer and so on. One partition is created for the 90 high-BP patients and one for the other 10. I hope this answers your question.
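
The patient example above could be declared roughly as follows. This is only a sketch: the table name `patients` and its columns are hypothetical, and the separate directory for the skewed value is created only because `STORED AS DIRECTORIES` (list bucketing) is specified.

```sql
-- Hypothetical table for the patient example; all names are illustrative.
CREATE TABLE patients (
  patient_id INT,
  name       STRING,
  diagnosis  STRING
)
-- 'high BP' is the heavily repeated value, so it is declared as skewed.
SKEWED BY (diagnosis) ON ('high BP')
-- Ask Hive to keep the skewed value in its own subdirectory;
-- rows with any other diagnosis share a common default location.
STORED AS DIRECTORIES;
```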


Fickle 薄情

Reply #3 · 2019-03-28 13:49

What are skewed tables in Hive?

A skewed table is a special type of table in which the values that appear very often (heavy skew) are split out into separate files, while the rest of the values go to some other file.

How do we create skewed tables?

create table <T> (schema) skewed by (keys) on ('value1', 'value2') [STORED AS DIRECTORIES];

Example :

create table T (c1 string, c2 string) skewed by (c1) on ('x1');

How does it affect performance?

When you specify the skewed values, Hive automatically splits them out into separate files and takes this into account at query time, so that it can skip (or include) whole files where possible, which improves performance.
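
As a rough illustration, a filter on a declared skewed value is the kind of query that can benefit. The sketch below reuses the example table T; the SET properties are the ones I recall from the Hive list bucketing documentation, so treat them as assumptions and check them against your Hive version. Whole directories can only be skipped if the table was created with STORED AS DIRECTORIES.

```sql
-- Assumed settings for list bucketing; verify against your Hive version.
SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true;
-- Lets the optimizer use the skew (list bucketing) layout; off by default.
SET hive.optimize.listbucketing=true;

-- T is skewed by c1 on 'x1', so this filter targets the skewed value
-- and Hive may be able to read only the data stored for 'x1'.
SELECT c1, c2
FROM T
WHERE c1 = 'x1';
```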

EDIT :

x1 is the value on which column c1 is skewed. You can declare several such values, and a table can be skewed on more than one column. For example,

create table T (c1 string, c2 string) skewed by (c1) on ('x1', 'x2', 'x3');

The advantage of this setup is that the values that appear much more often than the others get split out into separate files (or separate directories if the STORED AS DIRECTORIES clause is used). The execution engine uses this information during query execution to make processing more efficient.
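
For completeness, here is a sketch of skewing on a combination of columns and of changing the skew of an existing table. The table `list_bucket_multiple` and its values are hypothetical; the ALTER statements follow the Hive DDL syntax as I understand it and assume the example table T already exists.

```sql
-- Hypothetical table skewed on pairs of (col1, col2) values.
CREATE TABLE list_bucket_multiple (col1 STRING, col2 INT, col3 STRING)
SKEWED BY (col1, col2) ON (('s1', 1), ('s3', 3), ('s13', 13), ('s78', 78))
STORED AS DIRECTORIES;

-- Declare (or change) the skew of an existing table.
ALTER TABLE T SKEWED BY (c1) ON ('x1', 'x2', 'x3') STORED AS DIRECTORIES;

-- Remove the skew declaration again.
ALTER TABLE T NOT SKEWED;

-- Inspect the table metadata; the skewed columns and values
-- should appear in the output if the table is declared as skewed.
DESCRIBE FORMATTED T;
```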
