Question:

Is there a Hive query to quickly find table size (i.e. number of rows) without launching a time-consuming MapReduce job? (Which is why I want to avoid COUNT(*).)

I tried DESCRIBE EXTENDED, but that yielded numRows=0 which is obviously not correct.

(Apologies for the newb question. I tried Googling and searching the apache.org documentation without success.)

Answer 1:

show tblproperties lists the statistics stored for the table, including its size, and can also return a single property if that is all you need.

-- gives all properties
show tblproperties yourTableName

-- show just the raw data size
show tblproperties yourTableName("rawDataSize")
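Since the question asks for the row count rather than the raw data size, the same command can be pointed at the numRows property. A minimal sketch, assuming the hive CLI is on the PATH; table_num_rows is a hypothetical helper name, and the value it prints is only as accurate as the last statistics run (it can be 0 or stale, as the question observed):

```shell
#!/bin/sh
# Hypothetical helper: print the numRows statistic stored in a table's properties.
# Assumes the hive CLI is available; -S runs it in silent mode, -e runs one query string.
# The stored value reflects the last statistics run, not necessarily the current data.
table_num_rows() {
    hive -S -e "show tblproperties $1(\"numRows\");"
}

# Usage (yourTableName is the placeholder from the answer above):
#   table_num_rows yourTableName
```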

Answer 2:

Here is a quick command:

ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];

For example, if the table is partitioned:

 hive> ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;

The output is:

Partition logdata.ops_bc_log{day=20140523} stats: [numFiles=37, numRows=26095186, totalSize=654249957, rawDataSize=58080809507]

Partition logdata.ops_bc_log{day=20140521} stats: [numFiles=30, numRows=21363807, totalSize=564014889, rawDataSize=47556570705]

Partition logdata.ops_bc_log{day=20140524} stats: [numFiles=35, numRows=25210367, totalSize=631424507, rawDataSize=56083164109]

Partition logdata.ops_bc_log{day=20140522} stats: [numFiles=37, numRows=26295075, totalSize=657113440, rawDataSize=58496087068]

Time taken: 5.252 seconds
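For a partitioned table, the per-partition numRows values still have to be added up to get the table total. A minimal sketch, assuming the output lines keep the "numRows=<integer>" form shown above; sum_num_rows is a hypothetical helper name, not part of Hive:

```shell
#!/bin/sh
# Hypothetical helper: sum every numRows value found in ANALYZE TABLE output on stdin.
# Assumes each partition line contains a "numRows=<integer>" field, as in the output above.
sum_num_rows() {
    grep -o 'numRows=[0-9]*' \
        | cut -d= -f2 \
        | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage, assuming the hive CLI is on the PATH (2>&1 is there in case the
# statistics lines are written to stderr rather than stdout):
#   hive -e "ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;" 2>&1 | sum_num_rows
```

For the four partitions listed above, this would print 98964435.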

Answer 3:

How about using:

    hdfs dfs -du -s -h /path/to/table/name
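Note that this reports the size of the table's files on HDFS, not the number of rows. If the figure is needed in a script, dropping -h gives a plain byte count that is easier to parse. A minimal sketch, assuming the hdfs CLI is on the PATH and that the first column of the summary line is the size in bytes; table_bytes is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the size in bytes of a table directory on HDFS.
# Without -h, the first column of `hdfs dfs -du -s` is the size in bytes.
table_bytes() {
    hdfs dfs -du -s "$1" | awk '{ print $1 }'
}

# Usage (the path is the placeholder from the answer above):
#   table_bytes /path/to/table/name
```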

Answer 4:

Here is a solution, though not a quick one.
If the table is partitioned, we can count the rows in each partition and restrict the query to the partitions of interest.
For example, if the table is partitioned by date (mm-dd-yyyy):

select <partition_column_name>, count(*) from <table_name> where <partition_column_name> >= '05-14-2018' group by <partition_column_name>;

Answer 5:

Store the data of your external/internal table in Parquet format. You will then get quicker results.

Answer 6:

It is a good question. count() will take a long time to return the result, but unfortunately count() is the only way to do it.

There is an alternative (not really a different method, but it gives better latency than the case above):

Set the property

set hive.exec.mode.local.auto=true;

and run the same command (select count(*) from tbl), which gives better latency than before.
