Get row count from all tables in hive

How can I get row count from all tables using hive? I am interested in the database name, table name and row count

标签： hql hive

8条回答

2楼-- · 2020-07-10 08:58

Here's a solution I wrote that uses python:

import os
dictTabCnt={}

print("=====Finding Tables=====")
tableList = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; show tables;\"").read().split('\n')

print("=====Finding Table Counts=====")
for i in tableList:
    if i <> '':
        strTemp = os.popen("hive --outputformat=dsv --showHeader=false -e \"use [YOUR DB HERE]; SELECT COUNT(*) FROM {}\"".format(i)).read()
        dictTabCnt[i] = strTemp

print("=====Table Counts=====")
for table,cnt in dictTabCnt.items():
    print("{}: {}".format(table,cnt))

0人赞添加讨论(0) 举报

一纸荒年 Trace。

3楼-- · 2020-07-10 09:03

You can collect the statistics on the table by using Hive ANALAYZE command. Hive cost based optimizer makes use of these statistics to create optimal execution plan.

Below is the example of computing statistics on Hive tables:

hive> ANALYZE TABLE stud COMPUTE STATISTICS;
 Query ID = impadmin_20171115185549_a73662c3-5332-42c9-bb42-d8ccf21b7221
 Total jobs = 1
 Launching Job 1 out of 1
 …
 Table training_db.stud stats: [numFiles=5, numRows=5, totalSize=50, rawDataSize=45]
 OK
 Time taken: 8.202 seconds

Links: http://dwgeek.com/apache-hive-explain-command-example.html/

0人赞添加讨论(0) 举报

再贱就再见

4楼-- · 2020-07-10 09:07

You will need to do a

select count(*) from table

for all tables.

To automate this, you can make a small bash script and some bash commands. First run

$hive -e 'show tables' | tee tables.txt

This stores all tables in the database in a text file tables.txt

Create a bash file (count_tables.sh) with the following contents.

while read line
do
 echo "$line "
 eval "hive -e 'select count(*) from $line'"
done

Now run the following commands.

$chmod +x count_tables.sh
$./count_tables.sh < tables.txt > counts.txt

This creates a text file(counts.txt) with the counts of all the tables in the database

0人赞添加讨论(0) 举报

够拽才男人

5楼-- · 2020-07-10 09:08

try this guys to automate-- put in shell after that run bash filename.sh

hive -e 'select count(distinct fieldid) from table1 where extracttimestamp<'2018-04-26'' > sample.out

hive -e 'select count(distinct fieldid) from table2 where day='26'' > sample.out

lc=cat sample.out | uniq | wc -l if [ $lc -eq 1 ]; then echo "PASS" else echo "FAIL" fi

0人赞添加讨论(0) 举报

Melony?

6楼-- · 2020-07-10 09:13

select count(*) from table

I think there is no more efficient way.

0人赞添加讨论(0) 举报

看我几分像从前

7楼-- · 2020-07-10 09:18

You can also set the database in the same command and separate with ;.

hive -e 'use myDatabase;show tables'

0人赞添加讨论(0) 举报

1 2 下一页

Get row count from all tables in hive

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间