How can I get the row count of every table using Hive? I am interested in the database name, table name and row count.
Here's a solution I wrote that uses python:
You can collect statistics on a table with Hive's ANALYZE command. Hive's cost-based optimizer uses these statistics to build an optimal execution plan.
Below is an example of computing statistics on Hive tables:
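A minimal sketch of what computing and then reading those statistics could look like from the shell. The function names `analyze_table` and `table_num_rows`, and the `mydb` / `mytable` arguments, are illustrative assumptions rather than anything from the answer itself. The sketch also assumes the `hive` CLI is on the PATH and that `numRows` shows up in the `DESCRIBE FORMATTED` output once statistics exist.

```shell
#!/usr/bin/env bash

# Compute table-level statistics (including the row count) for one table.
# Arguments: database name, table name (both hypothetical placeholders).
analyze_table() {
  local db="$1" table="$2"
  hive -S -e "ANALYZE TABLE ${db}.${table} COMPUTE STATISTICS;"
}

# Print the numRows value that Hive keeps in the table parameters.
# Prints nothing if no numRows line is present in the output.
table_num_rows() {
  local db="$1" table="$2"
  hive -S -e "DESCRIBE FORMATTED ${db}.${table};" |
    awk '$1 == "numRows" { print $2; exit }'
}
```

Usage would be along the lines of `analyze_table mydb mytable` followed by `table_num_rows mydb mytable`. The value reflects the table as of the last ANALYZE run, so it can be stale, and partitioned tables may need a PARTITION clause depending on the Hive version.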
Links: http://dwgeek.com/apache-hive-explain-command-example.html/
You will need to run a SELECT COUNT(*) query for every table.
To automate this, you can write a small bash script and run a few shell commands. First run
This stores the names of all tables in the database in a text file, tables.txt.
Create a bash file (count_tables.sh) with the following contents.
Now run the following commands.
This creates a text file (counts.txt) with the row counts of all the tables in the database.
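The steps above can be sketched as a single function. This is an illustration under assumptions, not the answer's original script: the function name `count_all_tables` and the database name `mydb` are hypothetical, and the `hive` CLI is assumed to be on the PATH. It prints the database name, table name and row count as tab-separated columns.

```shell
#!/usr/bin/env bash

# Print "database<TAB>table<TAB>row_count" for every table in a database.
# Argument: database name (a hypothetical placeholder such as mydb).
count_all_tables() {
  local db="$1" table count
  # -S (silent) keeps log noise out of stdout so only table names are read.
  hive -S -e "USE ${db}; SHOW TABLES;" |
    while IFS= read -r table; do
      [ -n "$table" ] || continue   # skip blank lines
      # </dev/null stops the inner hive call from consuming the table list.
      count="$(hive -S -e "SELECT COUNT(*) FROM ${db}.${table};" </dev/null)"
      printf '%s\t%s\t%s\n' "$db" "$table" "$count"
    done
}
```

Something like `count_all_tables mydb > counts.txt` would then write one line per table. Each count starts a separate Hive session, so this can be slow on databases with many tables.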
To automate the check, put the following in a shell script and run it with bash filename.sh:
# Write both counts to the same file; the second command appends.
hive -e "select count(distinct fieldid) from table1 where extracttimestamp<'2018-04-26'" > sample.out
hive -e "select count(distinct fieldid) from table2 where day='26'" >> sample.out
# If the two counts match, uniq collapses them into a single line.
lc=$(cat sample.out | uniq | wc -l)
if [ $lc -eq 1 ]; then echo "PASS"; else echo "FAIL"; fi
select count(*) from table
I think there is no more efficient way.
You can also set the database in the same command and separate the statements with a semicolon (;).
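For instance, selecting the database and counting in one invocation might look like the sketch below. The names `count_rows`, `mydb` and `mytable` are hypothetical, and the `hive` CLI is assumed to be installed.

```shell
#!/usr/bin/env bash

# Count the rows of one table, switching database in the same hive call.
# The two statements are separated by a semicolon inside a single -e string.
count_rows() {
  local db="$1" table="$2"
  hive -S -e "USE ${db}; SELECT COUNT(*) FROM ${table};"
}
```

Calling `count_rows mydb mytable` would print just the count, since `-S` suppresses the informational messages.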