How do we get the 1000 tables description using hi

2019-07-28 22:37发布

问题:

I have 1000 tables, need to check the describe <table name>; for one by one. Instead of running one by one, can you please give me one command to fetch "N" number of tables in a single shot.

回答1:

You can make a shell script and call it with a parameter. For example following script receives schema, prepares list of tables in the schema, calls DESCRIBE EXTENDED command, extracts location, prints table location for first 1000 tables in the schema ordered by name. You can modify and use it as a single command:

#!/bin/bash

#Create table list for a schema (script parameter)
HIVE_SCHEMA=$1
echo Processing Hive schema $HIVE_SCHEMA...
tablelist=tables_$HIVE_SCHEMA

 hive -e " set hive.cli.print.header=false; use $HIVE_SCHEMA; show tables;" 1>  $tablelist

#number of tables
tableNum_limit=1000

#For each table do:
for table in $(cat $tablelist|sort|head -n "$tableNum_limit") #add proper sorting
 do 

 echo Processing table $table ...

     #Call DESCRIBE
     out=$(hive client -S -e "use $HIVE_SCHEMA; DESCRIBE EXTENDED $table")

     #Get location for example
     table_location=$(echo "${out}" | egrep -o 'location:[^,]+' | sed 's/location://')
     echo Table location: $table_location
     #Do something else here

done 


回答2:

Query the metastore

Demo

Hive

create database my_db_1;
create database my_db_2;
create database my_db_3;

create table my_db_1.my_tbl_1 (i int);
create table my_db_2.my_tbl_2 (c1 string,c2 date,c3 decimal(12,2));
create table my_db_3.my_tbl_3 (x array<int>,y struct<i:int,j:int,k:int>);

MySQL (Metastore)

use metastore
;

select      d.name              as db_name
           ,t.tbl_name      
           ,c.integer_idx + 1   as col_position
           ,c.column_name
           ,c.type_name


from                DBS         as d

            join    TBLS        as t

            on      t.db_id =
                    d.db_id

            join    SDS         as s

            on      s.sd_id =
                    t.sd_id        

            join    COLUMNS_V2  as c

            on      c.cd_id =
                    s.cd_id

where       d.name like 'my\_db\_%'

order by    d.name         
           ,t.tbl_name  
           ,c.integer_idx
;

+---------+----------+--------------+-------------+---------------------------+
| db_name | tbl_name | col_position | column_name |         type_name         |
+---------+----------+--------------+-------------+---------------------------+
| my_db_1 | my_tbl_1 |            1 | i           | int                       |
| my_db_2 | my_tbl_2 |            1 | c1          | string                    |
| my_db_2 | my_tbl_2 |            2 | c2          | date                      |
| my_db_2 | my_tbl_2 |            3 | c3          | decimal(12,2)             |
| my_db_3 | my_tbl_3 |            1 | x           | array<int>                |
| my_db_3 | my_tbl_3 |            2 | y           | struct<i:int,j:int,k:int> |
+---------+----------+--------------+-------------+---------------------------+