How to add ; at the end of each “show create table

2019-08-26 05:14发布

问题:

So, I have a txt file that has the below content in it:

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE 
  *
STORED AS INPUTFORMAT 
* 
OUTPUTFORMAT 
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27')
CREATE TABLE `table2`(
  `count` bigint)
ROW FORMAT SERDE 
 * 
STORED AS INPUTFORMAT 
  *
OUTPUTFORMAT 
  *
LOCATION
  'hdfs://path/'
TBLPROPERTIES (
  'transient'='15407')

As you can see, after each table's DDL, there is no ; at the end of it.I am trying to write a program that inserts a ; after each one of the tables DDL's. So the output should be like:

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE 
  *
STORED AS INPUTFORMAT 
* 
OUTPUTFORMAT 
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
      `count` bigint)
    ROW FORMAT SERDE 
     * 
    STORED AS INPUTFORMAT 
      *
    OUTPUTFORMAT 
      *
    LOCATION
      'hdfs://path/'
    TBLPROPERTIES (
      'transient'='15407');

There are two ways I have tried this. (1) By adding it the the DDL creation script and a python program.

Below is my DDL creation .sh script which runs through my database's tables and generates a single file for all the tables within a database.I tried to use the cat function shown below on the last line(# cat...) to do that but kept receiving errors.

 hiveDBName=my_db;

showcreate="show create table "

 showpartitions="show partitions "

 terminate=";"

 tables=`hive -e "use $hiveDBName;show tables;"`

 tab_list=`echo "${tables}"`

 rm -f ${hiveDBName}_all_table_partition_DDL.sql

 for list in $tab_list
 do
        echo "Generating table script for " ${hiveDBName}.${list}
        showcreatetable=${showcreatetable}${showcreate}${hiveDBName}.${list}${terminate}

 done

 echo " ====== Create Tables ======= : " $showcreatetable

##Remove the file
      rm -f ${hiveDBName}_extract_all_tables.txt

      hive -e "use $hiveDBName; ${showcreatetable}" > /home/path/filter_ddls/aa.sql
      grep -v "WARN" /home/path/filter_ddls/aa.sql >/home/path/hive_db_ddls/${hiveDBName}_extract_all_tables.sql
     # cat a1.sql + ";\n\n" >> ${hiveDBName}_extract_all_tables.sql

Below is my Python program but the output of this method adds ; only after tblproperties which skips certain tables.

import re
f = open("/home/path/ddl.sql", 'rt', encoding='latin-1').read()

with open("/home/path/new_ddl.sql","w") as output:
    output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))

Any ideas or suggestion to make this happen? Preferable the first option (the .sh script).

回答1:

Following our discussion in chat, the two solutions discussed were as follows:

If your format is consistent and transient is always present in the line where a closing ';' is required at the end, then a simple sed substitution is all that is needed, e.g.

sed '/transient/s/$/;/' file

(add the -i option to edit the file in-place, and/or add -i.bak to edit in place preserving the original file unchanged with the .bak extension)

If on the other hand, the content can change and transient may or may not be present, then you can key off the TBLPROPERTIES tag and then scan forward in the file to find the first closing ')' following TBLPROPERTIES and add the closing ';' there.

awk provides a more robust solution there as no guarantees have been given as to the possible number of lines between TBLPROPERTIES and the closing ')'. Below awk is used with a simple variable look to server as a flag indicating whether you are looking for the closing ')' after a TBLPROPERTIES (look=1), or not (look=0).

For example:

awk -v look=0 '
    /^TBLPROPERTIES/ { look=1 }
    look == 1 {
        if ( sub (/[)]$/,");") )
            look=0
    }1
' file

GNU awk has gawk -i inplace extension to allow editing the file in-place, similar to sed, otherwise you simply redirect the output to a temp file and then copy or move to the original filename.

Whether using sed or awk above, the output has the desired terminating ';', e.g.

CREATE EXTERNAL TABLE `table1`(
  `tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
  *
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
  *
LOCATION
  *
TBLPROPERTIES (
  'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
  `count` bigint)
ROW FORMAT SERDE
 *
STORED AS INPUTFORMAT
  *
OUTPUTFORMAT
  *
LOCATION
  'hdfs://path/'
TBLPROPERTIES (
  'transient'='15407');

Let me know if you have further questions.