So, I have a txt file that has the below content in it:
CREATE EXTERNAL TABLE `table1`(
`tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
*
TBLPROPERTIES (
'transient_lastDdlTime'='1556u3ehw27')
CREATE TABLE `table2`(
`count` bigint)
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
'hdfs://path/'
TBLPROPERTIES (
'transient'='15407')
As you can see, after each table's DDL, there is no ; at the end of it.I am trying to write a program that inserts a ; after each one of the tables DDL's. So the output should be like:
CREATE EXTERNAL TABLE `table1`(
`tab_id bigint COMMENT 'The unique identifier of thetable')
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
*
TBLPROPERTIES (
'transient_lastDdlTime'='1556u3ehw27');
CREATE TABLE `table2`(
`count` bigint)
ROW FORMAT SERDE
*
STORED AS INPUTFORMAT
*
OUTPUTFORMAT
*
LOCATION
'hdfs://path/'
TBLPROPERTIES (
'transient'='15407');
There are two ways I have tried this. (1) By adding it the the DDL creation script and a python program.
Below is my DDL creation .sh script which runs through my database's tables and generates a single file for all the tables within a database.I tried to use the cat function shown below on the last line(# cat...) to do that but kept receiving errors.
hiveDBName=my_db;
showcreate="show create table "
showpartitions="show partitions "
terminate=";"
tables=`hive -e "use $hiveDBName;show tables;"`
tab_list=`echo "${tables}"`
rm -f ${hiveDBName}_all_table_partition_DDL.sql
for list in $tab_list
do
echo "Generating table script for " ${hiveDBName}.${list}
showcreatetable=${showcreatetable}${showcreate}${hiveDBName}.${list}${terminate}
done
echo " ====== Create Tables ======= : " $showcreatetable
##Remove the file
rm -f ${hiveDBName}_extract_all_tables.txt
hive -e "use $hiveDBName; ${showcreatetable}" > /home/path/filter_ddls/aa.sql
grep -v "WARN" /home/path/filter_ddls/aa.sql >/home/path/hive_db_ddls/${hiveDBName}_extract_all_tables.sql
# cat a1.sql + ";\n\n" >> ${hiveDBName}_extract_all_tables.sql
Below is my Python program but the output of this method adds ; only after tblproperties which skips certain tables.
import re
f = open("/home/path/ddl.sql", 'rt', encoding='latin-1').read()
with open("/home/path/new_ddl.sql","w") as output:
output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
Any ideas or suggestion to make this happen? Preferable the first option (the .sh script).
Following our discussion in chat, the two solutions discussed were as follows:
If your format is consistent and
transient
is always present in the line where a closing';'
is required at the end, then a simplesed
substitution is all that is needed, e.g.(add the
-i
option to edit the file in-place, and/or add-i.bak
to edit in place preserving the original file unchanged with the.bak
extension)If on the other hand, the content can change and
transient
may or may not be present, then you can key off theTBLPROPERTIES
tag and then scan forward in the file to find the first closing')'
followingTBLPROPERTIES
and add the closing';'
there.awk
provides a more robust solution there as no guarantees have been given as to the possible number of lines betweenTBLPROPERTIES
and the closing')'
. Belowawk
is used with a simple variablelook
to server as a flag indicating whether you are looking for the closing')'
after aTBLPROPERTIES
(look=1
), or not (look=0
).For example:
GNU
awk
hasgawk -i inplace
extension to allow editing the file in-place, similar tosed
, otherwise you simply redirect the output to a temp file and then copy or move to the original filename.Whether using
sed
orawk
above, the output has the desired terminating';'
, e.g.Let me know if you have further questions.