I am trying to pass a list of dates as a parameter to my Hive query:
#!/bin/bash
echo "Executing the hive query - Get distinct dates"
var=`hive -S -e "select distinct substr(Transaction_date,0,10) from test_dev_db.TransactionUpdateTable;"`
echo $var
echo "Executing the hive query - Get the parition data"
hive -hiveconf paritionvalue=$var -e 'SELECT Product FROM test_dev_db.TransactionMainHistoryTable where tran_date in("${hiveconf:paritionvalue}");'
echo "Hive query - ends"
Output:
Executing the hive query - Get distinct dates
2009-02-01 2009-04-01
Executing the hive query - Get the parition data
Logging initialized using configuration in file:/hive/conf/hive-log4j.properties
OK
Product1
Product1
Product1
Product1
Product1
Product1
Time taken: 0.523 seconds, Fetched: 6 row(s)
Hive query - ends
It is taking only the first date as input. I would like to pass my dates as ('2009-02-01','2009-04-01').
Note: TransactionMainHistoryTable is partitioned on the tran_date column, which is of string type.
Collect an array of the distinct values using collect_set and concatenate it with the delimiter ','. This produces a list without the outer quotes: 2009-02-01','2009-04-01. Then add the outer quotes ' in the second script, or add them in the first query. When executing inline SQL (the -e option) you do not need to pass a hiveconf variable; direct shell variable substitution will work. Use hiveconf when you are executing a script from a file (the -f option).
Working example:
Returns:
OK
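The approach described above could be sketched roughly as follows. This is not the original example, only a minimal illustration under stated assumptions: the table and column names come from the question, while the variable name date_list and the helper build_query are hypothetical. The Hive calls are guarded so the string-building part can be tried without a Hive installation.

```shell
#!/bin/bash
# Sketch only: date_list and build_query are illustrative names, not from the original post.

# Wrap the inner list (e.g. 2009-02-01','2009-04-01) in the outer quotes
# and parentheses and return the second query as a string.
build_query() {
  local date_list="$1"
  printf "SELECT Product FROM test_dev_db.TransactionMainHistoryTable WHERE tran_date IN ('%s');" "$date_list"
}

# Only talk to Hive if the CLI is actually available.
if command -v hive >/dev/null 2>&1; then
  # collect_set gathers the distinct dates into an array;
  # concat_ws joins them with the delimiter ',' (quote, comma, quote).
  date_list=$(hive -S -e "select concat_ws(\"','\", collect_set(substr(Transaction_date,0,10))) from test_dev_db.TransactionUpdateTable;")
  echo "IN list: ('${date_list}')"

  # Inline SQL (-e): plain shell substitution, no hiveconf variable needed.
  hive -e "$(build_query "$date_list")"
fi
```

Note that every expansion of the variable is double-quoted. In the question's script, the unquoted $var is split by the shell on whitespace, so only the first date ends up in the hiveconf value. If the query lives in a file instead, the same quoting applies when passing the list, for example -hiveconf date_list="$date_list" together with -f and ${hiveconf:date_list} inside the file (the variable name is again an assumption).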