I have two Hive scripts which look like this:
Script A:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.parallel=true;
    ... do something ...

Script B:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.parallel=true;
    ... do something else ...
The options we set at the beginning of each script are the same. Is it possible to extract them into a common place (for example, a `commonoptions.sql`) so that our scripts look like this:
Script A:

    <run commonoptions.sql>
    ... do something ...

Script B:

    <run commonoptions.sql>
    ... do something else ...
Ideally I would like to extract out the table definitions as well, so that I have:
Script A:

    <run commonoptions.sql>
    <run defineExternalTableXYZ.sql>
    ... do something with Table XYZ ...

Script B:

    <run commonoptions.sql>
    <run defineExternalTableXYZ.sql>
    ... do something else with Table XYZ ...
That way I can manage the Table XYZ definition in a single place. I am not using the Hive CLI; I am using Amazon EMR with Hive Steps.
You should be able to use

    hive -i config.hql -f script_A.hql

where `config.hql` would contain your dynamic settings. The `-i` flag allows you to pass an initialization script that will be executed before the actual Hive file passed to `-f`.
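For instance, `config.hql` (the file name from the command above) could contain exactly the settings both of your scripts currently repeat:

    -- config.hql: shared initialization, passed to hive via -i
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.exec.parallel=true;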
I'm not super familiar with how AWS kicks off Hive jobs in steps, but presumably you can edit the submission arguments.
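I can't confirm the exact EMR invocation, but assuming the step's arguments are passed through to Hive the same way `-f` is in AWS's Hive Step examples, adding `-i` via the AWS CLI might look like this (the cluster ID, bucket, and paths are placeholders):

    aws emr add-steps --cluster-id j-XXXXXXXXXXXX --steps \
      'Type=HIVE,Name=ScriptA,ActionOnFailure=CONTINUE,Args=[-i,s3://my-bucket/hive/commonoptions.sql,-f,s3://my-bucket/hive/script_A.sql]'

Whether `-i` resolves an S3 path the way `-f` does on EMR is worth verifying; if it does not, you could stage `commonoptions.sql` onto the master node (for example, with a bootstrap action) and point `-i` at the local path.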
---

You can store these configuration parameters in a common file and load them in each of your scripts using the `source` command. You can also generate this file for each workflow from the database.
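A minimal sketch of how the scripts could then look, using the file names from the question and assuming the files are staged at a path readable where Hive runs (the table schema and paths here are purely hypothetical):

    -- defineExternalTableXYZ.sql: the single shared definition of Table XYZ
    CREATE EXTERNAL TABLE IF NOT EXISTS xyz (
      id STRING,
      payload STRING
    )
    PARTITIONED BY (dt STRING)
    LOCATION 's3://my-bucket/data/xyz/';

    -- script_A.sql
    source /home/hadoop/commonoptions.sql;
    source /home/hadoop/defineExternalTableXYZ.sql;
    -- ... do something with Table XYZ ...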