I am trying to run an HQL file stored in Cloud Storage from an Airflow script. There are two parameters through which we can pass the path to DataProcHiveOperator:
- query: 'gs://bucketpath/filename.q'
  Error occurring: cannot recognize input near 'gs' ':' '/'
- query_uri: 'gs://bucketpath/filename.q'
  Error occurring: PendingDeprecationWarning: Invalid arguments were passed to DataProcHiveOperator. Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were:
  *args: ()
  **kwargs: {'query_uri': 'gs://poonamp_pcloud/hive_file1.q'}
Using the query param, I have successfully run inline Hive queries (select * from table).
Is there any way to run an HQL file stored in a Cloud Storage bucket through DataProcHiveOperator?
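For reference, a minimal sketch of the failing task definition (cluster name and region are placeholders):

from airflow.contrib.operators.dataproc_operator import DataProcHiveOperator

# Passing the GCS path through `query` makes Hive treat the path itself
# as HQL text, which triggers: cannot recognize input near 'gs' ':' '/'
run_hql_file = DataProcHiveOperator(
    task_id='run_hql_file',
    gcp_conn_id='google_cloud_default',
    query='gs://poonamp_pcloud/hive_file1.q',
    cluster_name='cluster-name',  # placeholder
    region='us-central1',         # placeholder
    dag=dag)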
That's because you are using both query and query_uri. If you are querying with a file, you have to use query_uri and set query=None (or simply omit query). If you are running an inline query, then you have to use query.
Here is a sample for querying through a file:

from airflow.contrib.operators.dataproc_operator import DataProcHiveOperator

HiveInsertingTable = DataProcHiveOperator(
    task_id='HiveInsertingTable',
    gcp_conn_id='google_cloud_default',
    query_uri="gs://us-central1-bucket/data/sample_hql.sql",
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)
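For contrast, an inline query goes through query instead; a quick sketch (the SELECT is just a placeholder):

HiveSelect = DataProcHiveOperator(
    task_id='HiveSelect',
    gcp_conn_id='google_cloud_default',
    query='SELECT * FROM my_table LIMIT 10',  # placeholder inline HQL
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)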
query_uri is indeed the correct parameter for running an HQL file off of Cloud Storage. However, it was only added to the DataProcHiveOperator in https://github.com/apache/incubator-airflow/pull/2402. Based on the warning message you got, I don't think you're running code that supports the parameter. The change is not in the latest release (1.8.2), so you'll need to wait for another release or grab it off the master branch.
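Until then, one possible workaround is to read the file contents yourself and pass them through the query parameter, which 1.8.2 does accept. A minimal sketch, assuming the contrib GoogleCloudStorageHook and a configured GCS connection (the read_hql helper is hypothetical; bucket and object names come from the question):

from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook
from airflow.contrib.operators.dataproc_operator import DataProcHiveOperator

def read_hql(bucket, obj):
    # Hypothetical helper: download() with no local filename returns the
    # object's contents, so the HQL text can be passed inline.
    hook = GoogleCloudStorageHook()
    return hook.download(bucket, obj)

HiveFromFile = DataProcHiveOperator(
    task_id='HiveFromFile',
    gcp_conn_id='google_cloud_default',
    query=read_hql('poonamp_pcloud', 'hive_file1.q'),
    cluster_name='cluster-name',  # placeholder
    region='us-central1',         # placeholder
    dag=dag)

Note that read_hql runs at DAG-parse time, so the file is fetched every time the scheduler parses the DAG file; that's the trade-off of staying on 1.8.2 until query_uri ships.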