Dataproc Hive operator not running HQL file stored in Cloud Storage

Posted 2019-02-24 08:21

I am trying to run an HQL file stored in Cloud Storage using an Airflow script. There are two parameters through which we can pass a path to DataProcHiveOperator:

  1. query: 'gs://bucketpath/filename.q'

Error occurring: cannot recognize input near 'gs' ':' '/'

  2. query_uri: 'gs://bucketpath/filename.q'

Error occurring: PendingDeprecationWarning: Invalid arguments were passed to DataProcHiveOperator. Support for passing such arguments will be dropped in Airflow 2.0. Invalid arguments were: *args: () **kwargs: {'query_uri': 'gs://poonamp_pcloud/hive_file1.q'

Using the query param, I have successfully run inline Hive queries (select * from table).

Is there any way to run an HQL file stored in a Cloud Storage bucket through DataProcHiveOperator?

2 Answers
祖国的老花朵
#2 · 2019-02-24 08:56

That's because you are using both query and query_uri.

If you are querying using a file, you have to use query_uri and set query=None, or simply omit query.

If you are using an inline query, then you have to use query (see the sketch after the file example below).

Here is a sample for running a query from a file:

# Import path for Airflow 1.x contrib operators
from airflow.contrib.operators.dataproc_operator import DataProcHiveOperator

HiveInsertingTable = DataProcHiveOperator(
    task_id='HiveInsertingTable',
    gcp_conn_id='google_cloud_default',
    query_uri="gs://us-central1-bucket/data/sample_hql.sql",  # HQL file in Cloud Storage
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)
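
And for comparison, a sketch of the inline form mentioned above; the task ID and query string are illustrative placeholders, not values from the question:

HiveInlineQuery = DataProcHiveOperator(
    task_id='HiveInlineQuery',
    gcp_conn_id='google_cloud_default',
    query='select * from my_table',  # inline HQL string; query_uri is left unset
    cluster_name='cluster-name',
    region='us-central1',
    dag=dag)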
放我归山
#3 · 2019-02-24 09:22

query_uri is indeed the correct parameter for running an HQL file off of Cloud Storage. However, it was only added to DataProcHiveOperator in https://github.com/apache/incubator-airflow/pull/2402. Based on the warning message you got, I don't think you're running a version that supports the parameter. The change is not in the latest release (1.8.2), so you'll need to wait for another release or grab it off the master branch.
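
If you do go the master-branch route, pip can install straight from the repo with something like pip install git+https://github.com/apache/incubator-airflow.git (a sketch; pinning to a specific commit is safer than tracking master).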
