Use Airflow Hive operator and output to a text file

Posted 2019-07-13 15:47

Hi, I want to execute a Hive query using the Airflow Hive operator and output the result to a file. I don't want to use INSERT OVERWRITE here.

hive_ex = HiveOperator(
    task_id='hive-ex',
    hql='/sql/hive-ex.sql',
    hiveconfs={
        'DAY': '{{ ds }}',
        'YESTERDAY': '{{ yesterday_ds }}',
        'OUTPUT': '{{ file_path }}'+'csv',
    },
    dag=dag
)

What is the best way to do this?

I know how to do this using the Bash operator, but I want to know whether it can be done with the Hive operator.

hive_ex = BashOperator(
    task_id='hive-ex',
    bash_command='hive -f hive.sql --hiveconf DAY={{ ds }} >> {{ file_path }}/file_{{ ds }}.json',
    dag=dag
)

2 Answers
ら.Afraid
#2 · 2019-07-13 16:23

You need Airflow hooks. See Hooks and HiveHook: there is a to_csv method, or you can use the get_records method and write the file yourself.
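A minimal sketch of the "do it yourself" route, assuming a hook whose get_records(hql) returns an iterable of row tuples (as HiveServer2Hook does). The helper name dump_query_to_csv, the placeholder query, the /tmp output path, and the PythonOperator wiring are all hypothetical, and the import path shown is the Airflow 1.10 one, so adjust it for your version.

```python
import csv


def dump_query_to_csv(hook, hql, csv_path, header=None):
    """Run hql through the hook and write the returned rows to csv_path.

    `hook` is anything with a get_records(hql) method that returns an
    iterable of row tuples, e.g. a HiveServer2Hook.
    """
    rows = hook.get_records(hql)
    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        writer.writerows(rows)
    return csv_path


def hive_to_csv(ds, **kwargs):
    """Callable for a PythonOperator (hypothetical wiring)."""
    # Assumption: Airflow 1.10 import path; newer releases ship this hook
    # in the Apache Hive provider package instead.
    from airflow.hooks.hive_hooks import HiveServer2Hook

    hook = HiveServer2Hook()  # falls back to the default HiveServer2 connection
    # Placeholder query; my_table and the day column are made up.
    hql = "SELECT * FROM my_table WHERE day = '{}'".format(ds)
    return dump_query_to_csv(hook, hql, "/tmp/hive_ex_{}.csv".format(ds))
```

Keeping the file-writing part separate from the Airflow import means it can be tried out with a stub hook before wiring it into a DAG.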

Evening l夕情丶
#3 · 2019-07-13 16:29

Since this is a pretty custom use case, the best way is to extend the Hive operator (or create your own Hive2CSVOperator). The implementation depends on whether you access Hive through the CLI or through HiveServer2.

Hive CLI

I would first try configuring the Hive CLI connection and adding hive_cli_params, as in the Hive CLI hook code. If that doesn't work, extend the hook (which would give you access to everything).
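If you go the CLI route, one possible shape is to let the hook run the query and write whatever it returns to a file. This is only a sketch: it assumes the hook's run_cli(hql, hive_conf=...) returns the text printed by the hive process as a string, which is how I understand HiveCliHook in Airflow 1.10 to behave. The helper name save_cli_output is hypothetical.

```python
def save_cli_output(hook, hql, output_path, hive_conf=None):
    """Run hql via a CLI-style hook and write its captured output to output_path."""
    # Assumption: run_cli returns everything the hive process printed, as one string.
    output = hook.run_cli(hql, hive_conf=hive_conf)
    with open(output_path, "w") as f:
        f.write(output)
    return output_path
```

In an extended operator this would be called from execute with a HiveCliHook instance. Note that the captured text may contain Hive's own log lines next to the query result, so some filtering may be needed before the file is usable.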

HiveServer2

There is a separate hook for this case, HiveServer2Hook. It is a bit more convenient because it has a get_results method and a to_csv method.

The execute method in the operator could then look similar to this:

def execute(self, context):
  ...
  self.hook = HiveServer2Hook(...)

  # to_csv is a method of the hook itself, not of the connection returned by get_conn()
  self.hook.to_csv(hql=self.hql, csv_filepath=self.output_filepath, ...)