Scheduling an ad-hoc query with Hive/Hadoop using

Posted 2019-09-01 01:59

Question:

Does Oozie support a user scheduling, via a REST API, an ad-hoc Hive query?

We're building a system where a user can search documents in Hadoop and can optionally specify some attributes of the data to be searched, with Hive performing the query against Hadoop. Because these fields are optional, we don't know ahead of time what the Hive query will look like (in particular, which tables it will use). We have a service that processes the user's query at run-time to generate the corresponding Hive query.

We'd like to be able to schedule these queries via Oozie, but I haven't been able to find documentation on how to perform this via Oozie. I assume this is possible. Is there sample Java code available to describe how to perform this operation?

Answer 1:

Use the Oozie Coordinator to schedule jobs; see the Apache documentation here and an example here for Oozie Coordinator. Also take a look at Azkaban (1, 2) for scheduling.
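Since the question asks for sample Java code, here is a minimal sketch of submitting a coordinator job through Oozie's Web Services API using only the JDK (Java 11+). It assumes a coordinator application is already deployed at an HDFS path and that its workflow reads the generated query from a job property. The class name, the `hiveQuery` property name, and any server URL or HDFS path are hypothetical placeholders, not something given in the question or the answer. The `POST /v1/jobs?action=start` endpoint with an XML configuration body follows my understanding of the Oozie Web Services API and should be checked against the documentation for your Oozie version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class; not part of Oozie.
class OozieRestSubmitSketch {

    // Escape characters that are special in XML so an arbitrary
    // Hive query can be embedded safely in a property value.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Render job properties as a Hadoop-style <configuration> document.
    static String buildConfigXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property><name>").append(escapeXml(e.getKey()))
              .append("</name><value>").append(escapeXml(e.getValue()))
              .append("</value></property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Properties for a coordinator submission. "hiveQuery" is an assumed
    // parameter name that the deployed workflow would have to reference.
    static Map<String, String> coordinatorProps(String user, String coordAppPath, String hiveQuery) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("user.name", user);
        props.put("oozie.coord.application.path", coordAppPath);
        props.put("hiveQuery", hiveQuery);
        return props;
    }

    // POST the configuration to the Oozie server and return the response body,
    // which is expected to be JSON containing the new job id.
    static String submit(String oozieBaseUrl, Map<String, String> props) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(oozieBaseUrl + "/v1/jobs?action=start"))
                .header("Content-Type", "application/xml;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(buildConfigXml(props)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 201) {
            throw new IllegalStateException(
                    "Oozie returned HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }
}
```

A caller would build the properties with `coordinatorProps(user, coordAppPath, generatedQuery)` and pass them to `submit` together with the Oozie base URL (for example something like `http://oozie-host:11000/oozie`, again a placeholder). If the Oozie client library is on the classpath, its `OozieClient` class (`createConfiguration()` followed by `run(conf)`) wraps the same submission and is likely the simpler route.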