How to get Yarn Application Id for hive jdbc conne

Posted 2019-08-08 17:01

Question:

Here is how I am running queries through Hive JDBC:

// Requires: import java.sql.*;
Class.forName(DRIVER);
Connection connection = DriverManager.getConnection(CONNECTION_URL, USERNAME, PASSWORD);
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery(query);

I can see the application details in the YARN UI, but now I want to get the application ID for this job from Java code. Is that possible? If so, how?

Answer 1:

AFAIK the short answer is: not in older versions of Hive. It may be possible with recent versions, which let you retrieve some of the execution logs, and those logs may contain the YARN application ID.

Starting with Hive 0.14, you can configure HiveServer2 to publish the execution logs for the current Statement, and in your client code you can use a Hive-specific API to fetch these logs (either asynchronously, as the Beeline client does, or just once when execution is over).

Quoting the Hive documentation:

Starting with Hive 0.14.0, HiveServer2 operation logs are available for Beeline clients. These parameters configure logging:

hive.server2.logging.operation.enabled
hive.server2.logging.operation.log.location
hive.server2.logging.operation.verbose (Hive 0.14 to 1.1)
hive.server2.logging.operation.level (Hive 1.2 onward)

Hive 2.0 adds the support of logging queryId and sessionId to HiveServer2 log file (...)

The source code for HiveStatement shows several non-JDBC methods such as getQueryLog and hasMoreLogs -- also getYarnATSGuid for Hive 2+ and other stuff for Hive 3+.
Here is the link to the "master" branch on GitHub, switch to whichever version you are using (possibly an old 1.2 for compatibility with Spark).
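
The getQueryLog / hasMoreLogs polling pattern mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: QueryLogSource is a hypothetical stand-in interface (not part of Hive) so the snippet compiles without hive-jdbc on the classpath, and QueryLogPoller is likewise a made-up name. With the real driver you would presumably adapt HiveStatement to it, wrap the SQLException its log methods can throw, and run the loop on a separate thread while executeQuery blocks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class QueryLogPoller {
    // Collects log lines until the source reports that no more are coming,
    // sleeping pollMillis between fetches.
    static List<String> drain(QueryLogSource source, long pollMillis) {
        List<String> lines = new ArrayList<>();
        while (source.hasMoreLogs()) {
            lines.addAll(source.getQueryLog());
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop polling
                break;
            }
        }
        // One last fetch in case lines were written just before completion.
        lines.addAll(source.getQueryLog());
        return lines;
    }

    public static void main(String[] args) {
        // Fake source handing out two batches, standing in for a running statement.
        Iterator<List<String>> batches = Arrays.asList(
                Arrays.asList("compiling query"),
                Arrays.asList("launching job", "job finished")).iterator();
        QueryLogSource fake = new QueryLogSource() {
            public boolean hasMoreLogs() { return batches.hasNext(); }
            public List<String> getQueryLog() {
                return batches.hasNext() ? batches.next() : Collections.<String>emptyList();
            }
        };
        for (String line : drain(fake, 10L)) {
            System.out.println(line);
        }
    }
}

// Hypothetical interface mirroring the two HiveStatement method names quoted above.
// It is an assumption made for this sketch, not a Hive API.
interface QueryLogSource {
    boolean hasMoreLogs();
    List<String> getQueryLog();
}
```

Keeping the loop behind a small interface also makes it easy to test without a live HiveServer2.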

For a dummy demo about how to tap the "Logs" methods, have a look at that SO post with a snippet.
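
Once you have the log lines, pulling out the ID is just pattern matching, since YARN application IDs have the form application_<cluster timestamp>_<sequence number>. A minimal sketch, assuming the ID actually shows up in the logs you retrieve (which depends on the execution engine and log level); the class name YarnAppIdExtractor and the sample log lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YarnAppIdExtractor {
    // Matches IDs of the form application_<clusterTimestamp>_<sequence>.
    private static final Pattern APP_ID = Pattern.compile("application_\\d+_\\d+");

    // Returns the distinct application IDs found in the lines, in order of first appearance.
    static List<String> extract(List<String> logLines) {
        Set<String> ids = new LinkedHashSet<>();
        for (String line : logLines) {
            Matcher m = APP_ID.matcher(line);
            while (m.find()) {
                ids.add(m.group());
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        // Invented sample lines; real HiveServer2 operation logs may look different.
        List<String> sample = Arrays.asList(
                "Starting Job = job_1565254861234_0042, Tracking URL = "
                        + "http://rm.example.com:8088/proxy/application_1565254861234_0042/",
                "Kill Command = hadoop job -kill job_1565254861234_0042");
        System.out.println(extract(sample));
    }
}
```

A query can launch more than one YARN application, which is why this returns a list rather than a single ID.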



Tags: jdbc hive yarn