Hive - Thread-safe auto-increment sequence number

Posted 2019-07-09 02:19

Question:

I have a situation where I need to insert records into a particular Hive table.

One of the columns must hold an auto-incremented sequence number that strictly follows the [max.value + 1] rule at all times.

Records are inserted into this table by many parallel Hive jobs that run in batches: daily, weekly, and monthly.

Now, I have these questions:

  1. Is org.apache.hadoop.hive.contrib.udf.UDFRowSequence ( http://svn.apache.org/repos/asf/hive/trunk/contrib/src/java/org/apache/hadoop/hive/contrib/udf/UDFRowSequence.java ) the right choice?

  2. How can I make it thread-safe, given that parallel jobs also insert records into the table?
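To make the requirement concrete, here is a minimal sketch of the behaviour being asked for, written in plain Python rather than Hive. Everything in it is an assumption for illustration: `SequenceAllocator`, `reserve`, and `run_parallel_jobs` are hypothetical names, not Hive APIs, and the in-process lock only stands in for whatever coordination mechanism would be shared between real Hive jobs. Each simulated job atomically reserves a block starting at the current max + 1, so blocks never overlap.

```python
import threading


class SequenceAllocator:
    """Hypothetical stand-in for a shared counter (not a Hive API)."""

    def __init__(self, current_max=0):
        self._max = current_max          # highest sequence number issued so far
        self._lock = threading.Lock()    # serialises every reservation

    def reserve(self, count):
        """Atomically reserve `count` consecutive numbers starting at max + 1."""
        with self._lock:
            start = self._max + 1
            self._max += count
        return range(start, start + count)

    @property
    def current_max(self):
        with self._lock:
            return self._max


def run_parallel_jobs(allocator, batch_sizes):
    """Simulate parallel insert jobs; each reserves one block for its batch."""
    assigned = []
    assigned_lock = threading.Lock()

    def job(size):
        block = allocator.reserve(size)
        with assigned_lock:
            assigned.extend(block)

    threads = [threading.Thread(target=job, args=(n,)) for n in batch_sizes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(assigned)


if __name__ == "__main__":
    alloc = SequenceAllocator(current_max=100)
    ids = run_parallel_jobs(alloc, [5, 3, 7])
    # Check that the numbers are unique and gap-free above the old max.
    print(ids == list(range(101, 116)), alloc.current_max)
```

Which job receives which block depends on thread scheduling, but the combined set of numbers is contiguous and free of duplicates. The question is how to get the same guarantee across separate Hive jobs, where no in-process lock is available.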

Note: I came across this useful post ( hive auto increment after certain number ), which I am still following. I am asking a new question because (1) that question already has an accepted answer and so may no longer get the community's attention, and (2) my situation additionally requires thread-safe sequence number generation.