Based on the post, Hive 0.12 - Collect_list, I am trying to locate Java code to implement a UDAF that will accomplish this or similar functionality but without a repeating sequence.
For instance, collect_all() returns a sequence A, A, A, B, B, A, C, C. I would like to have the sequence A, B, A, C returned. Sequentially repeated items would be removed.
Does anyone know of a function in Hive 0.12 that will accomplish this, or has anyone written their own UDAF for it?
As always, thanks for the help.
If you have something like this:
where index is some rank-order value, such as a direct index or something like a date. I assume order matters in your situation.
Then the query:
The problem here is that you won't get the last value, C, because it has no next value. Add "or nextvalue is null" to the condition and you should have the results.
This should yield ["A", "B", "A", "C"].
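For illustration only, the next-value comparison described above can be sketched in plain Java, outside of Hive. This is a minimal sketch under stated assumptions, not the query from this answer: the class and method names (NextValueFilter, keepWhereNextDiffers) are hypothetical, the rows are assumed to be already sorted by index, and the values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: mimics the "next value" filter on an ordered list.
public class NextValueFilter {

    // Keeps a value only when the next value differs or there is no next value.
    // Assumes the input is already ordered by the index column and has no null values.
    public static List<String> keepWhereNextDiffers(List<String> values) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            String current = values.get(i);
            // nextValue is null for the last row, which is the "nextvalue is null" case.
            String nextValue = (i + 1 < values.size()) ? values.get(i + 1) : null;
            if (nextValue == null || !nextValue.equals(current)) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A", "A", "A", "B", "B", "A", "C", "C");
        System.out.println(keepWhereNextDiffers(input)); // prints [A, B, A, C]
    }
}
```

Without the null check on the last row, the final C would be dropped, which is the problem mentioned above.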
I ran into a similar problem a while back. I didn't want to have to write a full-on UDAF, so I just did a combo with the brickhouse collect and my own UDF.
Say you have this data:
My UDF was:
and then my query was:
Output:
As an aside, the built-in collect_list will not necessarily keep the elements of the list in the order they were grouped in; the brickhouse collect will. Hope this helps.
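The UDF from this answer is not shown here, so the following is only a rough sketch of the kind of logic such a UDF could apply to the collected array. It is not the author's code: the class and method names (CollapseRepeats, collapse) are hypothetical, and the Hive wiring (extending Hive's UDF class and registering the function) is deliberately left out so the example runs as plain Java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: the core of a "remove sequential repeats" UDF, without any Hive classes.
public class CollapseRepeats {

    // Drops every element that equals the element kept immediately before it.
    // A null input list is treated as empty; null elements are compared with Objects.equals.
    public static List<String> collapse(List<String> collected) {
        List<String> result = new ArrayList<>();
        if (collected == null) {
            return result;
        }
        for (String item : collected) {
            boolean sameAsPrevious =
                !result.isEmpty() && Objects.equals(result.get(result.size() - 1), item);
            if (!sameAsPrevious) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> collected = Arrays.asList("A", "A", "A", "B", "B", "A", "C", "C");
        System.out.println(collapse(collected)); // prints [A, B, A, C]
    }
}
```

This only gives the wanted result if the array arrives in the intended order, which is why the ordering behaviour of the collect function matters.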