I want to rank user IDs based on one field. Rows with the same value of that field should get the same rank. The data is in a Hive table.
e.g.
user value
a 5
b 10
c 5
d 6
Rank
a - 1
c - 1
d - 3
b - 4
How can I do that?
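It is possible to use the rank window function, either through the DataFrame API or in raw SQL, but it is extremely inefficient: a window with an ORDER BY and no partitioning clause moves all rows into a single partition.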
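As a rough illustration of the raw SQL form, here is a minimal sketch. It runs the query against an in-memory SQLite database through Python's sqlite3 module instead of Hive or Spark, purely so the example is self-contained. The table name users, the column name user_id (used instead of user, which can clash with a reserved word), and the helper rank_users are assumptions made for this sketch; the rank() OVER (ORDER BY ...) expression itself is standard SQL that Hive also accepts.

```python
import sqlite3


def rank_users(rows):
    """Rank (user_id, value) rows by value; ties share a rank.

    Note: SQLite needs version 3.25 or newer for window functions.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and column names, standing in for the Hive table.
    conn.execute("CREATE TABLE users (user_id TEXT, value INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    query = """
        SELECT user_id, rank() OVER (ORDER BY value) AS rnk
        FROM users
        ORDER BY rnk, user_id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    data = [("a", 5), ("b", 10), ("c", 5), ("d", 6)]
    print(rank_users(data))  # [('a', 1), ('c', 1), ('d', 3), ('b', 4)]
```

The window has no PARTITION BY clause, which is exactly what makes the equivalent Spark query expensive on large data.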
You can also try the RDD API, but it is not exactly straightforward. First, let's convert the DataFrame to an RDD.
Next we have to compute the ranks within each partition, then the offset for each partition, and finally combine the two into the final ranks.
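The logic of these steps can be sketched in plain Python, with lists standing in for RDD partitions. This is not Spark code and not the original implementation; the function names and the bounds parameter are hypothetical, and the sketch assumes the data is range partitioned by value so that equal values always land in the same partition.

```python
from itertools import accumulate


def range_partition(rows, bounds):
    """Sort (user, value) rows by value and split them into partitions.

    bounds are hypothetical inclusive upper bounds per partition, so rows
    with equal values always end up in the same partition.
    """
    parts = [[] for _ in range(len(bounds) + 1)]
    for user, value in sorted(rows, key=lambda r: r[1]):
        idx = sum(value > b for b in bounds)
        parts[idx].append((user, value))
    return parts


def local_ranks(part):
    """Rank rows inside one sorted partition; ties share a rank (1, 1, 3)."""
    ranked, prev_value, prev_rank = [], None, 0
    for i, (user, value) in enumerate(part, start=1):
        rank = prev_rank if value == prev_value else i
        ranked.append((user, value, rank))
        prev_value, prev_rank = value, rank
    return ranked


def global_ranks(parts):
    """Shift each partition's local ranks by the row count before it."""
    sizes = [len(p) for p in parts]
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        (user, value, rank + offset)
        for part, offset in zip(parts, offsets)
        for user, value, rank in local_ranks(part)
    ]


if __name__ == "__main__":
    rows = [("a", 5), ("b", 10), ("c", 5), ("d", 6)]
    parts = range_partition(rows, bounds=[5])
    print(global_ranks(parts))
    # [('a', 5, 1), ('c', 5, 1), ('d', 6, 3), ('b', 10, 4)]
```

Only the partition sizes have to be collected centrally to compute the offsets, so no single partition ever needs to hold the full data set.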