I have an HBase table whose row keys consist of a text ID and a timestamp, like this:
...
string_id1.1470913344067
string_id1.1470913345067
string_id2.1470913344067
string_id2.1470913345067
...
How can I filter an HBase Scan (in Scala or Java) to get only the rows with a given string ID and a timestamp greater than some value?
Thanks
I resolved my problem by using two filters:
- PrefixFilter (I pass it the first part of the row key; in my case the string ID, for example "string_id1.")
- RowFilter (I pass it two parameters: first, CompareOp.GREATER_OR_EQUAL; second, the full row key with the required timestamp, for example "string_id1.1470913345000")
As a result, I get all cells whose row key has the required string_id in the first part and a timestamp greater than or equal to the one I put in the filter in the second part. This is exactly what I want.
Code snippet:
Thanks to everyone who helped to find a solution.
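For reference, in the HBase client the two filters would typically be combined in a FilterList (whose default operator is MUST_PASS_ALL), with the RowFilter given a BinaryComparator over the bound key, and passed to Scan.setFilter. Since that requires a running cluster, the following is only a plain-Java sketch of what the combined filters decide for each row: a prefix check plus an unsigned lexicographic byte comparison against the bound key. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixAndRowFilterSketch {

    // Unsigned lexicographic byte comparison, the ordering HBase uses for row keys.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Mimics PrefixFilter: keep rows whose key starts with the prefix bytes.
    static boolean hasPrefix(byte[] row, byte[] prefix) {
        if (row.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Keeps rows that pass both checks, like the two filters ANDed together.
    static List<String> select(List<String> rowKeys, String prefix, String lowerBoundKey) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        byte[] bound = lowerBoundKey.getBytes(StandardCharsets.UTF_8);
        List<String> result = new ArrayList<>();
        for (String key : rowKeys) {
            byte[] row = key.getBytes(StandardCharsets.UTF_8);
            // RowFilter with GREATER_OR_EQUAL keeps rows that sort at or after the bound.
            if (hasPrefix(row, p) && compareBytes(row, bound) >= 0) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "string_id1.1470913344067",
            "string_id1.1470913345067",
            "string_id2.1470913344067",
            "string_id2.1470913345067");
        // prints [string_id1.1470913345067]
        System.out.println(select(rows, "string_id1.", "string_id1.1470913345000"));
    }
}
```

This relies on all timestamps having the same number of digits (13-digit millisecond values here). Otherwise, byte order and numeric order diverge; for example, "9" sorts after "10".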
Let's say you have somehow ended up with your lines in a monadic traversable structure like a List or an RDD. Now you want only the strings with id = "string_id2" and timestamp > 1470913345000.
What is the problem here? Just filter your traversable monadic structure on these two criteria.
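A minimal sketch of that idea with a Java List and streams, assuming the row keys are already in memory as strings of the form id.timestamp. The names are hypothetical. Unlike the server-side filters, this approach reads every row from HBase first and only then discards the unwanted ones.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TraversableFilterSketch {

    // Splits "id.timestamp" at the last '.' and checks both criteria.
    static boolean matches(String rowKey, String wantedId, long minTimestampExclusive) {
        int dot = rowKey.lastIndexOf('.');
        if (dot < 0) {
            return false; // not in the id.timestamp format
        }
        String id = rowKey.substring(0, dot);
        long ts;
        try {
            ts = Long.parseLong(rowKey.substring(dot + 1));
        } catch (NumberFormatException e) {
            return false; // timestamp part is not a number
        }
        return id.equals(wantedId) && ts > minTimestampExclusive;
    }

    // Plain filter over the collection; an RDD offers an analogous filter method.
    static List<String> filterRows(List<String> rowKeys, String wantedId, long minTs) {
        return rowKeys.stream()
            .filter(k -> matches(k, wantedId, minTs))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "string_id1.1470913344067",
            "string_id1.1470913345067",
            "string_id2.1470913344067",
            "string_id2.1470913345067");
        // prints [string_id2.1470913345067]
        System.out.println(filterRows(rows, "string_id2", 1470913345000L));
    }
}
```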
The fuzzy row approach is efficient for this kind of requirement when the data is huge. As explained in this article, FuzzyRowFilter takes as parameters a row key and mask info.
In the example above, if we want to find the last logged-in users and the row key format is userId_actionId_timestamp (where userId has a fixed length of, say, 4 chars), the fuzzy row key we are looking for is ????_login_. This translates into the following params for FuzzyRowFilter:
I would suggest going through HBase: The Definitive Guide, chapter "Client API: Advanced Features".
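In the HBase client, FuzzyRowFilter is constructed from a list of Pair<byte[], byte[]> (fuzzy key, mask). In the mask convention described in its Javadoc, 0 marks a byte that must match and 1 marks a byte that may be anything. The plain-Java sketch below builds such a mask for ????_login_ and checks rows against it. It is a simplified simulation under that assumption: it assumes an ASCII template with '?' as the wildcard, and it ignores the seek-ahead hints that make the real filter fast. Names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FuzzyMaskSketch {

    // 0 = byte must match the key, 1 = any byte allowed at this position.
    static byte[] buildMask(byte[] template, byte wildcard) {
        byte[] mask = new byte[template.length];
        for (int i = 0; i < template.length; i++) {
            mask[i] = template[i] == wildcard ? (byte) 1 : (byte) 0;
        }
        return mask;
    }

    // Simplified matcher: compares the first key.length bytes of the row on fixed positions only.
    static boolean fuzzyMatches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] key = "????_login_".getBytes(StandardCharsets.US_ASCII);
        byte[] mask = buildMask(key, (byte) '?');
        // prints [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(mask));
        byte[] login = "0001_login_1470913345067".getBytes(StandardCharsets.US_ASCII);
        byte[] logout = "0001_logout_1470913345067".getBytes(StandardCharsets.US_ASCII);
        System.out.println(fuzzyMatches(login, key, mask));  // true
        System.out.println(fuzzyMatches(logout, key, mask)); // false
    }
}
```

Note that a fuzzy key fixes bytes by position, so on its own it cannot express a range condition such as "timestamp greater than X".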