Does HBase have any command that works like the SQL LIMIT clause?
I can do it with setStart and setEnd, but I do not want to iterate over all rows.
From the HBase shell you can use LIMIT:
hbase> scan 'test-table', {'LIMIT' => 5}
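From the Java API, be careful: Scan.setMaxResultSize(N) caps the size of a scanner response in bytes, and scan.setMaxResultsPerColumnFamily(N) caps the number of values returned per row per column family. Neither limits the number of rows. Since HBase 2.0 you can use Scan.setLimit(N), which ends the scan once N rows have been returned.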
There is a filter called PageFilter (org.apache.hadoop.hbase.filter.PageFilter) that is meant for this purpose.
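import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;
// 'table' is an HBase Table obtained elsewhere.
// Start at row key "smith-" and fetch only two columns.
Scan scan = new Scan(Bytes.toBytes("smith-"));
scan.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("givenName"));
scan.addColumn(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"));
// Ask the server side to stop after 25 rows.
scan.setFilter(new PageFilter(25));
try (ResultScanner scanner = table.getScanner(scan)) { // closes the scanner when done
    for (Result result : scanner) {
        // ...
    }
}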
http://java.dzone.com/articles/handling-big-data-hbase-part-4
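One caveat: the PageFilter javadoc states that the filter cannot guarantee the client receives at most the page size, because it is applied separately on the server side, so each region only enforces the limit locally. The sketch below is a self-contained, in-memory simulation of that behavior (no HBase involved); the class name, method name, and the three "regions" of row keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (no HBase): each "region" enforces the page
 * limit on its own, and the client receives the concatenation of all
 * regions' output.
 */
public class PageFilterSimulation {

    /** Applies a page limit to each region independently. */
    public static List<String> scanWithPerRegionLimit(List<List<String>> regions, int pageSize) {
        List<String> received = new ArrayList<>();
        for (List<String> regionRows : regions) {
            int passed = 0; // each region keeps its own counter
            for (String row : regionRows) {
                if (passed >= pageSize) {
                    break; // this region stops; the next region starts fresh
                }
                received.add(row);
                passed++;
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // Three hypothetical regions, each already sorted by row key.
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("m1", "m2", "m3"),
                List.of("t1", "t2", "t3"));
        List<String> rows = scanWithPerRegionLimit(regions, 2);
        System.out.println(rows); // [a1, a2, m1, m2, t1, t2]: 6 rows, not 2
    }
}
```

So a page size of 2 over three regions can hand the client up to 6 rows, which is why a client-side check is still useful.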
If you use the HBase shell, the following command limits the query results (the key can be written quoted, as 'LIMIT', or as the bare shell constant LIMIT):
scan 'table-name', {'LIMIT' => 10}
A guaranteed way is to do the limiting on the client side, inside the iterator loop. This is the approach taken by the HBase shell, which is written in Ruby. From table.rb ($HBASE_HOME/hbase-shell/src/main/ruby/hbase/table.rb), line 467:
# Start the scanner
scanner = @table.getScanner(_hash_to_scan(args))
iter = scanner.iterator
# Iterate results
while iter.hasNext
  if limit > 0 && count >= limit
    break
  end
  row = iter.next
  ...
end
This can be made somewhat more efficient by calling scan.setFilter(new PageFilter(limit)) and scan.setCaching(limit) before table.getScanner(scan). The page filter ensures that each region returns at most limit rows, the scanner caching value ensures that each RPC to a region server fetches at most limit rows for the client to buffer, and the client-side check then breaks out of the loop after the first limit rows, in the order the client receives them.
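The client-side loop from table.rb translates directly to Java. The sketch below is plain Java with no HBase dependency so it can run on its own; ClientSideLimit and firstN are hypothetical names. Because an HBase ResultScanner is an Iterable<Result>, the same helper could be given a scanner created with the PageFilter and setCaching settings described above (and the scanner should still be closed afterwards).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical helper: client-side LIMIT in the style of the table.rb loop.
 * Works with any Iterable, e.g. an HBase ResultScanner.
 */
public final class ClientSideLimit {

    private ClientSideLimit() {}

    /** Returns at most 'limit' elements; limit <= 0 means no limit, as in table.rb. */
    public static <T> List<T> firstN(Iterable<T> source, int limit) {
        List<T> out = new ArrayList<>();
        Iterator<T> iter = source.iterator();
        while (iter.hasNext()) {
            // Check the count before calling next(), so no extra row is pulled.
            if (limit > 0 && out.size() >= limit) {
                break;
            }
            out.add(iter.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical rows, as if PageFilter(2) had run in three regions.
        List<String> fromServers = List.of("a1", "a2", "m1", "m2", "t1", "t2");
        System.out.println(firstN(fromServers, 2));        // [a1, a2]
        System.out.println(firstN(fromServers, 0).size()); // 6
    }
}
```

Checking the count before next() mirrors the Ruby loop: once the limit is reached, the client stops asking the iterator for more rows.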