Scan HTable rows for specific column value using H

2019-01-16 14:16发布

I want to scan rows in a HTable from hbase shell where a column family (i.e., Tweet) has a particular value (i.e., user_id).

Now I want to find all rows where tweet:user_id has value test1 as this column has value 'test1'

column=tweet:user_id, timestamp=1339581201187, value=test1

Though I can scan table for a particular using,

scan 'tweetsTable',{COLUMNS => 'tweet:user_id'}

but I did not find any way to scan a row for a value.

Is it possible to do this via HBase Shell?

I checked this question as well.

标签: nosql hbase
6条回答
祖国的老花朵
2楼-- · 2019-01-16 14:45

It is possible without Hive:

scan 'filemetadata', 
     { COLUMNS => 'colFam:colQualifier', 
       LIMIT => 10, 
       FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
     }

Note: in order to find all rows that contain test1 as value as specified in the question, use binaryprefix:test1 in the filter (see this answer for more examples)

查看更多
3楼-- · 2019-01-16 14:45

To scan a table in hbase on the basis of any column value, SingleColumnValueFilter can be used as :

scan 'tablename' ,
   { 
     FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
   } 
查看更多
何必那么认真
4楼-- · 2019-01-16 14:53

From HBAse shell i think it is not possible because it is some how like query from which we use want to find spsecific data. As all we know that HBAse is noSQL so when we want to apply query or if we have a case like you then i think you should use Hive or PIG where as Hive is quiet good approach because in PIG we need to mess with scripts.
Anyway you can get good guaidence about hive from here HIVE integration with HBase and from Here
If yout only purpose is to view data not to get from code (of any client) then you can use HBase Explorer or a new and very good product but it is in its beta release is "HBase manager". You can get this from HBase Manager
Its simple, and more importantly, it helps to insert and delete data, applying filters on column qualifiers from UI like other DBclients. Have a try.
I hope it would be helpful for you :)

查看更多
爷、活的狠高调
5楼-- · 2019-01-16 14:54

An example of a text search for a value BIGBLUE in table t1 with column family of d:a_content. A scan of the table will show all the available values :-

scan 't1'
...
column=d:a_content, timestamp=1404399246216, value=BIGBLUE
...

To search just for a value of BIGBLUE with limit of 1, try the below command :-

scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }

COLUMN+CELL
column=d:a_content, timestamp=1404399246216, value=BIGBLUE

Obviously removing the limit will show all occurrences in that table/cf.

查看更多
女痞
6楼-- · 2019-01-16 15:09

As there were multiple requests to explain this answer this additional answer has been posted.

Example 1

If

scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

then this filter:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') AND ValueFilter( =, 'binaryprefix:hello_value3')" }

would return:

ROW     COLUMN+CELL
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

Example 2

If not is supported as well:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
查看更多
做个烂人
7楼-- · 2019-01-16 15:12

Nishu, here is solution I periodically use. It is actually much more powerful than you need right now but I think you will use it's power some day. Yes, it is for HBase shell.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }

Only family:field column is returned with filter applied. This filter could be improved to perform more complicated comparisons.

Here are also hints for you that I consider most useful:

查看更多
登录 后发表回答