Scan HTable rows for specific column value using H

I want to scan rows in an HTable from the HBase shell where a column (e.g., tweet:user_id) has a particular value.

Specifically, I want to find all rows where tweet:user_id has the value test1, as in this cell:

column=tweet:user_id, timestamp=1339581201187, value=test1

I can scan the table for a particular column using

scan 'tweetsTable',{COLUMNS => 'tweet:user_id'}

but I did not find any way to scan for rows by cell value.

Is it possible to do this via HBase Shell?

I checked this question as well.

Tags: nosql hbase

6 Answers

祖国的老花朵

#2 · 2019-01-16 14:45

It is possible without Hive:

scan 'filemetadata', 
     { COLUMNS => 'colFam:colQualifier', 
       LIMIT => 10, 
       FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
     }

Note: to find all rows whose value starts with test1, as asked in the question, use binaryprefix:test1 in the filter; for an exact match on test1 only, use binary:test1 instead (see this answer for more examples).
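
To make the difference between the two comparators concrete, here is a rough plain-Ruby model of how `binary:` and `binaryprefix:` decide whether a cell value matches under `=`. This is not HBase code: the method names and the sample user IDs are made up for illustration.

```ruby
# Plain-Ruby model of two HBase shell comparators; this is NOT the HBase API.
# Method names and the sample data below are hypothetical.

# 'binary:<v>' with '=' matches only when the cell value equals <v> exactly.
def binary_match?(cell_value, wanted)
  cell_value.b == wanted.b
end

# 'binaryprefix:<v>' with '=' matches when the cell value starts with <v>.
def binaryprefix_match?(cell_value, wanted)
  cell_value.b.start_with?(wanted.b)
end

SAMPLE_USER_IDS = %w[test1 test10 test2].freeze

EXACT_HITS  = SAMPLE_USER_IDS.select { |v| binary_match?(v, 'test1') }
PREFIX_HITS = SAMPLE_USER_IDS.select { |v| binaryprefix_match?(v, 'test1') }

puts "binary:test1       -> #{EXACT_HITS.inspect}"
puts "binaryprefix:test1 -> #{PREFIX_HITS.inspect}"
```

So if user IDs such as test10 can exist in the table, `binary:test1` is probably the safer choice when only test1 itself is wanted.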

傲

#3 · 2019-01-16 14:45

To scan a table in HBase based on the value of a column, SingleColumnValueFilter can be used:

scan 'tablename' ,
   { 
     FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
   }
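
Unlike ValueFilter, which drops individual non-matching cells, SingleColumnValueFilter decides per row and keeps every column of a row that passes. By default it also lets through rows that do not have the tested column at all. A rough plain-Ruby model of that behaviour follows. It is not HBase code: the method name, the `filter_if_missing` keyword, and the sample rows are illustrative assumptions.

```ruby
# Plain-Ruby model of SingleColumnValueFilter row selection; NOT the HBase API.
# Rows are modelled as { 'family:qualifier' => value } hashes (made-up data).
SCVF_ROWS = {
  'row1' => { 'tweet:user_id' => 'test1', 'tweet:text' => 'hello' },
  'row2' => { 'tweet:user_id' => 'test2', 'tweet:text' => 'world' },
  'row3' => { 'tweet:text' => 'no user_id column here' }
}.freeze

# Returns the rows (with all their columns) that pass the filter.
def single_column_value_filter(rows, column, wanted, filter_if_missing: false)
  rows.select do |_key, cells|
    if cells.key?(column)
      cells[column] == wanted   # models the '=' operator with 'binary:<wanted>'
    else
      !filter_if_missing        # rows without the column pass by default
    end
  end
end

SCVF_DEFAULT = single_column_value_filter(SCVF_ROWS, 'tweet:user_id', 'test1')
SCVF_STRICT  = single_column_value_filter(SCVF_ROWS, 'tweet:user_id', 'test1',
                                          filter_if_missing: true)

puts "default:           #{SCVF_DEFAULT.keys.inspect}"
puts "filter_if_missing: #{SCVF_STRICT.keys.inspect}"
```

In the real filter this corresponds to the filterIfMissing setting. As far as I know, the shell filter string accepts it as an optional trailing boolean argument (together with latestVersionOnly), but check the documentation for your HBase version.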

何必那么认真

#4 · 2019-01-16 14:53

From the HBase shell I think it is not possible, because this is essentially a query for specific data. As we all know, HBase is NoSQL, so when you want to run a query like yours, I think you should use Hive or Pig. Hive is quite a good approach, because with Pig you need to mess with scripts.
Anyway, you can get good guidance about Hive from here: HIVE integration with HBase, and from here.
If your only purpose is to view the data, not to fetch it from code (of any client), then you can use HBase Explorer, or a new and very good product that is still in beta, "HBase Manager". You can get it from HBase Manager.
It is simple and, more importantly, it helps to insert and delete data and to apply filters on column qualifiers from a UI, like other DB clients. Have a try.
I hope this is helpful for you :)

爷、活的狠高调

#5 · 2019-01-16 14:54

An example of a text search for the value BIGBLUE in table t1, column d:a_content (column family d, qualifier a_content). A plain scan of the table shows all the available values:

scan 't1'
...
column=d:a_content, timestamp=1404399246216, value=BIGBLUE
...

To search just for the value BIGBLUE with a limit of 1, try the command below:

scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }

COLUMN+CELL
column=d:a_content, timestamp=1404399246216, value=BIGBLUE

Removing the LIMIT shows all matching occurrences in that table/column family.
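
Keep in mind that `regexstring:` is, to my understanding, an unanchored regular-expression search, so `regexstring:BIGBLUE` also matches values that merely contain BIGBLUE. A rough plain-Ruby model is sketched here. It is not HBase code: Ruby's regex dialect differs from Java's in some details, and the method name and sample values are made up.

```ruby
# Plain-Ruby model of the 'regexstring:' comparator with '='; NOT the HBase API.
# The method name and sample values are hypothetical.
def regexstring_match?(cell_value, pattern)
  Regexp.new(pattern).match?(cell_value)   # unanchored, case-sensitive search
end

REGEX_SAMPLE_VALUES = ['BIGBLUE', 'BIGBLUE2', 'MY BIGBLUE BOX', 'bigblue'].freeze

# Pattern as used in the scan above: matches anywhere inside the value.
UNANCHORED_HITS = REGEX_SAMPLE_VALUES.select { |v| regexstring_match?(v, 'BIGBLUE') }
# Anchored pattern: the whole value has to be BIGBLUE.
ANCHORED_HITS   = REGEX_SAMPLE_VALUES.select { |v| regexstring_match?(v, '^BIGBLUE$') }

puts "regexstring:BIGBLUE   -> #{UNANCHORED_HITS.inspect}"
puts "regexstring:^BIGBLUE$ -> #{ANCHORED_HITS.inspect}"
```

If only the exact value is wanted, anchoring the pattern (or using `binary:BIGBLUE`) should narrow the result accordingly.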

女痞

#6 · 2019-01-16 15:09

Because there were multiple requests to explain this answer, this additional answer has been posted.

Example 1

scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

then this filter:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') OR ValueFilter( =, 'binaryprefix:hello_value3')" }

would return:

ROW     COLUMN+CELL
ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
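
When two ValueFilter conditions are combined, remember that each cell has only one value: AND requires the same cell to satisfy both conditions, whereas OR accepts either one. A rough plain-Ruby model of the two combinations on the sample values above (not HBase code; the helper name is made up):

```ruby
# Plain-Ruby model of combining two binaryprefix conditions; NOT the HBase API.
# The values mirror the sample scan output; the helper name is hypothetical.
COMBINE_VALUES = {
  'ROW1' => 'hello_value',
  'ROW2' => 'hello_value2',
  'ROW3' => 'hello_value3'
}.freeze

def prefix?(value, prefix)
  value.start_with?(prefix)
end

# AND: a single cell value would need both prefixes at once.
AND_ROWS = COMBINE_VALUES.select do |_row, v|
  prefix?(v, 'hello_value2') && prefix?(v, 'hello_value3')
end.keys

# OR: either prefix is enough.
OR_ROWS = COMBINE_VALUES.select do |_row, v|
  prefix?(v, 'hello_value2') || prefix?(v, 'hello_value3')
end.keys

puts "AND -> #{AND_ROWS.inspect}"
puts "OR  -> #{OR_ROWS.inspect}"
```

In this model only the OR combination selects ROW2 and ROW3. The AND combination selects nothing, because no single value starts with both hello_value2 and hello_value3.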

Example 2

Negation (!=) is supported as well:

scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }

would return:

ROW     COLUMN+CELL
ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3

做个烂人

#7 · 2019-01-16 15:12

Nishu, here is a solution I periodically use. It is actually much more powerful than you need right now, but I think you will use its power some day. Yes, it is for the HBase shell.

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes

scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }

Only the family:field column is returned, with the filter applied. The filter could be extended to perform more complicated comparisons.

Here are also some pointers that I consider most useful:

http://hadoop-hbase.blogspot.com/2012/01/hbase-intra-row-scanning.html - Intra-row scanning explanation (Java API).
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/filter/FilterBase.html - JavaDoc for the FilterBase class, with links to its descendants, which can be used in the same style. The shell syntax will be slightly different, but with the example above you can work it out.
