I am trying to read an HBase table using the Spark Scala API.
Sample Code:
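import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
val conf = HBaseConfiguration.create()
conf.set("hbase.master", "localhost:60000")
conf.set("hbase.zookeeper.quorum", "localhost")
conf.set(TableInputFormat.INPUT_TABLE, tableName)
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat], classOf[ImmutableBytesWritable], classOf[Result])
println("Number of Records found : " + hBaseRDD.count())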
How can I add a where clause if I use newAPIHadoopRDD? Or do I need to use a Spark HBase connector to achieve this?
I saw the Spark HBase connector below, but I don't see any example code with a where clause.
You can use the SHC connector from Hortonworks to achieve this.
Here is a code example with Spark 2.
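// Needs org.apache.spark.sql.SparkSession and org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
val catalog =
  s"""{
     |"table":{"namespace":"default", "name":"my_table"},
     |"rowkey":"id",
     |"columns":{
     |"id":{"cf":"rowkey", "col":"id", "type":"string"},
     |"name":{"cf":"info", "col":"name", "type":"string"},
     |"age":{"cf":"info", "col":"age", "type":"string"}
     |}
     |}""".stripMargin
val spark = SparkSession.builder()
  .appName("hbase spark")
  .getOrCreate()
val df = spark
  .read.options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()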
You can then call any DataFrame method on the result. For example:
df.where(df("age") === 20)
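
If you would rather stay with newAPIHadoopRDD and skip the connector, one option is to attach an HBase filter to a Scan and hand it to TableInputFormat, so the filtering happens on the region servers. The following is only a sketch under some assumptions: it targets the HBase 1.x client API (CompareFilter.CompareOp is deprecated in 2.x), it assumes the age value is stored as a string in info:age as in the catalog above, and ageEqualsScan and readFiltered are hypothetical helper names, not part of any library. It is meant to be pasted into spark-shell with the HBase client jars on the classpath.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, SingleColumnValueFilter}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{TableInputFormat, TableMapReduceUtil}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds a Scan that keeps only rows where info:age == age.
// Assumes the value is stored as a UTF-8 string.
def ageEqualsScan(age: String): Scan = {
  val filter = new SingleColumnValueFilter(
    Bytes.toBytes("info"), Bytes.toBytes("age"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes(age))
  // Without this, rows that have no info:age cell would pass the filter.
  filter.setFilterIfMissing(true)
  val scan = new Scan()
  scan.setFilter(filter)
  scan
}

// Hypothetical helper: same read as in the question, but with the Scan attached.
def readFiltered(sc: SparkContext, tableName: String, age: String)
    : RDD[(ImmutableBytesWritable, Result)] = {
  val conf = HBaseConfiguration.create()
  conf.set("hbase.zookeeper.quorum", "localhost")
  conf.set(TableInputFormat.INPUT_TABLE, tableName)
  // TableInputFormat reads the serialized Scan back from this property.
  conf.set(TableInputFormat.SCAN,
    TableMapReduceUtil.convertScanToString(ageEqualsScan(age)))
  sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
    classOf[ImmutableBytesWritable], classOf[Result])
}
```

Usage would then look like readFiltered(sc, "my_table", "20").count(). For simple row-key ranges, setting TableInputFormat.SCAN_ROW_START and TableInputFormat.SCAN_ROW_STOP on the configuration is a lighter alternative to building a Scan.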