So if there is a table in the database shown as below:
Key2 DateTimeAge
AAA1 XXX XXX XXX
AAA2 XXX XXX XXX
AAA3 XXX XXX XXX
AAA4 XXX XXX XXX
AAA5 XXX XXX XXX
AAA6 XXX XXX XXX
AAA7 XXX XXX XXX
AAA8 XXX XXX XXX
BBB1 XXX XXX XXX
BBB2 XXX XXX XXX
BBB3 XXX XXX XXX
BBB4 XXX XXX XXX
BBB5 XXX XXX XXX
CCC1 XXX XXX XXX
CCC2 XXX XXX XXX
CCC3 XXX XXX XXX
CCC4 XXX XXX XXX
CCC5 XXX XXX XXX
CCC6 XXX XXX XXX
CCC7 XXX XXX XXX
DDD1 XXX XXX XXX
DDD2 XXX XXX XXX
DDD3 XXX XXX XXX
DDD4 XXX XXX XXX
DDD5 XXX XXX XXX
DDD6 XXX XXX XXX
DDD7 XXX XXX XXX
I have a 2nd table, given as
1 AAA
2 DDD
3 CCC
Since AAA,DDD and CCC are in table2, I want to scan all rows corresponding to these 3 values from table1, i.e, I want to get AAA1-AAA8, DDD1 -DDD7 and CCC1-CCC7.
The table1 is a Hadoop database table, and has millions of rows. I only want to scan it for rows which I get from table1.
Could anyone help me with an efficent way of doing this? I have table1 as an RDD, and table2 is in HBase.