I am testing my mapper with MRUnit. I pass a key and a list of values as input to the mapper from the test class:
String key = "1234_abc";
ArrayList<KeyValue> list = new ArrayList<KeyValue>();
KeyValue k1 = new KeyValue(Bytes.toBytes(key), Bytes.toBytes("cf"), Bytes.toBytes("Val1"), Bytes.toBytes("abc.com"));
KeyValue k2 = new KeyValue(Bytes.toBytes(key), Bytes.toBytes("cf"), Bytes.toBytes("Val2"), Bytes.toBytes("165"));
// Add both KeyValues to the list that backs the Result
list.add(k1);
list.add(k2);
Result result = new Result(list);
mapDriver.withInput(key, result);
The problem is that only the first KeyValue is retained in the Result object. The others come back as null.
The problem is that HBase stores columns in lexicographic order. It looks like the Result(KeyValue[] kvs) and Result(List<KeyValue> kvs) constructors expect the KeyValues in that same order.
Here is the solution!
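As a sketch of the ordering requirement, and not the exact code from this answer, the plain-Java example below uses a hypothetical Cell class (a stand-in for KeyValue, not an HBase type) and hypothetical qualifiers "url" and "count". It shows how a binary-search lookup misses an entry when the list is unsorted and finds it once the list is sorted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in (hypothetical, not an HBase class) for a list of cells
// that is queried with binary search.
class SortedCellsSketch {

    // Hypothetical minimal cell: ordered by family, then by qualifier.
    static final class Cell implements Comparable<Cell> {
        final String family;
        final String qualifier;
        final String value;

        Cell(String family, String qualifier, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public int compareTo(Cell other) {
            int byFamily = family.compareTo(other.family);
            return byFamily != 0 ? byFamily : qualifier.compareTo(other.qualifier);
        }
    }

    // Binary search is only correct when the list is sorted by compareTo.
    static String getValue(List<Cell> cells, String family, String qualifier) {
        int index = Collections.binarySearch(cells, new Cell(family, qualifier, null));
        return index >= 0 ? cells.get(index).value : null;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("cf", "url", "abc.com"));
        cells.add(new Cell("cf", "count", "165"));

        // Unsorted: the first cell is found, the second one is missed.
        System.out.println(getValue(cells, "cf", "url"));   // abc.com
        System.out.println(getValue(cells, "cf", "count")); // null

        // Sort into lexicographic order before doing any lookups.
        Collections.sort(cells);

        System.out.println(getValue(cells, "cf", "url"));   // abc.com
        System.out.println(getValue(cells, "cf", "count")); // 165
    }
}
```

With real HBase 0.94 objects, the analogous step should be sorting the list with KeyValue.COMPARATOR, for example Collections.sort(list, KeyValue.COMPARATOR), before calling new Result(list). Treat that as an assumption to verify against your HBase version.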
Hope this will help!
I just finished about 6 hours of pain on this issue myself and finally discovered the problem. It appears to be a bug in the org.apache.hadoop.hbase.client.Result class, at least for the version of HBase I am using (0.94.18).
result.getValue() calls getColumnLatest(), which contains a call to binarySearch(). The binarySearch() method seems to be faulty and almost always returns the wrong index. getColumnLatest() double-checks that it really did find the right KeyValue by making sure the family and qualifier match. They usually do not match, so it returns null.
I ended up re-implementing the getValue() method and the 3 methods it uses, and then swapping over to the functionally correct implementation in my unit test. There may be a better way to achieve this, but it is late and this is what I came up with (and it does fix the problem).
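As a rough sketch only, and not the re-implementation described above, an order-independent lookup could replace the binary search with a linear scan. The Cell class, the bytes() helper, and the field names below are hypothetical stand-ins written in plain Java. None of them are HBase types.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: look up the latest value for a column without
// relying on the cells being sorted.
class LinearLookupSketch {

    // Hypothetical minimal cell, standing in for a KeyValue.
    static final class Cell {
        final byte[] family;
        final byte[] qualifier;
        final long timestamp;
        final byte[] value;

        Cell(byte[] family, byte[] qualifier, long timestamp, byte[] value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Scans every cell and keeps the matching one with the highest timestamp.
    // Returns null when no cell matches the family and qualifier.
    static byte[] getValue(List<Cell> cells, byte[] family, byte[] qualifier) {
        Cell latest = null;
        for (Cell cell : cells) {
            boolean matches = Arrays.equals(cell.family, family)
                    && Arrays.equals(cell.qualifier, qualifier);
            if (matches && (latest == null || cell.timestamp > latest.timestamp)) {
                latest = cell;
            }
        }
        return latest == null ? null : latest.value;
    }

    // Small helper so callers can pass readable strings.
    static byte[] bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }
}
```

The scan costs O(n) per lookup instead of O(log n), which is usually acceptable for the handful of cells in a unit test.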