Apache Phoenix illegal data exception

2019-07-22 03:46发布

问题:

I am having problems writing data from HBase and reading it with Phoenix. These are the steps to reproduce the problem:

Create a table using Phoenix.

 CREATE TABLE test (
    id varchar not null,
    t1.a unsigned_int,
    t1.b varchar
    CONSTRAINT pk PRIMARY KEY (id))
 COLUMN_ENCODED_BYTES = 0;

If I add information to the table using Phoenix using Upsert

upsert into test (id, t1.a, t1.b) values ('a1',1,'foo_a');

And I try query the table, I get this:

select * from test;

+-----+----+--------+
| ID  | A  |   B    |
+-----+----+--------+
| a1  | 1  | foo_a  |
+-----+----+--------+

At this point everything work as expected, but now I am going to add a new entry using HBase directly.

put 'TEST', 'id_1','T1:A', 2
put 'TEST', 'id_1','T1:B','some text'

After that I can't query the table anymore, getting this:

select * from test;
Error: ERROR 201 (22000): Illegal data. Expected length of at least 4 bytes, but had 1 (state=22000,code=201)

I know that the problem is related to how HBase is storing the unsigned_int, and if I remove this column from the table, the queries will work again. How can this problem be solved?

回答1:

The problem seems to be related with how HBase is storing the data, if I make a scan of the table I get this:

ROW                           COLUMN+CELL
 a1                           column=T1:A, timestamp=1551274930165, value=\x00\x00\x00\x01
 a1                           column=T1:B, timestamp=1551274930165, value=foo_a
 a1                           column=T1:_0, timestamp=1551274930165, value=x
 id_1                         column=T1:A, timestamp=1551274993067, value=2
 id_1                         column=T1:B, timestamp=1551275070577, value=some text

That means that the new integer value is being stored as a string, so the right way to store this data should be:

put 'TEST', 'id_1','T1:A', "\x00\x00\x00\x02"

Once this is done the scan will give us this:

ROW                           COLUMN+CELL
 a1                           column=T1:A, timestamp=1551274930165, value=\x00\x00\x00\x01
 a1                           column=T1:B, timestamp=1551274930165, value=foo_a
 a1                           column=T1:_0, timestamp=1551274930165, value=x
 id_1                         column=T1:A, timestamp=1551274993067, value=\x00\x00\x00\x02
 id_1                         column=T1:B, timestamp=1551275070577, value=some text

And data will be accessible from Phoenix without any problem.
Thanks to Boris for the hint.