I am trying out the JSON SerDe from this link: http://code.google.com/p/hive-json-serde/wiki/GettingStarted
CREATE TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' ;
I have added the JSON SerDe jar:
ADD JAR /path-to/hive-json-serde.jar;
Then I loaded the data:
LOAD DATA LOCAL INPATH '/home/hduser/pradi/Test.json' INTO TABLE my_table;
The data loaded successfully.
But when I query the table, I get only one row back.
Test.json contains:
{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}
{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}
{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}
{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}
What is the problem? Why do I get only one row instead of 4 when I query the table, even though the file contains all 4 rows?
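To rule out a malformed input file, one quick check is to parse every line on its own. This assumes the SerDe expects one complete JSON object per line (newline-delimited JSON), which is how the data above is laid out. The Python below is only a sketch; `parse_ndjson` is a hypothetical helper name, not part of Hive or the SerDe:

```python
import json


def parse_ndjson(text):
    """Parse newline-delimited JSON: one complete object per non-blank line.

    Raises ValueError naming the first line that does not parse on its own,
    e.g. when a record is pretty-printed across several lines.
    """
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines such as a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not a standalone JSON object: {exc}") from exc
    return records


# The four lines of test.json as posted above.
SAMPLE = (
    '{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}\n'
    '{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}\n'
    '{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}\n'
    '{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}\n'
)

print(len(parse_ndjson(SAMPLE)))  # 4
```

To check the real file rather than the pasted copy, pass it the file contents, e.g. `parse_ndjson(open("/home/hduser/pradi/test.json", encoding="utf-8").read())`. If that also returns 4 records, the file itself is fine and the single row has to come from the table or SerDe side.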
hive> add jar /home/hduser/pradeep/hive-json-serde-0.2.jar;
Added /home/hduser/pradeep/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradeep/hive-json-serde-0.2.jar
hive> CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
> field3 string, field4 double)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
> WITH SERDEPROPERTIES (
> "field1"="$.field1",
> "field2"="$.field2",
> "field3"="$.field3",
> "field4"="$.field4"
> );
OK
Time taken: 0.088 seconds
hive> LOAD DATA LOCAL INPATH '/home/hduser/pradi/test.json' INTO TABLE my_table;
Copying data from file:/home/hduser/pradi/test.json
Copying file: file:/home/hduser/pradi/test.json
Loading data to table default.my_table
OK
Time taken: 0.426 seconds
hive> select * from my_table;
OK
data1 100 more data1 123.001
Time taken: 0.17 seconds
I have posted the contents of test.json above, so you can see that the query returns only one line:
data1 100 more data1 123.001
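For comparison, this is roughly what the WITH SERDEPROPERTIES mappings in the transcript ask for: each column is bound to a top-level JSONPath such as `$.field1`, so every input line should produce one row. The sketch below is a hypothetical simplification that handles only `$.key` paths; it is not the SerDe's actual code. Run against the four lines of test.json, it prints four tab-separated rows, not one:

```python
import json

# Column-to-path mapping copied from the WITH SERDEPROPERTIES clause above.
COLUMN_PATHS = {
    "field1": "$.field1",
    "field2": "$.field2",
    "field3": "$.field3",
    "field4": "$.field4",
}


def resolve_top_level(record, path):
    """Resolve a JSONPath of the simple form '$.key'; return None (NULL) if absent.

    Hypothetical helper: real JSONPath also supports nesting, arrays and filters.
    """
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    return record.get(path[2:])


def to_rows(lines, column_paths):
    """Turn each non-blank JSON line into one tuple, in column order."""
    rows = []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            rows.append(tuple(resolve_top_level(record, p) for p in column_paths.values()))
    return rows


# The four lines of test.json as posted above.
LINES = [
    '{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}',
    '{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}',
    '{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}',
    '{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}',
]

for row in to_rows(LINES, COLUMN_PATHS):
    print(*row, sep="\t")
```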
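I then changed the JSON file to employees.json, containing
{"firstName": "Mike", "lastName": "Chepesky", "employeeNumber": 1840192}
and changed the table as well, but now the query returns only NULL values: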
hive> add jar /home/hduser/pradi/hive-json-serde-0.2.jar;
Added /home/hduser/pradi/hive-json-serde-0.2.jar to class path
Added resource: /home/hduser/pradi/hive-json-serde-0.2.jar
hive> create EXTERNAL table employees_json (firstName string, lastName string, employeeNumber int )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
OK
Time taken: 0.297 seconds
hive> load data local inpath '/home/hduser/pradi/employees.json' into table employees_json;
Copying data from file:/home/hduser/pradi/employees.json
Copying file: file:/home/hduser/pradi/employees.json
Loading data to table default.employees_json
OK
Time taken: 0.293 seconds
hive>select * from employees_json;
OK
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
Time taken: 0.194 seconds
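One hypothesis worth checking (I have not confirmed how version 0.2 of this SerDe looks up keys): Hive treats column names case-insensitively and reports them in lowercase, so the columns above become firstname, lastname and employeenumber. If the SerDe matches those names against the JSON keys case-sensitively, and assuming the file uses the same camelCase keys as the table definition, none of the keys are found and every column comes back NULL. This Python sketch only illustrates the difference between the two kinds of lookup; `lookup_exact` and `lookup_ignore_case` are hypothetical helpers, not SerDe functions:

```python
import json

# Hive reports column names in lowercase, whatever case the DDL used.
HIVE_COLUMNS = ["firstname", "lastname", "employeenumber"]


def lookup_exact(record, column):
    """Case-sensitive key lookup: 'firstname' does not match 'firstName'."""
    return record.get(column)


def lookup_ignore_case(record, column):
    """Case-insensitive key lookup: compare lowercased JSON keys to the column name."""
    lowered = {key.lower(): value for key, value in record.items()}
    return lowered.get(column.lower())


record = json.loads('{"firstName": "Mike", "lastName": "Chepesky", "employeeNumber": 1840192}')
print([lookup_exact(record, c) for c in HIVE_COLUMNS])        # [None, None, None]
print([lookup_ignore_case(record, c) for c in HIVE_COLUMNS])  # ['Mike', 'Chepesky', 1840192]
```

Lowercasing the keys means two JSON keys that differ only in case would collide (the later one wins), which is acceptable here because Hive cannot distinguish such columns anyway. Separately, because Hive's text input produces one row per physical line, six NULL rows suggest employees.json has six lines; a single record pretty-printed across several lines would be read as several broken rows.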