I am trying to load a huge volume json data with nested structure to hive using a Json serde. some of the field names start with $
in nested structure. I am mapping hive filed names Using SerDeproperties
, but how ever when i query the table, getting null in the field starting with $
, tried with different syntax,but no luck.
Sample JSON:
{
"_id" : "319FFE15FF90",
"SomeThing" :
{
"$SomeField" : 22,
"AnotherField" : 2112,
"YetAnotherField": 1
}
. . . etc . . . .
Using a schema as follows:
create table testSample
(
`_id` string,
something struct
<
$somefield:int,
anotherfield:bigint,
yetanotherfield:int
>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
"mapping.somefield" = "$somefield"
);
This schema builds OK, however, somefield(starting with $
) in the above table is always returning null (all the other values exist and are correct).
We've been trying a lot of syntax combinations, but to no avail.
Does anyone know the trick to hap a nested field with a leading $
in its name?
You almost got it right. Try creating the table like this.
The mistake you're making is that when mapping in the serde properties (mapping.somefield ="$somefield") you're saying "when looking for the hive column named 'somefield', look for the json field '$somefield', but in hive you defined the column with the dollar sign, which if not outright illegal it's for sure not the best practice in hive.
create table testSample
(
`_id` string,
something struct
<
somefield:int,
anotherfield:bigint,
yetanotherfield:int
>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties
(
"mapping.somefield" = "$somefield"
);
I tested it with some test data:
{ "_id" : "123", "something": { "$somefield": 12, "anotherfield":13,"yetanotherfield":100}}
hive> select something.somefield from testSample;
OK
12
I am suddenly starting to see this problem as well but for normal column names as well (no special characters such as $)
I am populating an external table (Temp) from another internal table (Table2) and want the output of Temp table in JSON format. I want column names in camel case in the output JSON file and so am also using the Serdepoperties in the Temp table to specify correct names. However, I am seeing that when I do Select * from the Temp table, it gives NULL values for the columns whose names have been used in the mapping.
I am running Hive 0.13. Here are the commands:
Create table command:
CREATE EXTERNAL TABLE Temp (
data STRUCT<
customerId:BIGINT, region:STRING, marketplaceId:INT, asin:ARRAY<STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'mapping.customerid' = 'customerId',
'mapping.marketplaceid' = 'marketplaceId'
)
LOCATION '/output';
INSERT INTO TABLE Temp
SELECT
named_struct ('customerId',customerId, 'region', region, 'marketplaceId', marketplaceId, 'asin', asin)
FROM Table2;
Select * from Temp:
{"customerid":null,"region":"EU","marketplaceid":null,"asin":["B000FC1PZC"]}
{"customerid":null,"region":"EU","marketplaceid":null,"asin":["B000FC1C9G"]}
See how "customerid" and "marketplaceid" are null. Generated JSON file is:
{"data":{"region":"EU","asin":["B000FC1PZC"]}}
{"data":{"region":"EU","asin":["B000FC1C9G"]}}
Now, if I remove the with serdeproperties, the table starts getting all values:
{"customerid":1,"region":"EU","marketplaceid":4,"asin":["B000FC1PZC"]}
{"customerid":2,"region":"EU","marketplaceid":4,"asin":["B000FC1C9G"]}
And then the JSON file so generated is:
{"data":{"region":"EU","marketplaceid":4,"asin":["B000FC1PZC"],"customerid":1}}
{"data":{"region":"EU","marketplaceid":4,"asin":["B000FC1C9G"],"customerid":2}}