AWS Athena json_extract query from string field re

2019-07-12 19:47发布

问题:

I have a table in athena with this structure

CREATE EXTERNAL TABLE `json_test`(
  `col0` string , 
  `col1` string , 
  `col2` string , 
  `col3` string , 
  `col4` string , 
  )
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
WITH SERDEPROPERTIES ( 
  'quoteChar'='\"', 
  'separatorChar'='\;') 

A Json String like this is stored in "col4":

{'email': 'test_email@test_email.com', 'name': 'Andrew', 'surname': 'Test Test'}

I´m trying to make a json_extract query:

SELECT json_extract(col4 , '$.email') as email FROM "default"."json_test"

But the query returns empty values.

Any help would be appreciated.

回答1:

The JSON needs to use double quotes (") for enclosing values.

Compare:

presto> SELECT json_extract('{"email": "test_email@test_email.com", "name": "Andrew"}' , '$.email');
            _col0
-----------------------------
 "test_email@test_email.com"

and

presto> SELECT json_extract('{''email'': ''test_email@test_email.com'', ''name'': ''Andrew''}', '$.email');
 _col0
-------
 NULL

(Note: '' inside SQL varchar literal mean single ' in the constructed value, so the literal here is the same format that in the question.)

If your string value is a "JSON with single quotes", you can try to fix it with replace(string, search, replace) → varchar



回答2:

The problem was the single quote char of the json string stored

{'email': 'test_email@test_email.com', 'name': 'Andrew', 'surname': 'Test Test'}

Changing to double quote

{"email": "test_email@test_email.com", "name": "Andrew", "surname": "Test Test"}

Athena Query works properly:

SELECT json_extract(col4 , '$.email') as email FROM "default"."json_test"