Empty String is not treated as null in Hive

2019-02-11 04:17发布

问题:

My understanding of the following statement is that if blank or empty string is inserted into hive column, it will be treated as null.

TBLPROPERTIES('serialization.null.format'=''

To test the functionality i have created a table and insertted '' to the filed 3. When i query for nulls on the field3, there are no rows with that criteria.

Is my understanding of making blank string to null correct??

CREATE TABLE CDR
(
field1                 string,
field2                 string,
field3                 string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
**TBLPROPERTIES('serialization.null.format'='');**

insert overwrite table emmtest.cdr select **field1,field2,''** from emmtest.cdr_non_orc;

select * from emmtest.cdr where **field3 is null;**

The last statement has not returned any rows. But i am expecting all rows to be returned since there is blank string in field3.

回答1:

TBLPROPERTIES('serialization.null.format'='') means the following:

  • An empty field in the data files will be treated as NULL when you query the table
  • When inserting rows to the table, NULL values will be written to the data files as empty fields

You are doing something else -
You are inserting an empty string to a table from a query.
It is treated "as is" - an empty string.

Demo

bash

hdfs dfs -mkdir /user/hive/warehouse/mytable
echo Hello,,World | hdfs dfs -put - /user/hive/warehouse/mytable/data.txt

hive

create table mytable (s1 string,s2 string,s3 string) 
row format delimited 
fields terminated by ','
;

hive> select * from mytable;
OK
s1  s2  s3
Hello       World

hive> alter table mytable set tblproperties ('serialization.null.format'='');
OK

hive> select * from mytable;
OK
s1  s2  s3
Hello   NULL    World