Simple Hive query is empty

Published 2019-08-31 01:49

Question:

I have a CSV log file. I loaded it into Hive with this statement:

CREATE EXTERNAL TABLE iprange(id STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\,' STORED AS TEXTFILE LOCATION '/user/hadoop/expandediprange/';

Then I want to perform a simple query like:

select * from iprange where ip="0.0.0.2";

But I get an empty result.

I'm running Hive on HDFS (should I use HBase instead?). My guess was that it has something to do with the table size: the log file is 160 MB, and the resulting Hive table has 8 million rows. If I create a smaller file myself and load it into Hive, the query works.

Any idea of what is wrong?

Edit: I forgot to say that it's running on Amazon Elastic MapReduce using a small instance.

Answer 1:

I found the problem. It was not really a Hive issue. I'm using the output of a Hadoop job as input, and in that job I was writing both fields into the key, leaving the value as an empty string:

context.write(new Text(id + "," + ip), new Text(""));

The problem is that Hadoop inserts a tab character by default between the key and the value, and since the field is a string, it kept the tab as well, so every line had a trailing tab. I discovered this using Pig, which wraps its output in parentheses ().
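A similar check could have been done from Hive itself: comparing the stored field length against the expected value length exposes the invisible trailing character. This is a sketch against the `iprange` table from the question:

```sql
-- Sketch: reveal hidden trailing characters in the ip column.
-- length() counts every character, so "0.0.0.2" plus a trailing tab
-- reports 8 instead of the expected 7.
SELECT ip, length(ip)
FROM iprange
LIMIT 10;
```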

The solution for me was to set the separator to another character. Since I have only two fields, I write one in the key and the other in the value, and set the separator to ",":

conf.set("mapred.textoutputformat.separator", ",");

Maybe it's also possible to trim these characters in Hive.
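For example (a sketch, not tested against this table), a trailing tab could be stripped at query time with `regexp_replace`; note that Hive's `trim()` only removes spaces, not tabs:

```sql
-- Sketch: strip a trailing tab before comparing.
-- regexp_replace is a built-in Hive UDF; '\t$' anchors to the end of the value.
SELECT *
FROM iprange
WHERE regexp_replace(ip, '\t$', '') = "0.0.0.2";
```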



Tags: hadoop hive