I have a CSV log file. After loading it into Hive with this statement:
CREATE EXTERNAL TABLE iprange(id STRING, ip STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/user/hadoop/expandediprange/';
I want to perform a simple query like:
select * from iprange where ip="0.0.0.2";
But I get an empty result.
I'm running Hive over HDFS; should I use HBase instead? My guess was that it had something to do with the table size: the log file is 160 MB, and the resulting Hive table has 8 million rows. If I create a smaller file myself and load it into Hive, the query works.
Any idea what is wrong?
Edit: I forgot to say that it's running on Amazon Elastic MapReduce using a small instance.
I found the problem. It was not really a Hive issue. I'm using the output of a Hadoop job as input, and in that job I was writing the whole record into the key, leaving the value as an empty string:
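Roughly like this (a minimal sketch; the class name is made up, but the pattern is the one I used):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ExpandReducer extends Reducer<Text, Text, Text, Text> {
    private static final Text EMPTY = new Text("");

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // The key already holds "id,ip". Because the value is an empty
        // Text (not NullWritable), TextOutputFormat writes the key, then
        // its separator (a tab by default), then nothing -- which is
        // where the trailing tab comes from.
        context.write(key, EMPTY);
    }
}

An alternative would have been to emit NullWritable as the value, since TextOutputFormat skips the separator entirely when the value is NullWritable.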
The problem is that Hadoop inserts a tab character between the key and the value by default, and since the field is a STRING, Hive kept the tab as part of the value, so every line had a trailing tab. I discovered it using Pig, which wraps each record in parentheses (), making the trailing tab visible.
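To see it from the Hive side, a query that includes the tab explicitly should match (a sketch; Hive interprets \t inside string literals):

select * from iprange where ip = '0.0.0.2\t';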
The solution for me was to set the separator to another character. Since I have only two fields, I write one in the key and the other in the value, and set the separator to ",":
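In the driver this is roughly the change (a sketch; "mapreduce.output.textoutputformat.separator" is the new-API property name, "mapred.textoutputformat.separator" the old-API one):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
// Replace TextOutputFormat's default tab separator with a comma,
// so the key/value pair comes out as a plain CSV line.
conf.set("mapreduce.output.textoutputformat.separator", ",");
Job job = Job.getInstance(conf, "expandediprange");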
Maybe it's also possible to trim these things in Hive.
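If you'd rather clean it up on the Hive side, something like this should work (a sketch; note that Hive's trim() only strips spaces, so a regexp is needed for tabs):

select * from iprange where regexp_replace(ip, '\\t$', '') = '0.0.0.2';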