I have created a table in HIVE
CREATE TABLE IF NOT EXISTS daily_firstseen_analysis (
firstSeen STRING,
category STRING,
circle STRING,
specId STRING,
language STRING,
osType STRING,
count INT)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS orc;
count(*) is not giving me correct result for this table
hive> select count(*) from daily_firstseen_analysis;
OK
75
Time taken: 0.922 seconds, Fetched: 1 row(s)
While the number of rows in this table is 959 rows
hive> select * from daily_firstseen_analysis;
....
Time taken: 0.966 seconds, Fetched: 959 row(s)
it gives data with 959 rows
hive> ANALYZE TABLE daily_firstseen_analysis PARTITION(day) COMPUTE STATISTICS noscan;
Partition logdata.daily_firstseen_analysis{day=20140521} stats: [numFiles=6, numRows=70, totalSize=4433, rawDataSize=37202]
Partition logdata.daily_firstseen_analysis{day=20140525} stats: [numFiles=6, numRows=257, totalSize=4937, rawDataSize=136385]
Partition logdata.daily_firstseen_analysis{day=20140523} stats: [numFiles=6, numRows=211, totalSize=5059, rawDataSize=112140]
Partition logdata.daily_firstseen_analysis{day=20140524} stats: [numFiles=6, numRows=280, totalSize=5257, rawDataSize=148808]
Partition logdata.daily_firstseen_analysis{day=20140522} stats: [numFiles=6, numRows=141, totalSize=4848, rawDataSize=74938]
OK
Time taken: 5.098 seconds
I am using hive with version Hive 0.13.0.2.1.2.0-402
NOTE: I found this issue in count(*) if We are inserting into a table in more than one time. Tables created with single insert do not have this issue
I execute
ANALYZE TABLE ...
at first is OK, but raise error when i try again.so i try:this is explain
I had the same problem, and using ANALYZE fixed it. Running these commands in order should give you the correct count:
i.e. you have to use the analyze command before the count. You have half the answer within your question.
if you have an external table, remove all the files in HDFS, and insert into the table again then select count(*) will be incorrect.