Getting unordered output while using Hive 'Ord

2019-08-11 14:35发布

问题:

I have tried using the 'Order by' as follows:

select fieldA,fieldB,fieldC,fieldD from testfilter where fieldA ='000009000002'order by fieldA,fieldB,fieldC,fieldD;

But, this results in a unordered output like this:

000009000002    7212    023     653
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     24
000009000002    9999    058     25
000009000002    9999    058     299
000009000002    9999    058     399
000009000002    9999    058     99
000009000002    9999    058     99
000009000002    9999    058     99
000009000002    9999    058     99

What could be the reason?

Note: I'm using Hive 0.10.

回答1:

If I am not wrong, seems like you kept the data types of all the columns as String.

So ordering of data is happening but as it happens for strings i.e lexicographically.

000009000002    7212    023     653

//one group

000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     159
000009000002    9999    058     24
000009000002    9999    058     25
000009000002    9999    058     299
000009000002    9999    058     399
000009000002    9999    058     99
000009000002    9999    058     99
000009000002    9999    058     99
000009000002    9999    058     99

//second group

Here the first three columns are same value, but the last column is organized lexicographically.

As 1 is smaller than 2, so 159 comes first and 24 comes next and so on..

So you can modify the data types to int

Hope it helps...!!



标签: hadoop hive