I am Sqooping a table from an Oracle database to AWS S3 and then creating a Hive table over it.
After importing the data, is the order of the records in the database preserved in the Hive table?
I want to fetch a few hundred rows from the database as well as from Hive using Java JDBC, then compare each row in the two ResultSets. Assuming I don't have a primary key, can I compare the rows from both ResultSets as they appear (sequentially, using resultSet.next()), or does the order get changed by the parallel import?
If the order isn't preserved, is ORDER BY a good option?
Order is not preserved during import. The order is also not deterministic when selecting without ORDER BY or DISTRIBUTE BY + SORT BY, because the select is processed in parallel. You need to specify ORDER BY when selecting the data; it does not matter how the data was inserted.
ORDER BY orders all of the data and runs on a single reducer. DISTRIBUTE BY + SORT BY orders the data per reducer and works in distributed mode.
Also see this answer https://stackoverflow.com/a/40264715/2700344
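To make the ORDER BY advice concrete, here is a minimal Java sketch of one way to compare the two result sets when there is no primary key: sort both sides on every selected column, then walk the rows in lockstep. The class name OrderedCompare and its helper methods are hypothetical names invented for this illustration, and reading every value with getString is a simplifying assumption. Oracle and Hive may still sort differently (for example in NULL placement or string collation) and their drivers may format numbers and dates differently, so treat this as a starting point rather than a definitive comparison.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; none of these names come from Sqoop, Hive, or JDBC.
final class OrderedCompare {

    // Builds "SELECT c1, c2 FROM t ORDER BY c1, c2". Ordering by every
    // selected column gives a repeatable row order without a primary key.
    public static String orderedQuery(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        return "SELECT " + cols + " FROM " + table + " ORDER BY " + cols;
    }

    // Reads all remaining rows into memory as strings. This is fine for a
    // few hundred rows; NULL column values are stored as Java null.
    public static List<List<String>> fetchRows(ResultSet rs) throws SQLException {
        int columnCount = rs.getMetaData().getColumnCount();
        List<List<String>> rows = new ArrayList<>();
        while (rs.next()) {
            List<String> row = new ArrayList<>(columnCount);
            for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                row.add(rs.getString(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Returns the index of the first row that differs, or -1 if both lists
    // hold the same rows in the same order.
    public static int firstMismatch(List<List<String>> a, List<List<String>> b) {
        int common = Math.min(a.size(), b.size());
        for (int i = 0; i < common; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        // Same prefix: the lists differ only if one has extra rows.
        return a.size() == b.size() ? -1 : common;
    }
}
```

In use, the string from orderedQuery would be run through Statement.executeQuery on both the Oracle and the Hive connection, each ResultSet passed to fetchRows, and the two lists handed to firstMismatch. Limiting the fetch to a few hundred rows needs engine-specific syntax and is left out of the sketch.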