Does sqoop preserve order of imported rows as in

Posted 2019-08-06 13:45

Question:

I am sqooping a table from an Oracle database to AWS S3 and then creating a Hive table over it.

After importing the data, is the order of records in the database preserved in the Hive table?

I want to fetch a few hundred rows from the database as well as from Hive using Java JDBC, then compare each row present in the ResultSets. Assuming I don't have a primary key, can I compare the rows from both ResultSets as they appear (sequentially, using resultSet.next()), or does the order get changed due to the parallel import?

If order isn't preserved, is ORDER BY a good option?

Answer 1:

Order is not preserved during import. Order is also not deterministic when selecting without ORDER BY or DISTRIBUTE BY + SORT BY, because the select is processed in parallel.

You need to specify ORDER BY when selecting the data; how it was inserted does not matter.
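For the comparison described in the question, one possible approach is sketched below. It assumes there is no primary key, so it orders on every column, and it does the sorting on the client so that differences between the two engines' sort rules (for example NULL placement or string collation) do not affect the result. The class and method names (RowCompare, readRows, sameRows) are made up for illustration, and comparing values through getString is itself an assumption: the two JDBC drivers may format numbers or dates differently, in which case the values would need normalising first.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Sqoop, Hive, or any JDBC driver.
class RowCompare {

    // Copies every row of a ResultSet into memory as a list of strings.
    // SQL NULL is kept as a Java null. Assumes the row count is small.
    static List<List<String>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        List<List<String>> rows = new ArrayList<>();
        while (rs.next()) {
            List<String> row = new ArrayList<>(columns);
            for (int i = 1; i <= columns; i++) {
                row.add(rs.getString(i)); // JDBC columns are 1-based
            }
            rows.add(row);
        }
        return rows;
    }

    // Compares single values, placing nulls before non-null values.
    static final Comparator<String> VALUE_ORDER =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Compares two rows column by column, left to right.
    static final Comparator<List<String>> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = VALUE_ORDER.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // True when both sides hold the same rows (duplicates included),
    // regardless of the order in which each side returned them.
    static boolean sameRows(List<List<String>> left, List<List<String>> right) {
        List<List<String>> l = new ArrayList<>(left);
        List<List<String>> r = new ArrayList<>(right);
        l.sort(ROW_ORDER);
        r.sort(ROW_ORDER);
        return l.equals(r);
    }
}
```

With two open result sets this would be called as RowCompare.sameRows(RowCompare.readRows(oracleRs), RowCompare.readRows(hiveRs)). Both queries still have to select the same set of rows, for instance through an identical WHERE filter, since a row limit without a deterministic order can return different rows on each side.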

ORDER BY orders all of the data and runs on a single reducer. DISTRIBUTE BY + SORT BY orders the data within each reducer and works in distributed mode.
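The difference between the two can be illustrated with a small in-memory simulation. This is only a simplified model, not Hive code: the reducer assignment below (hash code modulo the reducer count) is an assumption standing in for however Hive actually distributes rows, and SortScopeDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of the two sorting scopes; no Hive involved.
class SortScopeDemo {

    // Stand-in for ORDER BY: one reducer sees all keys, one total order.
    static List<Integer> orderBy(List<Integer> keys) {
        List<Integer> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    // Stand-in for DISTRIBUTE BY key + SORT BY key: each key is routed to
    // a reducer (assumed here: hash modulo reducer count), then every
    // reducer sorts only the keys it received.
    static List<List<Integer>> distributeAndSort(List<Integer> keys, int reducers) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < reducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Integer key : keys) {
            buckets.get(Math.floorMod(key.hashCode(), reducers)).add(key);
        }
        for (List<Integer> bucket : buckets) {
            Collections.sort(bucket);
        }
        return buckets;
    }

    // Reads the reducers' outputs one after another.
    static List<Integer> concat(List<List<Integer>> buckets) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> bucket : buckets) {
            all.addAll(bucket);
        }
        return all;
    }
}
```

In this model each reducer's output is sorted, but the outputs read back to back are generally not in one global order. That is why a row-by-row comparison against another system needs either ORDER BY or a sort on the client.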

Also see this answer: https://stackoverflow.com/a/40264715/2700344