Does Sqoop preserve the order of imported rows as in

Posted 2019-08-06 13:48

I am using Sqoop to import a table from an Oracle database to AWS S3 and then creating a Hive table over it.

After the import, is the order of the records in the database preserved in the Hive table?

I want to fetch a few hundred rows from the database and from Hive using Java JDBC, then compare the rows in the two ResultSets. Assuming I don't have a primary key, can I compare the rows from both ResultSets in the order they appear (sequentially, using resultSet.next()), or does the order change because of the parallel import?

If the order isn't preserved, is ORDER BY a good option?

Tags: jdbc hive sql-order-by sqoop

1 answer

老娘就宠你

#2 · 2019-08-06 14:09

Order is not preserved during import. In addition, a SELECT without ORDER BY (or DISTRIBUTE BY + SORT BY) returns rows in no guaranteed order, because the select itself is processed in parallel.

You need to specify ORDER BY when selecting the data; how the rows were inserted does not matter.

ORDER BY sorts the entire result set and runs on a single reducer. DISTRIBUTE BY + SORT BY sorts the rows within each reducer and runs in distributed mode, so it does not give one global order across reducers.

Also see this answer https://stackoverflow.com/a/40264715/2700344
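
Since there is no primary key and neither side returns rows in a guaranteed order, one option is to sort both result sets the same way on the client before comparing them. The sketch below is only an illustration under stated assumptions: the class `RowOrderCheck` and its methods are hypothetical names, every column is read with `ResultSet.getString`, and both queries are assumed to select the same set of rows. Sorting in Java rather than relying on ORDER BY in each engine avoids depending on Oracle and Hive ordering NULLs or strings identically.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Sqoop, Hive or JDBC.
final class RowOrderCheck {

    // Read every row of a ResultSet as a list of strings (SQL NULL stays null).
    static List<List<String>> readRows(ResultSet rs) throws SQLException {
        int columns = rs.getMetaData().getColumnCount();
        List<List<String>> rows = new ArrayList<>();
        while (rs.next()) {
            List<String> row = new ArrayList<>(columns);
            for (int i = 1; i <= columns; i++) {   // JDBC columns are 1-based
                row.add(rs.getString(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Order rows column by column, with nulls sorted before any value.
    static final Comparator<List<String>> ROW_ORDER = (a, b) -> {
        Comparator<String> cell = Comparator.nullsFirst(Comparator.naturalOrder());
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = cell.compare(a.get(i), b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    // True when both sides hold the same rows (duplicates included),
    // regardless of the order in which they were returned.
    static boolean sameRows(List<List<String>> left, List<List<String>> right) {
        List<List<String>> l = new ArrayList<>(left);
        List<List<String>> r = new ArrayList<>(right);
        l.sort(ROW_ORDER);
        r.sort(ROW_ORDER);
        return l.equals(r);
    }
}
```

A call would look like `RowOrderCheck.sameRows(RowOrderCheck.readRows(oracleRs), RowOrderCheck.readRows(hiveRs))`, where `oracleRs` and `hiveRs` are the two ResultSets. The string form of numbers, dates and timestamps can differ between the two JDBC drivers, so the values may need normalizing before the comparison.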
