I am trying to understand how Sqoop export works. I have a table named site in MySQL with two columns, id and url, and two rows:
1,www.yahoo.com
2,www.gmail.com
The table has no primary key.
When I export the entries from HDFS to the MySQL site table by executing the command below, it inserts duplicate entries.
I have the following entries in HDFS:
1,www.one.com
2,www.2.com
3,www.3.com
4,www.4.com
sqoop export --table site --connect jdbc:mysql://localhost/loudacre --username training --password training --export-dir /site/ --update-mode allowinsert --update-key id
So instead of updating the rows whose id already exists, it inserts the id again (meaning there are two rows with id 1: one for www.one.com and one for www.yahoo.com).
Even if I remove --update-key, the outcome is the same. Is this happening because the table doesn't have a primary key?
I am using Sqoop 1.4.5 in the Cloudera QuickStart VM.
Any help?
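As per the Sqoop docs, the --update-key column should either be the primary key or have a unique index on it. Without one, MySQL has no duplicate key to detect, so every exported row is treated as a new insert. Internally, Sqoop will create a query like this: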
INSERT INTO site (id, url) VALUES (1, 'www.one.com') ON DUPLICATE KEY UPDATE url = 'www.one.com'
and so on for all other values.
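To make the export behave as an upsert, the table needs a key on id. The following is a minimal sketch for the MySQL side, assuming the site(id, url) table from the question; how you clean up the duplicates that were already inserted is up to you and is not shown here.

```sql
-- Assumes the table from the question: site(id, url) in the loudacre database.

-- List ids that occur more than once. These rows must be cleaned up first,
-- because ADD PRIMARY KEY fails if the column contains duplicate values.
SELECT id, COUNT(*) AS n
FROM site
GROUP BY id
HAVING COUNT(*) > 1;

-- Once every id is unique, make id the primary key
-- (a UNIQUE index on id would work as well).
ALTER TABLE site ADD PRIMARY KEY (id);

-- With the key in place, a statement of this shape updates the existing
-- row with id 1 instead of inserting a second one.
INSERT INTO site (id, url) VALUES (1, 'www.one.com')
ON DUPLICATE KEY UPDATE url = VALUES(url);
```

After that, re-running the same sqoop export command with --update-mode allowinsert should update ids 1 and 2 and insert ids 3 and 4.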