Sqoop import without primary key in RDBMS

2019-02-13 15:39发布

问题:

Can I import RDBMS table data (table doesn't have a primary key) to hive using sqoop? If yes, then can you please give the sqoop import command.

I have tried with sqoop import general command, but it failed.

回答1:

If your table has no primary key defined then you have to give -m 1 option for importing the data or you have to provide --split-by argument with some column name, otherwise it gives the error:  

ERROR tool.ImportTool: Error during import: No primary key could be found for table <table_name>. Please specify one with --split-by or perform a sequential import with '-m 1'

then your sqoop command will look like

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    -m 1

or

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name, last_name, created_date"
    --split-by created_date


回答2:

You can import data from RDBMS into hive without Primarykey.

First you need to create a table in hive.After that you need to write the following code:

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table <RDBMS-Table-name> \
    --target-dir /user/root/user_data \
    --hive-import \ 
    --hive-table <hive-table-name> \
    --create-hive-table \
    -m 1 (or) --split-by <RDBMS-Column>


回答3:

In the first scenario using 1 Mapper ... If the size of the file is very large this process is going to take more time to respond or might fail. Check the size of the data before using mapper = 1 .



回答4:

Quick view:

The Sqoop job fails and the error looks like this" Error during import: No primary key could be found for the table . Please specify one with --split-by or perform a sequential import with '-m 1' "

Description: Usually, when you perform a Sqoop job internally it searches for the primary key in the table. If there is no primary key the Sqoop job fails and the error looks like this" Error during import: No primary key could be found for the table . Please specify one with --split-by or perform a sequential import with '-m 1' ". The suggestion describes there are two alternative approaches to this scenario.

Best way is option 2

  1. To specify the number of mappers as 1 (default it takes 4). So by specifying the number of mappers to 1, the task will be sequential and identical to a single threaded task. This will succeed only when you are targeting a small table if in case if you are looking for a large import this will fail as the task tends to run forever.

  2. The best approach is to use split-by where you can specify the number of mappers on the bases of indexed columns or splitting column manually( with queries ).



标签: import sqoop