Using sqoop import, how to append rows into an existing Hive table?

Posted 2019-08-07 18:17

From SQL Server, I imported data and created a Hive table using the command below.

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --create-hive-table --fields-terminated-by ','

The command succeeded, imported the data, and created a table with 10000 records.

I then inserted 10 new records in SQL Server and tried to append them to the existing Hive table using the --where clause:

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable

But the sqoop job fails with the following error:

ERROR tool.ImportTool: Error during import: Import job failed!

Where am I going wrong? Are there any alternatives for inserting into the table using sqoop?

EDIT: After slightly changing the above command, I am able to append the new rows.

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --where "ID > 10000" --hive-import -hive-table hivedb.demotable --fields-terminated-by ',' -m 1

Though this resolves the problem mentioned above, I can't insert modified rows. Is there any way to insert modified rows without using the --incremental lastmodified parameter?

4 Answers
劫难
2019-08-07 19:05

Use the --append option and -m 1, so it will be like below:

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --append -m 1
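If the goal is to bring over only the rows added since the last run, another option worth trying is Sqoop's incremental append mode. This is only a hedged sketch, not part of the original answer: the check column ID and the last value 10000 are taken from the question, and whether --incremental append combines cleanly with --hive-import can depend on your Sqoop version, so test it first:

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --table demotable --hive-import --hive-table hivedb.demotable --incremental append --check-column ID --last-value 10000 -m 1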
劫难
2019-08-07 19:07

In order to append rows to the Hive table, use the same query you have been using before; just remove --hive-overwrite.

I will share the two queries that I used to import into Hive, one for overwriting and one for appending; you can use the same approach for your import:

To OVERWRITE the previous records

sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://database_connection_string/DATABASE=database_name,TMODE=ANSI,LOGMECH=LDAP --username z****** --password ******* --query "select * from ****** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-table hive_table_name --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --hive-import --hive-overwrite --driver com.teradata.jdbc.TeraDriver

To APPEND to the previous records

sqoop import -Dmapreduce.job.queuename=default --connect jdbc:teradata://connection_string/DATABASE=db_name,TMODE=ANSI,LOGMECH=LDAP --username ****** --password ****** --query "select * from **** where \$CONDITIONS" --split-by "HASHBUCKET(HASHROW(key to split)) MOD 4" --num-mappers 4 --hive-import --hive-table guestblock.prodrptgstrgtn --boundary-query "select 0, 3 from dbc.dbcinfo" --target-dir directory_name --delete-target-dir --driver com.teradata.jdbc.TeraDriver

Note that I am using 4 mappers; you can use more as well.
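Adapted to the SQL Server connection from the question, the append form (no --hive-overwrite) would look roughly like the sketch below. This is only a sketch, not part of the original answer: the ID split column, the ID > 10000 filter, and the staging directory name are taken from or assumed for the question's setup:

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --query "select * from demotable where ID > 10000 and \$CONDITIONS" --split-by ID --num-mappers 4 --hive-import --hive-table hivedb.demotable --target-dir demotable_staging --delete-target-dir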

戒情不戒烟
2019-08-07 19:14

We can use this command (note that a free-form --query must include the $CONDITIONS token, and with a single mapper you pass -m 1):

sqoop import --connect 'jdbc:sqlserver://10.1.1.12;database=testdb' --username uname --password paswd --query 'select * from demotable where ID > 10000 and $CONDITIONS' --hive-import --hive-table hivedb.demotable --target-dir demotable_data -m 1
We Are One
2019-08-07 19:15

I am not sure you can pass the --append option directly to sqoop together with the --hive-import option. It's still not available, at least in version 1.4.

The default behavior is append when --hive-overwrite and --create-hive-table are missing (at least in this context).

I go with nakulchawla09's answer, though remind yourself to keep the --split-by option. This ensures the split file names in the Hive data store are created appropriately; otherwise you will not like the default naming. You can ignore this comment if you don't care about the behind-the-scenes Hive warehouse naming and data store. Here is what I saw when I tried with the command below (a sketch of the same command with --split-by added follows at the end of this answer).

Before the append

beeline:hive2> select count(*) from geolocation;

+-------+--+
|  _c0  |
+-------+--+
| 8000  |
+-------+--+

File in the Hive warehouse before the append:

-rwxrwxrwx   1 root hdfs     479218 2018-10-12 11:03 /apps/hive/warehouse/geolocation/part-m-00000

Sqoop command for appending the same 8000 records again:

sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --target-dir /rawdata --hive-import  --driver com.mysql.jdbc.Driver --m 1 --delete-target-dir

It created the files below. You can see the file names are not great because I did not give a --split-by option or split hash (which can be a datetime or date).

-rwxrwxrwx   1 root hdfs     479218 2018-10-12 11:03 /apps/hive/warehouse/geolocation/part-m-00000
-rwxrwxrwx   1 root hdfs     479218 2018-10-12 11:10 /apps/hive/warehouse/geolocation/part-m-00000_copy_1

Hive records after the append:

beeline:hive2> select count(*) from geolocation;

+-------+--+
|  _c0  |
+-------+--+
| 16000 |
+-------+--+
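As mentioned above, here is a rough sketch of the same import with --split-by added, so the output lands as multiple part-m-0000N splits instead of a single file plus copies. The split column geolocation_id is a hypothetical placeholder (it should be a reasonably evenly distributed numeric key in your table), not something from the original answer:

sqoop import --connect jdbc:mysql://localhost/RAWDATA --table geolocation --username root --password hadoop --target-dir /rawdata --hive-import --driver com.mysql.jdbc.Driver --split-by geolocation_id -m 4 --delete-target-dir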