I am trying to copy the retail_db database tables into a hive database which I have already created. When I execute the following code:
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage
my MapReduce job stops with the following error:
ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/user/cloudera/categories already exists
I am trying to copy the tables to a hive database, so why does an existing directory under /user/cloudera cause the problem? Is there a way to ignore this error or overwrite the existing directory?
This is how a sqoop import job works: sqoop first creates/imports the data into a tmp dir on HDFS, which is the user's home dir (in your case /user/cloudera). Then it copies the data to its actual hive location, i.e. /user/hive/warehouse. This categories dir must already have existed before you ran the import statement, so delete that dir, or rename it if its contents are important:
hadoop fs -rmr /user/cloudera/categories
OR
hadoop fs -mv /user/cloudera/categories /user/cloudera/categories_1
and re-run the sqoop command!
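After a successful re-run, you can optionally verify that the tables landed in the Hive warehouse. A minimal check, assuming the default warehouse path and the standard <database>.db directory layout (the retail_stage.db name is inferred from the question, not shown in this answer):
hadoop fs -ls /user/hive/warehouse/retail_stage.db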
So in short: importing to Hive uses HDFS as the staging place, and sqoop deletes the staging dir /user/cloudera/categories after (successfully) copying to the actual HDFS location - cleaning up the staging/tmp files is the last stage of the sqoop job - so if you try to list the tmp staging dir, you won't find it. After a successful import:
hadoop fs -ls /user/cloudera/categories
- the dir will not be there.
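To double-check on the Hive side as well, a quick sketch (assuming the hive CLI is available, as it is on the quickstart VM):
hive -e "USE retail_stage; SHOW TABLES;"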
You cannot use hive-import and hive-overwrite at the same time. The version on which I confirmed this issue is:
ref. https://stackoverflow.com/a/22407835/927387
Based on answer #1 above, I found this. I tried it and it works: just add --delete-target-dir.
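For example, the command from the question with that single flag appended (a sketch, not re-tested here; --delete-target-dir tells Sqoop to remove each table's leftover staging dir, such as /user/cloudera/categories, before importing it):
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage \
--delete-target-dir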
Sqoop import to Hive works in 3 steps:
1. Put the data into HDFS.
2. Create the Hive table if it does not exist.
3. Load the data into the Hive table.
You have not mentioned --target-dir or --warehouse-dir, so Sqoop will put the data in the HDFS home directory, which I believe is /user/cloudera/ in your case. Now, you might have imported the MySQL table categories earlier, so the /user/cloudera/categories directory already exists and you are getting this exception. Point the import at a non-existing directory instead; with import-all-tables this is done via --warehouse-dir (the per-table --target-dir flag is not accepted by this tool), e.g. --warehouse-dir /user/cloudera/mysqldata. Sqoop will then put all the MySQL tables imported by the above command under that location.
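A sketch of the reworked command under that assumption (the /user/cloudera/mysqldata path is just an example name):
sqoop import-all-tables \
--num-mappers 1 \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--warehouse-dir /user/cloudera/mysqldata \
--hive-import \
--hive-overwrite \
--create-hive-table \
--outdir java_files \
--hive-database retail_stage
Each table is then staged under /user/cloudera/mysqldata/<table> before being loaded into Hive.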