Using sqoop I can create managed table but not the externale table.
Please let me know what are the best practices to unload data from data warehouse and load them in Hive external table.
1.The tables in warehouse are partitioned. Some are date wise partitioned some are state wise partitioned.
Please put your thoughts or practices used in production environment.
Sqoop does not support creating Hive external tables. Instead you might:
- Use the Sqoop codegen command to generate the SQL for creating the Hive internal table that matches your remote RDBMS table (see http://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_literal_sqoop_codegen_literal)
- Modify the generated SQL to create a Hive external table
- Execute the modified SQL in Hive
- Run your Sqoop import command, loading into the pre-created Hive external table
Step 1: import data from mysql to hive table.
sqoop import
--connect jdbc:mysql://localhost/
--username training --password training
--table --hive-import --hive-table -m 1
--fields-terminated-by ','
Step 2: In hive change the table type from Managed to External.
Alter table <Table-name> SET TBLPROPERTIES('EXTERNAL'='TRUE')
Note:you can import directly into hive table or else to back end of hive.
My best suggestion is to SQOOP your data to HDFS and create EXTERNAL for Raw operations and Transformations.
Finally mashed up data to the internal table. I believe this is one of the best practices to get things done in a proper way.
Hope this helps!!!
Refer to these links:
https://mapr.com/blog/what-kind-hive-table-best-your-data/
In the above if you want to skip directly to the point -->2.2.1 External or Internal
https://hadoopsters.net/2016/07/15/hive-tables-internal-and-external-explained/
After referring to the 1st link then second will clarify most of your questions.
Cheers!!