I have table test in mysql as below:
id name address
1 Km sky
2 hd heaven
3 Ab null
4 en null
Now I done a sqoop import as below
sqoop import--connect jdbc:mysql://XXXXXX/testing --username XXXX --password XXXX --query "select * from testing.test where \$CONDITIONS" --null-string '' --null-non-string '' -m 1\
--hive-import --hive-database testing --hive-table test --create-hive-table --target-dir /user/hive/warehouse/testing.db/test
I got the desired result.
Then we added a new colum to the mysql table with extra 2 rows
id name address nation
1 Km sky null
2 hd heaven null
3 Ab null null
4 en null null
5 abc efd USA
6 fge cde UK
Now I want the existing hive table with the above columns and rows updated. I have done the following sqoop job
Sqoop job:
sqoop job --create sqoop_test -- import --connect jdbc:mysql:xxxxxxx/testing --username XXXXX --password XXXX --query "SELECT * from testing.test WHERE \$CONDITIONS" --incremental append\
--check-column id --last-value "3" --split-by 'id' --target-dir /user/hive/warehouse/testing.db/test
But when I query the hive table I get the result as null for the new rows and the new columns doesn't show up. Like below
id name address
NULL NULL NULL
NULL NULL NULL
1 Km sky
2 hd heaven
3 Ab
4 en
How can we have the new columns appended and new rows added to the existing table in hive?
Or Is the method I am using is completely wrong. Please let me know
your assumption is wrong, the reason for this is because you are importing data with a different layout. The first table that you created, has 3 columns and in the second import, you are importing 4 columns, for that reason, Hive can't parse those new records and simply print null for all the columns. If you don't have a good reason to import the data in textfile format, I suggest you create the table in avro and user the schema evolution feature to add new columns.
When you import data in avro, Sqoop autogenerate the schems for you. so the only thing that you need is create a table pointing to the imported data and use the generated schema. In the case of future imports with new fields, you will need add those fields with a valid default value or make them nullables with default value as follow (for example for string column)
or even specify other valid default values