I have a column (not the last column) in Excel file that contains data which is spanning over few lines.
Some cells of column is blank and some have single lines entries.
When saving as .CSV file or a tab separated .txt from excel, all the multi-line data and few single line entries are getting generated in double quotes, None of the blank fields are in quotes. Some of the single line entries are not within quotes.
Is it possible to store the data with this same structure in a hive table? If Yes, how can this be done? I understand I need to escape all the LF within double-quotes and take care of the last LF only as the actual EOL. But the moment a '\n' is encountered, Hive takes data to a new row.
The format of data in excel is like as below:
|------+------+--------+------------------+-------+------|
|row1: | col1 | col2 | col3(multi-line) | col4 | col5 |
|------+------+--------+------------------+-------+------|
| | | | line 1 of 3 | | |
|row2: | abc | defsa | line 2 of 3 | bcde | hft |
| | | | line 3 of 3 | | |
|------+------+--------+------------------+-------+------|
|row3: | abc2 | defsa2 | (blank) | bcde2 | hft2 |
|------+------+--------+------------------+-------+------|
|row4: | abc3 | defsa3 | single-line1 | bcde3 | hft3 |
|------+------+--------+------------------+-------+------|
|row5: | abc4 | defsa4 | single-line2 | bcde4 | hft4 |
|------+------+--------+------------------+-------+------|
When saved as CVS it outputs to the following:
row1--col1,col2,col3(multi-line),col4,col5
row2--abc,defsa,line 1 of 3",,,,,,
row3--line 2 of 3,,,,,,
row4--line 3 of 3,,,,,,
row5--",bcde,hft
row6--abc2,defsa2,,bcde2,hft2
row7--abc3,defsa3,single-line1,bcde3,hft3
row8--abc4,defsa4,single-line2",,,,,,
row9--",bcde4,hft4
5 rows of excel to 9 rows of csv.
Appreciate inputs to store from this .csv file into a hive table, if possible without changing the structure and maintaining the multi-line column.