I have an excel file which gets updated on a daily basis i.e the data is always different every time.
I am pulling the data from the Excel sheet into the table using Talend. I have a primary key Company_ID
defined in the table.
The error I am facing is that the Excel sheet has few duplicate Company_ID
values. It will also pick up more duplicate values in the future as the Excel file will be updated daily.
I want to choose the first record where the Company ID
field is 1
and the record doesn't have null in the rest of the columns. Also, for a Company_ID
of 3
there is a null value for one column which is ok since it is a unique record for that company_id
.
How do I choose a unique row which has maximum no. of column values present ie for eg in the case of Company ID
of 1
in Talend ?
tUniqRow is usually the easiest way to handle duplicates. If you are worried that the first row coming to tUniqRow may not be the first row that you want there, you can sort your rows, so they enter tUniqRow in your preferred order:
(used components: tFileInputExcel, tJavaRow, tSortRow, tUniqRow, tFilterColumns)
In your particular case, the tJava could look like this: