I have a CSV file and a database table as inputs. I need to keep all the records that are in the database table but exclude those that are in the CSV file. It all gets written to a CSV file in the end.
All my attempts have resulted in hundreds of duplicates so I'm rather challenged here.
This is a quick-and-dirty trick, but it should fit your needs.
Connect your database and your CVS input to a TMap- Use your DB as main connection and CSV as Lookup. Something like:
tMySqlInput---->Main--->tMap---out---->tFileDelimitedOutput
^
|
|
Lookup
|
|
tFileDelimitedInput
Inside the tMap, map the join columns as usual. Click on the wrench just above the lookup mapping area and set the join type as Inner join. Be sure that "Match Model" is not set to All, to avoid cross joins. Then, create the "out" output connection. Finally, click the wrench of this connection and set "Catch lookup inner join reject" to true
On "out" (so, the output file) you will have only the unmatched rows from the DB. Depending on your data, you may need to add a tAggregateRow just before tFileDelimitedOutput to eventually clean duplicates.