I have checked the whole site and googled on the net but was unable to find a simple solution to this problem.
I have a DataTable with about 20 columns and 10K rows. I need to remove the duplicate rows from this DataTable based on 4 key columns. Doesn't .NET have a function that does this? The closest thing I found was datatable.DefaultView.ToTable(true, array of columns to display), but this function does a distinct on all the columns.
It would be great if someone could help me with this.
EDIT: I am sorry for not being clear on this. This datatable is being created by reading a CSV file and not from a DB. So using an SQL query is not an option.
This is very simple code that does not require LINQ or individual column names to do the filtering. If all the column values in a row are null, the row is deleted.
This can even be used to remove null data from an Excel sheet.
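A minimal sketch of that idea, assuming "null" means `DBNull.Value` in every cell of the row. The class and method names (`EmptyRowCleaner`, `RemoveEmptyRows`) are made up for illustration and are not from the original answer:

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical helper: the names are illustrative only.
public static class EmptyRowCleaner
{
    // Removes every row in which all cells are DBNull and returns how many were removed.
    public static int RemoveEmptyRows(DataTable table)
    {
        int removed = 0;

        // Walk backwards so removing a row does not shift the rows still to be visited.
        for (int i = table.Rows.Count - 1; i >= 0; i--)
        {
            DataRow row = table.Rows[i];
            bool allNull = row.ItemArray.All(value => value == null || value is DBNull);

            if (allNull)
            {
                // RemoveAt takes the row out of the collection immediately.
                table.Rows.RemoveAt(i);
                removed++;
            }
        }

        return removed;
    }
}
```

Calling `EmptyRowCleaner.RemoveEmptyRows(table)` modifies the table in place. Rows that contain at least one non-null value are left alone.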
Liggett78's answer is much better - esp. as mine had an error! Correction as follows...
"This datatable is being created by reading a CSV file and not from a DB."
So put a unique constraint on the four columns in the database, and inserts that are duplicates under your design won't go in. The import might fail instead of continuing when this happens, but that is surely configurable in your CSV import script.
You can use LINQ to DataSet. Something like this:
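One way this could look is `Distinct` with a custom `IEqualityComparer<DataRow>` that compares only the key columns. This is a sketch, not the code originally posted: `KeyColumnComparer` and `DistinctByKeys` are hypothetical names, and the key column names are whatever your table uses.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical comparer: two rows are equal when all the key columns match.
public sealed class KeyColumnComparer : IEqualityComparer<DataRow>
{
    private readonly string[] _keys;

    public KeyColumnComparer(params string[] keys)
    {
        _keys = keys;
    }

    public bool Equals(DataRow x, DataRow y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return _keys.All(k => object.Equals(x[k], y[k]));
    }

    public int GetHashCode(DataRow row)
    {
        unchecked
        {
            int hash = 17;
            // Empty cells come back as DBNull.Value, so GetHashCode is safe to call.
            foreach (string k in _keys)
                hash = hash * 31 + row[k].GetHashCode();
            return hash;
        }
    }
}

public static class DistinctByKeys
{
    // Returns a new table holding one row per distinct key combination.
    public static DataTable Deduplicate(DataTable table, params string[] keys)
    {
        List<DataRow> unique = table.AsEnumerable()
            .Distinct(new KeyColumnComparer(keys))
            .ToList();

        // CopyToDataTable throws on an empty sequence, so fall back to an empty clone.
        return unique.Count == 0 ? table.Clone() : unique.CopyToDataTable();
    }
}
```

For example, `DistinctByKeys.Deduplicate(table, "Col1", "Col2", "Col3", "Col4")` keeps all 20 columns but compares only the four named ones. On older .NET Framework projects, `AsEnumerable` and `CopyToDataTable` need a reference to System.Data.DataSetExtensions.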
If you have access to LINQ, I think you should be able to use the built-in grouping functionality on the in-memory collection and pick out the duplicate rows.
Search Google for "LINQ group by" for examples.
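A rough sketch of the grouping approach, assuming exactly four key columns whose names are passed in. `GroupByKeys` and `RemoveDuplicates` are hypothetical names. Every row after the first in each group is treated as a duplicate and removed in place:

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical helper: the names are illustrative only.
public static class GroupByKeys
{
    // Removes all but the first row of each key group and returns how many rows were removed.
    public static int RemoveDuplicates(DataTable table, string k1, string k2, string k3, string k4)
    {
        // Group on a tuple of the four key values. Materialize the result with ToList
        // before touching the table, since the query enumerates table.Rows lazily.
        List<DataRow> duplicates = table.AsEnumerable()
            .GroupBy(r => (r[k1], r[k2], r[k3], r[k4]))
            .SelectMany(g => g.Skip(1))
            .ToList();

        foreach (DataRow row in duplicates)
        {
            // Rows.Remove takes the row out of the collection immediately.
            table.Rows.Remove(row);
        }

        return duplicates.Count;
    }
}
```

`GroupBy` yields the rows of each group in their original order, so the row that survives is the first occurrence of each key combination.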
Note that Table.AcceptChanges() must be called to complete the deletion. Otherwise the deleted row is still present in the DataTable with its RowState set to Deleted, and Table.Rows.Count does not change after the deletion.
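A small sketch that makes this visible. `DeleteDemo` is a hypothetical name. The first `AcceptChanges()` call is there so the rows are in the Unchanged state before the delete, since freshly added rows may be handled differently:

```csharp
using System;
using System.Data;

// Hypothetical demo: the names are illustrative only.
public static class DeleteDemo
{
    // Deletes the first row and reports Rows.Count before and after AcceptChanges.
    public static (int BeforeAccept, int AfterAccept) DeleteFirstRow(DataTable table)
    {
        // Commit pending additions so every row is Unchanged.
        table.AcceptChanges();

        // Delete() on an Unchanged row only marks it: RowState becomes Deleted.
        table.Rows[0].Delete();
        int beforeAccept = table.Rows.Count;

        // AcceptChanges() is what actually drops the rows marked as Deleted.
        table.AcceptChanges();
        int afterAccept = table.Rows.Count;

        return (beforeAccept, afterAccept);
    }
}
```

If the pending-delete state is not needed, `table.Rows.Remove(row)` takes the row out of the collection directly.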