How to remove duplicate records from excel sheet u

2019-08-21 10:42发布

问题:

I have an excel sheet with 4 column(JobCode,JobName,StartDate,EndDate). On the basis of one rule I have to validate the 1st excel sheet and insert all the record in 2nd excel sheet except the duplicate record which is present in 1st excel sheet. I tried to using list. But it's working as expected.

List<string> JobCodeList = new List<string>();
for (int iRowCount = 0; iRowCount < hrms_jobdata.Tables[0].Rows.Count; iRowCount++)
{
    JobCode = hrms_jobdata.Tables[0].Rows[iRowCount]["Job Code"].ToString();
    JobName = hrms_jobdata.Tables[0].Rows[iRowCount]["Job Name"].ToString();
    StartDate = hrms_jobdata.Tables[0].Rows[iRowCount]["Start Date"].ToString();
    EndDate = hrms_jobdata.Tables[0].Rows[iRowCount]["End Date"].ToString();
    JobCodeList.Add(JobCode + JobName);
}

connectionhrms_job.Close();


for (int iRowCount = 0; iRowCount < hrms_jobdata.Tables[0].Rows.Count; iRowCount++)
{
    JobCode = hrms_jobdata.Tables[0].Rows[iRowCount]["Job Code"].ToString();
    JobName = hrms_jobdata.Tables[0].Rows[iRowCount]["Job Name"].ToString();
    StartDate = hrms_jobdata.Tables[0].Rows[iRowCount]["Start Date"].ToString();
    EndDate = hrms_jobdata.Tables[0].Rows[iRowCount]["End Date"].ToString();

    DateTime convertedstart = DateTime.Parse(StartDate);
    StartDateFormated = convertedstart.ToString("dd-MM-yyyy");

    DateTime convertedend = DateTime.Parse(EndDate);
    EndDateFormated = convertedend.ToString("dd-MM-yyyy");

    List<string> dupvalue = removeDuplicates(JobCodeList);

    foreach (string value in dupvalue)
    {
        string jobcodename = value; 
    }

    string connectionStringdest = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + pathdestination + ";Extended Properties=Excel 12.0;";
    DbProviderFactory factorydest = DbProviderFactories.GetFactory("System.Data.OleDb");
    DbConnection connectiondest = factorydest.CreateConnection();
    connectiondest.ConnectionString = connectionStringdest;
    DbCommand command = connectiondest.CreateCommand();
    StringBuilder inserthrms_job = new StringBuilder();
    inserthrms_job = inserthrms_job.Append("Insert into [hrms_job$] values ('" + JobCode + "', '" + JobName + "', '" + StartDateFormated + "', '" + EndDateFormated + "','" + JobCode + " " + JobName + "') ");
    inserthrms_job = inserthrms_job.Append(";");
    command.CommandText = inserthrms_job.ToString();
    connectiondest.Open();
    command.ExecuteNonQuery();
    connectiondest.Close();
}

回答1:

When you query the source spreadsheet, just do a "Select top 1 field1, field2, field3 from [Sheet$] Group by field1, field2, field3". That way you only read the first record, not the duplicates.



回答2:

Good question - I don't think it's possible via the SELECT statement in the oledbadapter per the one answer I see (if so, please advise).

See this OleDBAdapter Excel QA I posted via stack overflow.

Put your into DataSet into an object like I did at the bottom of the post.

Then extend your object via the enumerable.distinct (see MSDN example), so you your object returns distinct elements from a sequence by using the default equality comparer to compare values.

then at the bottom of that post:

var noduplicates = query.Distinct();
foreach (var rec in noduplicates)
    Console.WriteLine(rec.ManagedLocationID + " " + rec.PartID + " " + rec.Quantity);


标签: c# excel oledb