This might be a very simple thing for you gurus, but I'm not familiar with C#4 and INTEROP. Therefore, I'm stumped. Here's my problem.
I have a excel column that has duplicate data and I want to trim it down to only unique values. Here's what the data looks like:
ColA ColB
10 Adam
12 Jane
14 Adam
18 Adam
20 Eve
So, in the end I just want unique names from ColB:
Adam
Jane
Eve
I know that I can do this by getting all those values into a List and then adding the Distinct functionality to it. But I think I'm doing something wrong. Anyway, here's my program:
Application XLApp = new Microsoft.Office.Interop.Excel.Application();
var XLBook = XLApp.Workbooks.Open(@"c:\temp\Test.xls", ReadOnly: false);
// Grab the 1st sheet
var XLSheet = (Microsoft.Office.Interop.Excel.Worksheet)XLBook.Worksheets.get_Item(1);
XLApp.Visible = true;
// I think I need help with the following lines
IEnumerable<string> myCol = XLApp.Range["B2", XLApp.Range["B2"].End[XlDirection.xlDown]].Select();
myCol.ToList().Distinct();
XLBook.Close(false, System.Reflection.Missing.Value, System.Reflection.Missing.Value);
((_Application)XLApp).Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(XLApp);
How do I go about doing this using C#4 against Excel?
Thanks in advance.
You can use LinqToExcel to easily get the distinct values in a column
var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.WorkSheet()
select row["ColB"]).Distinct()
EDIT:
To use Distinct
in LinqToExcel, you have to use a class that corresponds to the row data.
public class WorksheetRow
{
public string ColA { get; set; }
public string ColB { get; set; }
}
var excel = new ExcelQueryFactory("worksheetFileName");
var distinctNames = (from row in excel.WorkSheet<WorksheetRow>()
select row.ColB).Distinct()
LinqToExcel built-in distinct() supports single property. I use below to distinct on more than one columns:
- Move it to memory, .AsEnumerable().
- Use struct (c#), not class. struct is a value type, class is not.
public struct RowStruct
{
public string C1 {get; set;}
public string C2 {get; set;}
public int C3 {get; set;}
}
public class RowClass // class is NOT distinct friendly
{
public string C1 {get; set;}
public string C2 {get; set;}
public int C3 {get; set;}
}
void Main()
{
var excel = new ExcelQueryFactory(@"C:\Temp\a.xlsx");
var qs = from c in excel.Worksheet<RowStruct>("Sheet1") select c;
Console.WriteLine ("struct distinct is:{0}",
qs.AsEnumerable().Distinct().Count());
var qc = from c in excel.Worksheet<RowClass>("Sheet1") select c;
Console.WriteLine ("class distinct is:{0}",
qc.AsEnumerable().Distinct().Count());
}
My a.xlsx has duplicate data, here is my result:
struct distinct is:235
class distinct is:329
In Excel, select the column, go to..
Data > Remove Duplicates
This leaves you with the unique values.