Cross Index Referencing

2019-04-17 00:38发布

问题:

Sorry if the title is confusing, I'm not exacting sure how I should label this, I tried.

I'm writing a program to do some cross index searching, the program is written in C# using Visual Studio 2010.

I have a table with 3 columns: Category, Type, and Item. The table is read in from a excel spreadsheet and stored in some kind of data structure (will explain this later). Below is a short example of the table.

| CATEGORY  | TYPE  | ITEM  | <<header row
| categoryA | typeA | itemA | <<first entry
| categoryA | typeB | N/A   |
| categoryA | typeC | itemB |
| categoryA | typeD | N/A   |

I will be reading two user input strings, and I want the program determine whether they are a match. [Assuming user input has no typo, I have written a function to handle this and normalize both strings]

The logic of determine whether two strings are a match is like this:

1) If a string is a CATEGORY, every TYPE and ITEM that has the same CATEGORY is a match.

2) If a string is a TYPE or ITEM, only other data in the same row is a match

Here are some examples, string a and b are two input strings and match is a boolean value:

1) a = "categoryA", b = "typeA", match = true
2) a = "categoryA", b = "itemB", match = true
3) a = "typeC", b = "itemB", match = true
4) a = "typeC", b = "itemA", match = false
5) a = "itemA", b = "itemB", match = false

I'll give more examples if this is not clear enough.

So my overall question is: what is the most suitable data structure to store the data from the excel spreadsheet, and how would I do search/compare match with this data structure?

I though of using a Dictionary<string, string>, so I can search string a in the dictionary and get a list of match strings and compare, but this way I will have a huge dictionary and multiple same key, which will not work.

Any suggestion/help is appreciated.

回答1:

I would think about using DataTable from System.Data namespace which is suitable for storing in memory tabular data. What may make it more attractive to you is a possibility to query it with SQL like query via DataView class RowFilter property.

Some pseudo code:

DataTable excelTable = new DataTable(); 

//a method that reads Excel doc and injects data into DataTable
PopulateFromExcel(excelTable); 

DataView dv = new DataView(excelTable); 
dv.RowFilter = "a = 'categoryA' AND b= 'typeA'"; 
var match = dv.Count > 0;


回答2:

I have two suggestions: One optimizes efficiency, whereas the other optimizes memory usage:


If you do lots of lookups, the most efficient data structure is probably a HashSet of Tuples. Here is an example:

var set = new HashSet<Tuple<string, string>>();

set.Add(Tuple.Create("categoryA", "typeA"));
set.Add(Tuple.Create("categoryA", "itemA"));
set.Add(Tuple.Create("typeA", "itemA"));
set.Add(Tuple.Create("categoryA", "typeB"));
set.Add(Tuple.Create("categoryA", "typeC"));
set.Add(Tuple.Create("categoryA", "itemB"));
...

var found1 = set.Contains(Tuple.Create("categoryA", "typeC")); // yields True
var found2 = set.Contains(Tuple.Create("itemA", "itemB"));  // yields False

When you read your data, you add, for each line, all possible combinations that should yield True to the HashSet. It will be quite large, but the lookup operation should be almost instantaneous.


Alternatively, you could create a class MyRow with fields Category, Type and Item and store the data in a List<MyRow>. Then, you can use LINQ to find matching records:

var isMatch = myList.Any(row => (row.Category == string1 && row.Type == string2)
                             || (row.Category == string1 && row.Item == string2)
                             || ...);

This requires the least memory (since every value is stored only once). However, each search operation traverses the complete list until a match is found.



回答3:

I would suggest using a DOCMD.TransferSpreadsheet method and export the excel data to access and do some simplet quering to match your requirements.