This question is fairly hard. I have LINQ queries like the ones below:
var lstSimilars = from similarWords in lstAllWords
                  where similarWords.StartsWith(srWordLocal)
                  select similarWords;
foreach (string srVar in lstSimilars)
{
    string srTempWord = srVar.Replace(srWordLocal, "");
    if (dtWords.ContainsKey(srTempWord) == true)
    {
        csWords.updateSplitWord(srWordLocal + ";" + srTempWord, dtWords[srVar]);
    }
}
lstSimilars = from similarWords in lstAllWords
              where similarWords.EndsWith(srWordLocal)
              select similarWords;
foreach (string srVar in lstSimilars)
{
    string srTempWord = srVar.Replace(srWordLocal, "");
    if (dtWords.ContainsKey(srTempWord) == true)
    {
        csWords.updateSplitWord(srWordLocal + ";" + srTempWord, dtWords[srVar]);
    }
}
lstAllWords is a list of strings, built like this:
List<string> lstAllWords = new List<string>();
for (int i = 0; i < dsWordsSplit.Tables[0].Rows.Count; i++)
{
    lstAllWords.Add(dsWordsSplit.Tables[0].Rows[i]["Word"].ToString());
}
My question is: how should I store the word data to get the best LINQ selection performance? Currently I keep it in a list of strings. Is there a different structure that would perform better?
dtWords is a dictionary object.
C# C#-4.0 LINQ
If all you want is to efficiently find words that start or end with a given substring, a SortedSet will let you locate the matching range in O(log(N)) time.
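The idea is to put the words in two SortedSets: one holding the original words and one holding the reversed words. A prefix search is then a range lookup in the first set, and a suffix search is a prefix search for the reversed suffix in the second set.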
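A minimal sketch of the two-set idea, not the original toy implementation. The class name PrefixSuffixIndex and its methods are hypothetical. It assumes ordinal (culture-insensitive) comparison, that no word contains the character U+FFFF, and that words contain no surrogate pairs (a naive character reversal would break them).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: name and API are illustrative only.
public sealed class PrefixSuffixIndex
{
    // Ordinal comparison keeps the range bounds used below predictable.
    private readonly SortedSet<string> _forward = new SortedSet<string>(StringComparer.Ordinal);
    private readonly SortedSet<string> _reversed = new SortedSet<string>(StringComparer.Ordinal);

    public PrefixSuffixIndex(IEnumerable<string> words)
    {
        foreach (string word in words)
        {
            _forward.Add(word);
            _reversed.Add(Reverse(word));
        }
    }

    // All words that start with the given prefix.
    public IEnumerable<string> StartingWith(string prefix)
    {
        return Range(_forward, prefix);
    }

    // All words that end with the given suffix: a prefix search over the
    // reversed words, with each hit reversed back to its original form.
    public IEnumerable<string> EndingWith(string suffix)
    {
        return Range(_reversed, Reverse(suffix)).Select(r => Reverse(r));
    }

    // Every string starting with 'prefix' sorts between 'prefix' and
    // 'prefix' followed by the largest char value (assumption: no word
    // contains U+FFFF itself).
    private static IEnumerable<string> Range(SortedSet<string> set, string prefix)
    {
        return set.GetViewBetween(prefix, prefix + char.MaxValue);
    }

    private static string Reverse(string s)
    {
        char[] chars = s.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }
}
```

With the words "apple", "application", "pineapple" and "banana", StartingWith("app") yields "apple" and "application", and EndingWith("apple") yields "apple" and "pineapple". In the question's code, the index would be built once from lstAllWords and reused for every srWordLocal.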
If you have to search for infixes as well, then the above is not enough: you'll need a suffix tree or suffix array, which is no picnic to implement correctly and efficiently.
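BTW, if the data happens to be in a database, you can let the DBMS do essentially the same thing by indexing both the word column and a column holding the reversed word, then querying with prefix patterns:
ORIGINAL_WORD_COLUMN LIKE 'prefix%'
REVERSED_WORD_COLUMN LIKE 'reversed_suffix%'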
A string list should be sufficiently performant for selecting from. Note that var does not add boxing or unboxing: it is compile-time type inference, and the query here is already a strongly typed IEnumerable<string>. Materializing the results into a List<string> (for example with ToList()) only helps if you enumerate the same results more than once, and even then the difference will likely only be noticeable for very large datasets.
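To see when storing query results in a List<string> actually matters, the sketch below counts predicate evaluations. DeferredQueryDemo and its sample words are hypothetical. It relies only on standard LINQ behavior: a Where query is deferred and re-runs on every enumeration, while ToList() runs it once and caches the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: names and sample data are illustrative only.
public static class DeferredQueryDemo
{
    // Returns how many times the filter predicate ran when the results
    // are enumerated twice, with or without materializing them first.
    public static int CountPredicateCalls(bool materialize)
    {
        var words = new List<string> { "apple", "application", "banana" };
        int calls = 0;

        // 'var' is inferred as IEnumerable<string> at compile time;
        // nothing runs yet because the query is deferred.
        var query = words.Where(w =>
        {
            calls++;
            return w.StartsWith("app", StringComparison.Ordinal);
        });

        // ToList() executes the query once and stores the matches.
        IEnumerable<string> results = materialize ? query.ToList() : query;

        int seen = 0;
        foreach (string w in results) seen++; // first pass
        foreach (string w in results) seen++; // second pass

        return calls;
    }
}
```

With three source words, the deferred version evaluates the predicate six times across the two passes and the materialized version three times. For a single foreach, as in the question, both do the same amount of filtering.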