I use the following code to extract words from a string input. How can I also get the number of occurrences of each word?
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.Select(g => g.Key);
Instead of Regex.Split you can use string.Split and get the count for each word like this:
string str = "Some string with Some string repeated";
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
If you want to keep only the words that occur at least 10 times, add a condition before Select, such as .Where(grp => grp.Count() >= 10)
For output:
foreach (var item in result)
{
Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}
Output:
Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1
For case insensitive grouping you can replace the current GroupBy with:
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
So your query would be:
var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
.Where(grp => grp.Count() >= 10)
.Select(grp => new
{
Word = grp.Key,
Count = grp.Count()
});
Try this:
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Select(g => new {key = g.Key, count = g.Count()});
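One caveat with Regex.Split and the \W+ pattern: when the input starts or ends with a non-word character, the returned array contains an empty string at that edge, and it gets grouped and counted like any other word. A minimal sketch of filtering those out before grouping follows; the class and helper names (SplitDemo, SplitWords) are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on runs of non-word characters
    // and drops the empty entries Regex.Split leaves at the edges.
    public static string[] SplitWords(string input)
    {
        return Regex.Split(input, @"\W+")
            .Where(w => w.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Without the filter, the trailing "!" would add an empty last element.
        var groups = SplitWords("Hello, hello world!")
            .GroupBy(w => w, StringComparer.OrdinalIgnoreCase);

        foreach (var g in groups)
            Console.WriteLine("Word: {0}, Count:{1}", g.Key, g.Count());
    }
}
```

The string.Split answer above avoids the same problem with StringSplitOptions.RemoveEmptyEntries, but that option is not available on Regex.Split, hence the explicit Where.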
Remove the Select statement to keep the IGrouping, which you can use to read both the key and the count of values.
var words = Regex.Split(input, @"\W+")
.AsEnumerable()
.GroupBy(w => w)
.Where(g => g.Count() > 10);
foreach (var wordGrouping in words)
{
var word = wordGrouping.Key;
var count = wordGrouping.Count();
}
You could produce a dictionary like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Where(g => g.Count() > 10)
.ToDictionary(g => g.Key, g => g.Count());
Or if you'd like to avoid having to compute the count twice, like this:
var words = Regex.Split(input, @"\W+")
.GroupBy(w => w)
.Select(g => new { g.Key, Count = g.Count() })
.Where(g => g.Count > 10)
.ToDictionary(g => g.Key, g => g.Count);
And now you can get the count of a word like this (assuming the word "foo" appears more than 10 times in input):
var fooCount = words["foo"];
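Note that the indexer throws a KeyNotFoundException when the word was filtered out or never appeared in the input. A minimal sketch of a safer lookup with TryGetValue follows; the helper CountFrequentWords and its minCount parameter are hypothetical names introduced here, and the `out` variable syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounts
{
    // Hypothetical helper: counts each word and keeps only those
    // that occur more than minCount times.
    public static Dictionary<string, int> CountFrequentWords(string input, int minCount)
    {
        return Regex.Split(input, @"\W+")
            .Where(w => w.Length > 0)                        // skip empty edge entries
            .GroupBy(w => w)
            .Select(g => new { g.Key, Count = g.Count() })   // count computed once
            .Where(g => g.Count > minCount)
            .ToDictionary(g => g.Key, g => g.Count);
    }

    public static void Main()
    {
        var words = CountFrequentWords("foo bar foo baz foo", 1);

        // TryGetValue returns false instead of throwing when the key is absent.
        if (words.TryGetValue("foo", out int fooCount))
            Console.WriteLine("foo: {0}", fooCount);

        if (!words.TryGetValue("bar", out _))
            Console.WriteLine("bar was filtered out");
    }
}
```

Here "foo" occurs three times and passes the threshold, while "bar" occurs once and is absent from the dictionary, so the second lookup takes the fallback branch rather than throwing.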