How to count number of occurrence of each word in

Posted 2020-03-26 03:38

Question:

I use the following code to extract words from a string input. How can I also get the number of occurrences of each word?

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Where(g => g.Count() > 10)
                        .Select(g => g.Key);

Answer 1:

Instead of Regex.Split you can use string.Split and get the count for each word like:

string str = "Some string with Some string repeated";
var result  = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });

If you want to keep only the words that are repeated at least 10 times, add a condition before Select, such as Where(grp => grp.Count() >= 10)

For output:

foreach (var item in result)
{
    Console.WriteLine("Word: {0}, Count:{1}", item.Word, item.Count);
}

Output:

Word: Some, Count:2
Word: string, Count:2
Word: with, Count:1
Word: repeated, Count:1

For case insensitive grouping you can replace the current GroupBy with:

.GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)

So your query would be:

var result = str.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                .GroupBy(r => r, StringComparer.InvariantCultureIgnoreCase)
                .Where(grp => grp.Count() >= 10)
                .Select(grp => new
                    {
                        Word = grp.Key,
                        Count = grp.Count()
                    });
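As a rough, self-contained sketch of the case-insensitive grouping (the sample sentence and the CountIgnoreCase helper are made up for illustration and are not part of the query above; the count filter is left out so the small sample produces output):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordCounter
{
    // Hypothetical helper: counts space-separated words, ignoring case.
    public static Dictionary<string, int> CountIgnoreCase(string text)
    {
        return text.Split(new[] { " " }, StringSplitOptions.RemoveEmptyEntries)
                   // "Some" and "some" land in the same group.
                   .GroupBy(w => w, StringComparer.InvariantCultureIgnoreCase)
                   // Pass the same comparer so lookups ignore case as well.
                   .ToDictionary(g => g.Key, g => g.Count(),
                                 StringComparer.InvariantCultureIgnoreCase);
    }

    static void Main()
    {
        var counts = CountIgnoreCase("Some string with some String repeated");
        foreach (var pair in counts)
        {
            Console.WriteLine("Word: {0}, Count:{1}", pair.Key, pair.Value);
        }
    }
}
```

With this input, "Some"/"some" and "string"/"String" are each counted as one word with a count of 2, and the key of each group is the first spelling that was encountered.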


Answer 2:

Try this:

var words = Regex.Split(input, @"\W+")
                        .AsEnumerable()
                        .GroupBy(w => w)
                        .Select(g => new {key = g.Key, count = g.Count()});
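One thing to watch for: when the input starts or ends with a non-word character, Regex.Split returns an empty string at that end of the array, and that empty string would be counted as a "word". A minimal sketch of filtering it out, assuming a made-up sample input and a hypothetical SplitWords helper:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Hypothetical helper: splits on runs of non-word characters
    // and drops the empty entries Regex.Split can produce at the edges.
    public static string[] SplitWords(string input)
    {
        return Regex.Split(input, @"\W+")
                    .Where(w => w.Length > 0)
                    .ToArray();
    }

    static void Main()
    {
        // Sample input (an assumption for illustration); note the trailing ".".
        string input = "Hello, world! Hello again.";

        var words = SplitWords(input)
                        .GroupBy(w => w)
                        .Select(g => new { key = g.Key, count = g.Count() });

        foreach (var w in words)
        {
            Console.WriteLine("{0}: {1}", w.key, w.count);
        }
    }
}
```

Without the Where filter, the trailing "." in this sample would add an empty-string entry to the result.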


Answer 3:

Remove the Select call to keep the IGrouping objects. Each grouping exposes the word as its Key, and you can call Count() on it to get the number of occurrences.

var words = Regex.Split(input, @"\W+")
                    .AsEnumerable()
                    .GroupBy(w => w)
                    .Where(g => g.Count() > 10);

foreach (var wordGrouping in words)
{
    var word = wordGrouping.Key;
    var count = wordGrouping.Count();
}


Answer 4:

You could produce a dictionary like this:

var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Where(g => g.Count() > 10)
                 .ToDictionary(g => g.Key, g => g.Count());

Or if you'd like to avoid having to compute the count twice, like this:

var words = Regex.Split(input, @"\W+")
                 .GroupBy(w => w)
                 .Select(g => new { g.Key, Count = g.Count() })
                 .Where(g => g.Count > 10)
                 .ToDictionary(g => g.Key, g => g.Count);

And now you can get the count of words like this (assuming the word "foo" appears more than 10 times in input):

var fooCount = words["foo"];
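Note that the indexer throws a KeyNotFoundException when the word is not in the dictionary, which happens for any word that did not pass the count filter. A hedged sketch using TryGetValue instead (the FrequentWords helper, the sample input, and the threshold of 2 are assumptions chosen to keep the example small):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class LookupDemo
{
    // Hypothetical helper: words occurring more than minCount times, with their counts.
    public static Dictionary<string, int> FrequentWords(string input, int minCount)
    {
        return Regex.Split(input, @"\W+")
                    .GroupBy(w => w)
                    .Select(g => new { g.Key, Count = g.Count() })
                    .Where(g => g.Count > minCount)
                    .ToDictionary(g => g.Key, g => g.Count);
    }

    static void Main()
    {
        var words = FrequentWords("foo bar foo baz foo", 2);

        int fooCount;
        if (words.TryGetValue("foo", out fooCount))
        {
            Console.WriteLine("foo appears {0} times", fooCount);
        }

        int barCount;
        if (!words.TryGetValue("bar", out barCount))
        {
            // "bar" occurs only once, so it was filtered out of the dictionary.
            Console.WriteLine("bar is not in the dictionary");
        }
    }
}
```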


Tags: c# asp.net