Is there a fast/simple way to calculate the frequency distribution of a .NET collection using LINQ or otherwise?
For example: An arbitrarily long List contains many repetitions. What's a clever way of walking the list and counting/tracking repetitions?
The easiest way is to use a hash map (a Dictionary in .NET). Either use each item as the key and increment its count every time the item appears, or pick a bucket size (bucket 1 = 1-10, bucket 2 = 11-20, etc.) and increment the count of whichever bucket each value falls into.
Then you can go through the counts and determine the frequencies.
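As a rough sketch of both variants, assuming a list of positive integers: the sample data and the names values, counts, buckets and bucketSize are made up for illustration, and the bucket width of 10 simply mirrors the example above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data; any sequence of positive ints would do.
var values = new List<int> { 3, 7, 7, 12, 15, 15, 15, 28 };

// Variant 1: one counter per distinct value.
var counts = new Dictionary<int, int>();
foreach (var v in values)
{
    counts.TryGetValue(v, out var seen); // seen is 0 if v is not a key yet
    counts[v] = seen + 1;
}

// Variant 2: one counter per bucket (bucket 1 = 1-10, bucket 2 = 11-20, ...).
const int bucketSize = 10;
var buckets = new Dictionary<int, int>();
foreach (var v in values)
{
    int bucket = (v - 1) / bucketSize + 1; // integer division picks the bucket
    buckets.TryGetValue(bucket, out var seen);
    buckets[bucket] = seen + 1;
}

// Walk the counts and turn them into relative frequencies.
foreach (var pair in counts)
{
    double share = (double)pair.Value / values.Count;
    Console.WriteLine($"value {pair.Key}: {pair.Value} occurrence(s), share {share}");
}
foreach (var pair in buckets)
{
    Console.WriteLine($"bucket {pair.Key}: {pair.Value} value(s)");
}
```

Both loops make a single pass over the list, so the cost grows linearly with its length regardless of how many repetitions it contains.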
The simplest way to find duplicate items in a list is to group it, like this:
var dups = list.GroupBy(i => i).Where(g => g.Skip(1).Any());
(Writing Skip(1).Any() should be faster than Count() > 1 because it won't have to traverse more than two items from each group. However, the difference is probably negligible unless list's enumerator is slow.)
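The same grouping idea extends from finding duplicates to the full frequency distribution asked about in the question. A minimal sketch, assuming a list of strings; the sample data and the names frequency and duplicates are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data.
var list = new List<string> { "a", "b", "a", "c", "b", "a" };

// Group equal items and count each group: item -> number of occurrences.
Dictionary<string, int> frequency = list
    .GroupBy(item => item)
    .ToDictionary(g => g.Key, g => g.Count());

// Items that occur more than once, derived from the counts.
List<string> duplicates = frequency
    .Where(pair => pair.Value > 1)
    .Select(pair => pair.Key)
    .ToList();

// Print the distribution, most frequent item first.
foreach (var pair in frequency.OrderByDescending(p => p.Value))
{
    double share = (double)pair.Value / list.Count;
    Console.WriteLine($"{pair.Key}: {pair.Value} occurrence(s), share {share}");
}
```

GroupBy uses the default equality comparer for the key type; for case-insensitive strings or custom types, one of its overloads that accepts an IEqualityComparer would be the place to change that.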
The C5 generic collections library has a HashBag implementation that accepts duplicates by counting. The following pseudo-code would get you what you're looking for:
var hash = new HashBag<K>();
hash.AddAll(list);
var mults = hash.ItemMultiplicities();
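(where K is the type of the items in your list). mults will then contain the item/multiplicity pairs, with the list item as the key and the multiplicity as the value. Note that C5 exposes these as a collection of KeyValuePair<K,int> entries rather than as an IDictionary<K,int>.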