Is there a fast/simple way to calculate the frequency distribution of a .NET collection using LINQ or otherwise?
For example: An arbitrarily long List contains many repetitions. What's a clever way of walking the list and counting/tracking repetitions?
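The easiest way is to use a hash map (a Dictionary in .NET). Either use each item as the key and increment its count every time the item appears, or pick a bucket size (bucket 1 = 1 - 10, bucket 2 = 11 - 20, etc.) and increment the matching bucket's count by one for each value.
Then you can go through the counts and determine the frequencies, for example by dividing each count by the total number of items.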
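A minimal sketch of both variants described above, assuming a list of positive integers. The names CountItems and CountBuckets are made up for illustration, and the bucket numbering follows the "bucket 1 = 1 - 10" convention from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class FrequencyDemo
{
    // Counts how many times each distinct item occurs (item -> count).
    public static Dictionary<T, int> CountItems<T>(IEnumerable<T> items)
    {
        var counts = new Dictionary<T, int>();
        foreach (var item in items)
        {
            // TryGetValue leaves current at 0 when the key is not present yet.
            counts.TryGetValue(item, out var current);
            counts[item] = current + 1;
        }
        return counts;
    }

    // Counts how many values fall into each fixed-width bucket (bucket -> count).
    // With bucketSize = 10: bucket 1 = 1-10, bucket 2 = 11-20, and so on.
    // Assumption: every value is >= 1.
    public static Dictionary<int, int> CountBuckets(IEnumerable<int> values, int bucketSize)
    {
        var buckets = new Dictionary<int, int>();
        foreach (var value in values)
        {
            int bucket = (value - 1) / bucketSize + 1;
            buckets.TryGetValue(bucket, out var current);
            buckets[bucket] = current + 1; // add one per value, not the value itself
        }
        return buckets;
    }

    public static void Main()
    {
        var list = new List<int> { 3, 7, 3, 12, 18, 3, 25 };

        foreach (var pair in CountItems(list))
            Console.WriteLine($"{pair.Key}: {pair.Value}");

        foreach (var pair in CountBuckets(list, 10))
            Console.WriteLine($"bucket {pair.Key}: {pair.Value}");
    }
}
```

Both methods make a single pass over the list, so the cost grows linearly with its length regardless of how many repetitions it contains.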
The C5 generic collections library has a HashBag implementation that accepts duplicates by counting. In pseudo-code terms: create a HashBag<K> (where K is the type of the items in your list), add every item of the list to it, and ask the bag for its item multiplicities, mults. mults will then contain an IDictionary<K,int> where the list item is the key and the multiplicity is the value.

The simplest way to find duplicate items in a list is to group it by item and keep only the groups that contain more than one element.
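The grouping approach can be sketched as follows. This is an illustration rather than the answer's original snippet; the sample list and the variable names are assumptions. The second query shows how the same GroupBy call gives the full frequency distribution the question asks about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    public static void Main()
    {
        var list = new List<string> { "a", "b", "a", "c", "b", "a" };

        // Items that occur more than once: keep groups that have a second element.
        var duplicates = list
            .GroupBy(x => x)
            .Where(g => g.Skip(1).Any())
            .Select(g => g.Key)
            .ToList();

        // Full frequency distribution: item -> number of occurrences.
        var frequencies = list
            .GroupBy(x => x)
            .ToDictionary(g => g.Key, g => g.Count());

        Console.WriteLine(string.Join(", ", duplicates)); // prints "a, b"

        foreach (var pair in frequencies)
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```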
(Writing Skip(1).Any() should be faster than Count() > 1 because it won't have to traverse more than two items from each group. However, the difference is probably negligible unless list's enumerator is slow.)