Python - Count occurrences of certain ranges in a

Posted 2019-02-10 16:42

So basically I want to count how many floating-point values in a given list fall within each of several ranges. For example: a list of grades (all scores out of 100) is entered by the user, and the grades are sorted into groups of ten. How many scores fall in 0-10, 10-20, 20-30, etc.? Like a test score distribution. I know I can use the count function, but since I'm not looking for specific numbers I'm having trouble. Is there a way to combine count and range? Thanks for any help.

4 answers
我想做一个坏孩纸
Reply #2 · 2019-02-10 17:27

This method uses bisect, which can be more efficient, but it requires sorting the scores first.

from bisect import bisect
import random

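scores = [random.randint(0, 100) for _ in range(100)]
bins = [20, 40, 60, 80, 100]  # upper limit of each bin

scores.sort()  # bisect requires sorted input
counts = []
last = 0
for range_max in bins:
    # index just past the last score <= range_max, searching from `last`
    i = bisect(scores, range_max, last)
    counts.append(i - last)
    last = i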

I wouldn't expect you to install numpy just for this, but if you already have numpy you can use numpy.histogram.

UPDATE

First, using bisect is more flexible. Using [i//n for i in scores] requires that all the bins are the same size, whereas bisect allows the bins to have arbitrary limits. Also, i//n means the ranges are [lo, hi). With bisect the ranges are (lo, hi], but you can use bisect_left if you want [lo, hi).
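To make the boundary behaviour concrete, here is a minimal sketch. The helper name count_bins, its closed_right flag, and the sample scores are made up for illustration; only bisect_left and bisect_right come from the standard library. It uses uneven bin limits and puts scores exactly on a limit so the two conventions give different counts.

```python
from bisect import bisect_left, bisect_right

def count_bins(scores, upper_limits, closed_right=True):
    """Count scores per bin, given ascending upper limits.

    closed_right=True  -> bins behave like (lo, hi]  (bisect_right)
    closed_right=False -> bins behave like [lo, hi)  (bisect_left)
    """
    find = bisect_right if closed_right else bisect_left
    ordered = sorted(scores)  # bisect needs sorted input
    counts = []
    last = 0
    for limit in upper_limits:
        i = find(ordered, limit, last)  # search only past the previous bin
        counts.append(i - last)
        last = i
    return counts

scores = [5, 20, 20, 35, 40, 99]   # hypothetical sample data
limits = [20, 40, 100]             # bins of width 20, 20 and 60

right_closed = count_bins(scores, limits, closed_right=True)
left_closed = count_bins(scores, limits, closed_right=False)
print(right_closed)  # [3, 2, 1]
print(left_closed)   # [1, 3, 2]
```

The two scores of 20 and the score of 40 move to the next bin when switching from bisect_right to bisect_left. Note that the first bin has no lower limit in either case, so anything below the first limit is counted there.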

Second, bisect is faster; see the timings below. I've replaced scores.sort() with the slower sorted(scores) because the sorting is the slowest step and I didn't want to bias the times with a pre-sorted array, but the OP says his/her array is already sorted, so bisect could make even more sense in that case.

setup="""
from bisect import bisect_left
import random
from collections import Counter

def histogram(iterable, low, high, bins):
    step = (high - low) / bins
    dist = Counter(((x - low + 0.) // step for x in iterable))
    return [dist[b] for b in range(bins)]

def histogram_bisect(scores, groups):
    scores = sorted(scores)
    counts = []
    last = 0
    for range_max in groups:
        i = bisect_left(scores, range_max, last)
        counts.append(i - last)
        last = i
    return counts

def histogram_simple(scores, bin_size):
    scores = [i//bin_size for i in scores]
    return [scores.count(i) for i in range(max(scores)+1)]

scores = [random.randint(0,100) for _ in range(100)]
bins = range(10, 101, 10)
"""
from timeit import repeat
t = repeat('C = histogram(scores, 0, 100, 10)', setup=setup, number=10000)
print(min(t))
# 0.95 (timing reported by the answer's author)
t = repeat('C = histogram_bisect(scores, bins)', setup=setup, number=10000)
print(min(t))
# 0.22 (timing reported by the answer's author)
t = repeat('histogram_simple(scores, 10)', setup=setup, number=10000)
print(min(t))
# 0.36 (timing reported by the answer's author)
Juvenile、少年°
Reply #3 · 2019-02-10 17:30
decs = [int(x/10) for x in scores]

This maps scores from 0-9 to 0, 10-19 to 1, et cetera. Then just count the occurrences of 0, 1, 2, 3, and so on (via something like collections.Counter), and map back to ranges from there.
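As a rough sketch of that idea: the sample scores and the by_range name are illustrative, not part of the answer. Each score is reduced to its decile index, the indices are tallied with collections.Counter, and each index is then turned back into a readable range label.

```python
from collections import Counter

scores = [82, 85, 90, 91, 70, 87, 45]  # illustrative sample data

# Reduce each score to its decile index (0-9 -> 0, 10-19 -> 1, ...) and tally.
decs = Counter(int(x / 10) for x in scores)

# Turn each index back into a range label, lowest range first.
by_range = {
    "%d-%d" % (d * 10, d * 10 + 9): n
    for d, n in sorted(decs.items())
}
print(by_range)  # {'40-49': 1, '70-79': 1, '80-89': 3, '90-99': 2}
```

Only ranges that actually occur show up in the result, and a score of exactly 100 would get its own index 10, so it may need special handling depending on how the top range should be defined.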

兄弟一词,经得起流年.
Reply #4 · 2019-02-10 17:34

If you are fine with using the external library NumPy, then you just need to call numpy.histogram():

>>> import numpy
>>> data = [82, 85, 90, 91, 70, 87, 45]
>>> counts, bins = numpy.histogram(data, bins=10, range=(0, 100))
>>> counts
array([0, 0, 0, 0, 1, 0, 0, 1, 3, 2])
>>> bins
array([   0.,   10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,
         90.,  100.])
男人必须洒脱
Reply #5 · 2019-02-10 17:39

To group the data, divide each value by the interval width. To count the number in each group, consider using collections.Counter. Here's a worked-out example with documentation and a test:

from collections import Counter

def histogram(iterable, low, high, bins):
    '''Count elements from the iterable into evenly spaced bins

        >>> scores = [82, 85, 90, 91, 70, 87, 45]
        >>> histogram(scores, 0, 100, 10)
        [0, 0, 0, 0, 1, 0, 0, 1, 3, 2]

    '''
    step = (high - low + 0.0) / bins
    dist = Counter((float(x) - low) // step for x in iterable)
    return [dist[b] for b in range(bins)]

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
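One edge case worth noting: in the function above, a value exactly equal to high (for example a score of 100 with high=100) gets bin index `bins`, which the final list comprehension never reads, so it is silently left out. numpy.histogram, by contrast, treats its last bin as closed on the right. If the top score should be counted, one possible variant is sketched below; the name histogram_inclusive is made up for this illustration.

```python
from collections import Counter

def histogram_inclusive(iterable, low, high, bins):
    """Like histogram(), but a value equal to `high` goes in the last bin."""
    step = (high - low + 0.0) / bins

    def bin_index(x):
        if x == high:
            return bins - 1  # fold the top edge into the last bin
        return int((float(x) - low) // step)

    dist = Counter(bin_index(x) for x in iterable)
    # Values outside [low, high] get an index outside range(bins)
    # and are ignored, as in the original function.
    return [dist[b] for b in range(bins)]

scores = [82, 85, 90, 91, 70, 87, 45, 100]
result = histogram_inclusive(scores, 0, 100, 10)
print(result)  # [0, 0, 0, 0, 1, 0, 0, 1, 3, 3]
```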