I have a set of float values (non-negative and always less than 1) that I want to bin into a histogram, i.e. each bar in the histogram covers a range of values, with the data as a whole falling in [0, 0.150).
The data I have looks like this:
0.000
0.005
0.124
0.000
0.004
0.000
0.111
0.112
With my code below I expect to get a result that looks like:
[0, 0.005) 5
[0.005, 0.011) 0
...etc..
I tried to do this binning with the code below, but it doesn't seem to work. What's the right way to do it?
#! /usr/bin/env python
import fileinput, math
log2 = math.log(2)
def getBin(x):
    return int(math.log(x+1)/log2)
diffCounts = [0] * 5
for line in fileinput.input():
    words = line.split()
    diff = float(words[0]) * 1000;
    diffCounts[ str(getBin(diff)) ] += 1
maxdiff = [i for i, c in enumerate(diffCounts) if c > 0][-1]
print maxdiff
maxBin = max(maxdiff)
for i in range(maxBin+1):
    lo = 2**i - 1
    hi = 2**(i+1) - 1
    binStr = '[' + str(lo) + ',' + str(hi) + ')'
    print binStr + '\t' + '\t'.join(map(str, (diffCounts[i])))
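The first error is a TypeError (list indices must be integers, not str), raised by diffCounts[ str(getBin(diff)) ].
Why are you converting an int to a str when an int is needed? Fix that, and then we get an IndexError (list index out of range), because you've only made 5 buckets. I don't understand your bucketing scheme, but let's make it 50 buckets and see what happens.
The next error comes from max(maxdiff): maxdiff is a single value out of your list of ints, so what is max doing here? Remove it, and now we get another TypeError from the last line. Sure enough, you're using a single value as the second argument to map. Let's simplify the last two lines so that they print diffCounts[i] directly instead of passing it through map and join.
Now it runs and prints one line per bucket, each with its range and its count.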
I'm not sure what else to do here, since I don't really understand the bucketing you are hoping to use. It seems to involve binary powers, but isn't making sense to me...
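For reference, here is a minimal sketch of the script with those fixes applied. It is not the asker's exact code: it assumes Python 3 print syntax, reads from an in-memory list instead of fileinput, and the names get_bin, log2_histogram, scale and n_buckets are made up for illustration. The power-of-two bucketing is kept as written in the question.

```python
import math

def get_bin(x):
    """Bucket index for x >= 0; bucket i covers [2**i - 1, 2**(i+1) - 1)."""
    return int(math.log(x + 1) / math.log(2))

def log2_histogram(values, scale=1000, n_buckets=50):
    """Count scaled values per power-of-two bucket (hypothetical helper)."""
    counts = [0] * n_buckets              # 50 buckets instead of 5
    for v in values:
        counts[get_bin(v * scale)] += 1   # index with an int, not a str
    # Highest non-empty bucket; no max() over a single int needed.
    max_bin = [i for i, c in enumerate(counts) if c > 0][-1]
    rows = []
    for i in range(max_bin + 1):
        lo = 2**i - 1
        hi = 2**(i + 1) - 1
        rows.append((lo, hi, counts[i]))
    return rows

# Sample data from the question.
sample = [0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112]
for lo, hi, count in log2_histogram(sample):
    print("[%d,%d)\t%d" % (lo, hi, count))
```

With the sample data, the non-empty buckets are [0,1), [3,7) and [63,127), which shows that this scheme produces exponentially growing ranges rather than the equal-width bins the question asks for.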
When possible, don't reinvent the wheel. NumPy has everything you need, for example numpy.histogram.
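A minimal sketch of what that could look like with numpy.histogram, using the sample data from the question. The bin layout is an assumption: 30 equal-width bins of 0.005 covering [0, 0.150), built with numpy.linspace.

```python
import numpy as np

# Sample data from the question.
values = np.array([0.000, 0.005, 0.124, 0.000, 0.004, 0.000, 0.111, 0.112])

# Assumed layout: 31 edges give 30 equal-width bins over [0, 0.150).
edges = np.linspace(0.0, 0.150, 31)

# counts[i] is the number of values in [edges[i], edges[i+1]).
counts, edges = np.histogram(values, bins=edges)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print("[%.3f, %.3f)\t%d" % (lo, hi, n))
```

Two caveats: numpy.histogram treats the last bin as closed on the right, and a value that sits exactly on an edge, such as 0.005, may land on either side of it because of floating-point rounding.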