I have a list or an array of decimal numbers in Python. I need to round them to 2 decimal places, as these are monetary amounts. However, I need the overall sum to be maintained, i.e. the sum of the original array rounded to 2 decimal places must equal the sum of the rounded elements of the array.
Here's my code so far:
import numpy
myOriginalList = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907, 18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261, 12449.81318]
originalTotal = round(sum(myOriginalList), 2)
# Answer = 187976.61
# Using numpy
myRoundedList = numpy.array(myOriginalList).round(2)
# New Array = [ 27226.95 193.06 1764.31 12625.86 26714.68 18970.35 12725.41 23589.93 27948.4 23767.83 12449.81]
newTotal = myRoundedList.sum()
# Answer = 187976.59
I need an efficient way of amending my new rounded array such that the sum is also 187976.61. The 2 pence difference needs to be applied to items 7 and 6 as these have the greatest difference between the rounded entries and the original entries.
As noted in an answer by kettlehell, try the PyPI package `iteround`. It's not optimized to use NumPy, however.
The first step is to calculate the error between the desired result and the actual sum.
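As a small worked example of that first step, here is the error for the numbers in the question. This is only a sketch in plain Python (no NumPy), reusing the question's variable names; `error` is a name assumed here for illustration.

```python
myOriginalList = [27226.94982, 193.0595233, 1764.3094, 12625.8607,
                  26714.67907, 18970.35388, 12725.41407, 23589.93271,
                  27948.40386, 23767.83261, 12449.81318]

originalTotal = round(sum(myOriginalList), 2)        # desired total
myRoundedList = [round(x, 2) for x in myOriginalList]  # element-wise rounding

# Positive: the rounded items fall short of the desired total.
# Negative: they overshoot it.
error = originalTotal - sum(myRoundedList)
print(round(error, 4))  # 0.02
```

The raw `error` is a float, so it may differ from 0.02 by a tiny floating-point residue; it should be rounded before being turned into a count of pennies.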
This can be either positive or negative. Since every item in `myRoundedList` is within 0.005 of the actual value, this error will be less than 0.01 per item of the original array. You can simply divide by 0.01 and round to get the number of items that must be adjusted.
Now all that's left is to select the items that should be adjusted. The optimal results come from adjusting those values that were closest to the boundary in the first place. You can find those by sorting by the difference between the original value and the rounded value.
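Put together, the steps above might look like the following. Treat it as a minimal sketch rather than a definitive implementation: the function name `round_preserving_sum` and the `places` parameter are made up for illustration, it uses plain lists rather than NumPy arrays, and it still works in floats, so the usual floating-point caveats apply.

```python
def round_preserving_sum(values, places=2):
    """Round each value, then nudge a few items so the total matches."""
    step = 10 ** -places                      # 0.01 for places=2
    rounded = [round(v, places) for v in values]
    target = round(sum(values), places)

    # Signed number of items to adjust (in units of `step`).
    n = int(round((target - sum(rounded)) / step))
    if n == 0:
        return rounded

    # Residual > 0 means the item was rounded down. When n > 0 we want
    # the largest residuals first, when n < 0 the smallest (most negative).
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - rounded[i],
                   reverse=n > 0)
    sign = 1 if n > 0 else -1
    for i in order[:abs(n)]:
        rounded[i] = round(rounded[i] + sign * step, places)
    return rounded


myOriginalList = [27226.94982, 193.0595233, 1764.3094, 12625.8607,
                  26714.67907, 18970.35388, 12725.41407, 23589.93271,
                  27948.40386, 23767.83261, 12449.81318]
adjusted = round_preserving_sum(myOriginalList)
print(adjusted[5], adjusted[6])  # 18970.36 12725.42
```

With the question's data this bumps the 6th and 7th items (indexes 5 and 6) by a penny each, which are the two the question asks for.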
If you have a long list, the above methods are inefficient because they are O(n*log(n)) (sorting of `n` elements). If the chances are high that you only need to change a few (or one) of these indexes, you could use a heap (or a min/max if there's only one place to change).
I'm not much of a python coder, but here's a solution considering the above (but not considering the floating point representation inaccuracy (already mentioned by others)).
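A minimal sketch of such a solution, under some assumptions: the function name `round_to_sum`, the `places` parameter and the `score` helper are invented here, and ranking candidates by the signed residual (original minus rounded) is one possible choice of key, not necessarily the one the answer originally used.

```python
import heapq


def round_to_sum(values, places=2):
    step = 10 ** -places
    rounded = [round(v, places) for v in values]

    # d: signed number of places where the rounding has to change.
    d = int(round((round(sum(values), places) - sum(rounded)) / step))
    if d == 0:
        return rounded

    sign = 1 if d > 0 else -1

    def score(i):
        # Largest for the items that were closest to rounding the other way.
        return sign * (values[i] - rounded[i])

    indices = range(len(values))
    if abs(d) == 1:
        chosen = [max(indices, key=score)]                   # one linear pass
    else:
        chosen = heapq.nlargest(abs(d), indices, key=score)  # no full sort

    for i in chosen:
        rounded[i] = round(rounded[i] + sign * step, places)
    return rounded


print(round_to_sum([0.334, 0.333, 0.333]))  # [0.34, 0.33, 0.33]
```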
`d` is the numerical difference between the rounded sum and the sum of rounds; this tells us in how many places we should change the rounding. If `d` is zero we clearly have nothing to do. If `d` is `1` or `-1` the best place can be found easily with `min` or `max`. For an arbitrary number, we can use `heapq.nlargest` to find the best `D=abs(d)` places.
So why is there a `max`, if `nlargest` would do?! Because `min` and `max` are much more efficiently implemented than that.
This way, the algorithm is O(n+D*log(n)).
Note: With a heap, you can create an O(n+D^2*log(D)) algorithm as well because the top `D` elements should be on the top D levels of the heap, and you can order that list in O(D^2*log(D)) steps. If `n` is huge and `D` is very small this can mean a lot.
(Rights for reconsideration reserved (because it's after midnight).)
First of all, you shouldn't use floats for storing money (use Decimals instead). But below I provide a fairly generic solution: you need to store, accumulate and use the sum of the rounding differences. A verbose (and not very pythonic ;-) example with your numbers:
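A minimal sketch of this running-difference idea, assuming `Decimal` values built from the floats' string forms and half-up rounding. The names `round_carrying_error`, `quantum` and `carry` are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_carrying_error(values, quantum=Decimal("0.01")):
    carry = Decimal(0)   # accumulated rounding difference so far
    result = []
    for v in values:
        # Add what earlier roundings lost (or gained) before rounding.
        exact = Decimal(str(v)) + carry
        rounded = exact.quantize(quantum, rounding=ROUND_HALF_UP)
        carry = exact - rounded
        result.append(rounded)
    return result


numbers = [27226.94982, 193.0595233, 1764.3094, 12625.8607,
           26714.67907, 18970.35388, 12725.41407, 23589.93271,
           27948.40386, 23767.83261, 12449.81318]
new_numbers = round_carrying_error(numbers)
print(sum(new_numbers))  # 187976.61
```

Note that this keeps the total but does not choose which items to adjust by their individual rounding difference: with these numbers the extra pennies land on the 7th and 10th items, not the 6th and 7th.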
With all the caveats about using floating point numbers: