Round a Python list of numbers and maintain their

Posted 2019-04-29 11:03

Question:

I have a list or an array of decimal numbers in Python. I need to round them to 2 decimal places because they are monetary amounts. However, the overall sum must be maintained, i.e. the sum of the original array rounded to 2 decimal places must equal the sum of the rounded elements of the array.

Here's my code so far:

myOriginalList = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907, 18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261, 12449.81318]
originalTotal = round(sum(myOriginalList), 2)
# Answer = 187976.61

# Using numpy
import numpy
myRoundedList = numpy.array(myOriginalList).round(2)
# New Array = [ 27226.95    193.06   1764.31  12625.86  26714.68  18970.35  12725.41 23589.93 27948.4   23767.83  12449.81]

newTotal = myRoundedList.sum()
# Answer = 187976.59

I need an efficient way of amending my new rounded array such that the sum is also 187976.61. The 2 pence difference needs to be applied to items 7 and 6 as these have the greatest difference between the rounded entries and the original entries.

Answer 1:

The first step is to calculate the error between the desired result and the actual sum:

>>> error = originalTotal - sum(myRoundedList)
>>> error
0.01999999996041879

This can be either positive or negative. Since every item in myRoundedList is within 0.005 of its original value, the total error is well under 0.01 per item of the original array, so adjusting some of the items by 0.01 each is always enough to absorb it. Divide the error by 0.01 and round to get the number of items that must be adjusted:

>>> n = int(round(error / 0.01))
>>> n
2

Now all that's left is to select the items that should be adjusted. The optimal results come from adjusting those values that were closest to the boundary in the first place. You can find those by sorting by the difference between the original value and the rounded value.

>>> import math
>>> myNewList = [float(x) for x in myRoundedList]  # a real copy; [:] on a NumPy array is only a view
>>> for _,i in sorted(((myOriginalList[i] - myRoundedList[i], i) for i in range(len(myOriginalList))), reverse=n>0)[:abs(n)]:
...     myNewList[i] += math.copysign(0.01, n)
...

>>> myRoundedList
[27226.95, 193.06, 1764.31, 12625.86, 26714.68, 18970.35, 12725.41, 23589.93, 27948.4, 23767.83, 12449.81]
>>> myNewList
[27226.95, 193.06, 1764.31, 12625.86, 26714.68, 18970.359999999997, 12725.42, 23589.93, 27948.4, 23767.83, 12449.81]
>>> sum(myNewList)
187976.61
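
The steps above can be collected into one function. This is only a sketch in plain Python: the name round_preserving_sum and the places parameter are made up for illustration, and the adjusted entries are re-rounded after the nudge, which the original answer does not do, to avoid values such as 18970.359999999997.

```python
import math

def round_preserving_sum(values, places=2):
    """Round each value, then nudge the entries with the largest rounding
    error so the result adds up to round(sum(values), places)."""
    unit = 10 ** -places
    rounded = [round(v, places) for v in values]
    target = round(sum(values), places)
    # Units (e.g. pence) still missing (positive) or surplus (negative)
    n = int(round((target - sum(rounded)) / unit))
    # Order the indices so the best candidates for adjustment come first
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - rounded[i],
                   reverse=n > 0)
    for i in order[:abs(n)]:
        # Re-round after the nudge to hide floating-point noise
        rounded[i] = round(rounded[i] + math.copysign(unit, n), places)
    return rounded

original = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907,
            18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261,
            12449.81318]
adjusted = round_preserving_sum(original)
print(adjusted[5], adjusted[6])
```

For the list from the question, the two adjusted entries are items 6 and 7 (indices 5 and 6), as requested.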


Answer 2:

First of all, you shouldn't use floats for storing money (use decimal.Decimal instead). That said, here is a fairly generic solution: store and accumulate the rounding differences, and carry the running remainder into the next value before rounding it. A verbose (and not very pythonic ;-) example with your numbers:

# define your accuracy
decimal_positions = 2

numbers = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907, 18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261, 12449.81318]
print(round(sum(numbers), decimal_positions))
>>> 187976.61

new_numbers = list()
rest = 0.0
for n in numbers:
    new_n = round(n + rest,decimal_positions)
    rest += n - new_n
    new_numbers.append( new_n )

# round() only hides floating-point noise in the printed total
print(round(sum(new_numbers), decimal_positions))
>>> 187976.61
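
The same carry-the-remainder idea can be sketched with decimal.Decimal, as the answer recommends for money. The function name round_with_carry is hypothetical, the inputs are assumed to arrive as floats (hence the conversion through str), and ROUND_HALF_EVEN is an assumed rounding mode chosen to mirror Python's built-in round.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_with_carry(amounts, places=2):
    """Round each amount after adding the rounding remainder carried over
    from the previous amounts."""
    quantum = Decimal(10) ** -places          # Decimal('0.01') for places=2
    carry = Decimal(0)
    result = []
    for amount in amounts:
        exact = Decimal(str(amount)) + carry  # str() avoids binary float noise
        rounded = exact.quantize(quantum, rounding=ROUND_HALF_EVEN)
        carry = exact - rounded               # remainder passed to the next item
        result.append(rounded)
    return result

numbers = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907,
           18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261,
           12449.81318]
adjusted = round_with_carry(numbers)
print(sum(adjusted))
```

Note that this approach adjusts whichever items happen to receive the accumulated remainder, not necessarily the ones with the largest individual rounding error. For these numbers it rounds up items 7 and 10 rather than items 7 and 6.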


Answer 3:

With all the caveats about using floating point numbers:

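import numpy as np

# myRoundedList must be a NumPy array, as in the question
delta_pence = int(np.rint((originalTotal - np.sum(myRoundedList))*100))
if delta_pence > 0:
    idx = np.argsort(myOriginalList - myRoundedList)[-delta_pence:]
    myRoundedList[idx] += 0.01
elif delta_pence < 0:
    # delta_pence is negative here, so negate it to take the first |delta_pence| indices
    idx = np.argsort(myOriginalList - myRoundedList)[:-delta_pence]
    myRoundedList[idx] -= 0.01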

>>> myRoundedList.sum()
187976.60999999999
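
One way to sidestep totals like 187976.60999999999 is to do the bookkeeping in whole pence and divide only at the end. The following is a sketch of that variant, not the answer's own code: the function name round_preserving_sum_np and its places parameter are assumptions made for illustration.

```python
import numpy as np

def round_preserving_sum_np(values, places=2):
    """Round to `places` decimals while keeping the rounded total,
    doing the adjustment on integer units (e.g. pence)."""
    values = np.asarray(values, dtype=float)
    scale = 10 ** places
    scaled = values * scale
    units = np.rint(scaled).astype(np.int64)      # each item in whole units
    target = int(np.rint(values.sum() * scale))   # desired total in whole units
    delta = target - int(units.sum())
    if delta != 0:
        residual = scaled - units                 # positive: item was rounded down
        order = np.argsort(residual)
        # Largest residuals get +1, smallest (most negative) get -1
        idx = order[-delta:] if delta > 0 else order[:-delta]
        units[idx] += np.sign(delta)
    return units / scale

original = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907,
            18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261,
            12449.81318]
result = round_preserving_sum_np(original)
print(result)
```

The returned values are still floats, so the usual caveats apply once they are summed again; keeping the integer array instead would avoid that entirely.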


Answer 4:

As noted in an answer by kettlehell, try the PyPI package iteround. It's not optimized to use NumPy, however.

>>> from iteround import saferound
>>> saferound([1.0, 2.1, 3.6], places=0)
[1.0, 2.0, 4.0]


Answer 5:

If you have a long list, the sorting-based methods above are inefficient because they are O(n*log(n)) (sorting n elements). If it is likely that only a few (or just one) of the entries need to change, you can use a heap instead (or a plain min/max if there is only one entry to change).

I'm not much of a Python coder, but here's a solution along those lines. It does not account for floating-point representation inaccuracy, which others have already mentioned.

import math
import heapq

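def roundtosum(l, r):
    q = 10**(-r)
    # Units of q to redistribute; round before int() so 1.9999... is not truncated to 1
    d = int(round((round(sum(l),r) - sum([ round(x, r) for x in l ])) * (10**r)))
    if d == 0:
        # Nothing to redistribute, but the values must still be rounded
        return [ round(x, r) for x in l ]
    elif d in [ -1, 1 ]:
        # A single entry to change: max is enough
        c, _ = max(enumerate(l), key=lambda x: math.copysign(1,d) * math.fmod(x[1] - 0.5*q, q))
        return [ round(x, r) + q * math.copysign(1,d) if i == c else round(x, r) for (i, x) in enumerate(l) ]
    else:
        # Several entries to change: pick the abs(d) best candidates with a heap
        c = [ i for i, _ in heapq.nlargest(abs(d), enumerate(l), key=lambda x: math.copysign(1,d) * math.fmod(x[1] - 0.5*q, q)) ]
        return [ round(x, r) + q * math.copysign(1,d) if i in c else round(x, r) for (i, x) in enumerate(l) ]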

d is the difference between the rounded sum and the sum of the rounded values, expressed in units of the rounding step; it tells us at how many positions the rounding has to change. If d is zero, there is clearly nothing to adjust. If d is 1 or -1, the best position can be found easily with min or max. For an arbitrary number, we can use heapq.nlargest to find the best D=abs(d) positions.

So why is there a max, if nlargest would do? Because min and max are implemented much more efficiently than nlargest.

This way, the algorithm is O(n+D*log(n)).
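
The selection step can be looked at in isolation. Here is a minimal sketch using heapq.nlargest on the rounding residuals from the question; the helper name top_k_indices is made up for illustration, and the residual is computed directly as x - round(x, 2) instead of with math.fmod.

```python
import heapq

def top_k_indices(residuals, k):
    """Indices of the k largest residuals, without sorting the whole list."""
    return heapq.nlargest(k, range(len(residuals)), key=residuals.__getitem__)

original = [27226.94982, 193.0595233, 1764.3094, 12625.8607, 26714.67907,
            18970.35388, 12725.41407, 23589.93271, 27948.40386, 23767.83261,
            12449.81318]
# Positive residual: the value was rounded down
residuals = [x - round(x, 2) for x in original]
print(top_k_indices(residuals, 2))  # [6, 5]
```

Indices 6 and 5 correspond to items 7 and 6 of the question, the two entries that should each receive an extra penny.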

Note: With a heap, you can create an O(n+D^2*log(D)) algorithm as well, because the top D elements must be on the top D levels of the heap, and you can order that list in O(D^2*log(D)) steps. If n is huge and D is very small, this can make a noticeable difference.

(Rights for reconsideration reserved (because it's after midnight).)