Numpy: Check if float array contains whole numbers

Posted 2020-06-14 09:09

Question:

In Python, it is possible to check if a float contains an integer value using n.is_integer(), based on this QA: How to check if a float value is a whole number.

Does numpy have a similar operation that can be applied to arrays? Something that would allow the following:

>>> x = np.array([1.0, 2.1, 3.0, 3.9])
>>> mask = np.is_integer(x)
>>> mask
array([True, False, True, False], dtype=bool)

It is possible to do something like

>>> mask = (x == np.floor(x))

or

>>> mask = (x == np.round(x))

but both involve calling an extra function and allocating temporary arrays that could potentially be avoided.

Does numpy have a vectorized function that checks for fractional parts of floats in a way similar to Python's float.is_integer?

Answer 1:

From what I can tell, there is no such function that returns a boolean array indicating whether floats have a fractional part or not. The closest I can find is np.modf which returns the fractional and integer parts, but that creates two float arrays (at least temporarily), so it might not be best memory-wise.
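
For reference, here is a minimal sketch of the np.modf route, using the sample values from the question. It returns the fractional parts first and the integral parts second, so both float arrays exist at once, which is the memory cost mentioned above.

```python
import numpy as np

x = np.array([1.0, 2.1, 3.0, 3.9])

# np.modf returns two new float arrays: fractional parts, then integral parts.
frac, whole = np.modf(x)

# An element is a whole number exactly when its fractional part is zero.
mask = (frac == 0)
```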

If you're happy working in place, you can try something like:

>>> np.mod(x, 1, out=x)
>>> mask = (x == 0)

This should save memory versus using round or floor (where you have to keep x around), but of course you lose the original x.

The other option is to ask for it to be implemented in Numpy, or implement it yourself.
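
If you go the "implement it yourself" route, something along these lines might do. This is only a sketch: is_integer_array is a made-up name, not a NumPy function, and it still allocates the temporaries the question wanted to avoid. The np.isfinite guard is there because float.is_integer() is False for inf and NaN, while inf == np.floor(inf) would otherwise be True.

```python
import numpy as np

def is_integer_array(x):
    """Hypothetical helper: elementwise analogue of float.is_integer()."""
    x = np.asarray(x, dtype=float)
    # floor(x) == x holds for whole numbers; isfinite rules out +/-inf
    # (NaN already fails the equality test because NaN != NaN).
    return np.isfinite(x) & (x == np.floor(x))

mask = is_integer_array([1.0, 2.1, 3.0, 3.9])
```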



Answer 2:

I needed an answer to this question for a slightly different reason: checking when I can convert an entire array of floating point numbers to integers without losing data.

Hunse's answer almost works for me, except that I obviously can't use the in-place trick, since I need to be able to undo the operation:

if np.all(np.mod(x, 1) == 0):
    x = x.astype(int)

From there, I thought of the following option which probably is faster in many situations:

x_int = x.astype(int)
if np.all((x - x_int) == 0):
    x = x_int

The reason is that the modulo operation is slower than subtraction. However, now we do the casting to integers up-front - I don't know how fast that operation is, relatively speaking. But if most of your arrays are integers (they are in my case), the latter version is almost certainly faster.
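
Since the speed claim depends on your data and NumPy build, it is probably worth measuring. A rough sketch with timeit follows; the array size, the repeat count, and the names check_mod and check_cast are arbitrary choices for illustration, and no timings are quoted here because they vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: 100,000 floats that all hold whole numbers.
x = rng.integers(0, 100, size=100_000).astype(float)

def check_mod(a):
    # Version 1: remainder modulo 1 must be zero everywhere.
    return np.all(np.mod(a, 1) == 0)

def check_cast(a):
    # Version 2: casting to int and subtracting must leave zero everywhere.
    return np.all((a - a.astype(int)) == 0)

# Both checks should agree before their timings are compared.
agree = bool(check_mod(x)) == bool(check_cast(x))

for fn in (check_mod, check_cast):
    seconds = timeit.timeit(lambda: fn(x), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```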

Another benefit is that you could replace the subtraction with something like np.isclose to check within a certain tolerance (of course you should be careful here, since truncation is not proper rounding!).

x_int = x.astype(int)
if np.all(np.isclose(x, x_int, rtol=0.0001)):  # the third positional argument of np.isclose is rtol
    x = x_int
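
To illustrate the truncation caveat: astype(int) truncates toward zero, so a value just below a whole number lands a full unit away and fails the tolerance check. One way around that, sketched here as an assumption rather than as part of the original answer, is to round with np.rint before casting. The sample values and the absolute tolerance are made up for the demonstration.

```python
import numpy as np

x = np.array([1.0, 2.9999999, 4.0000001])

# astype(int) truncates toward zero, so 2.9999999 becomes 2, not 3.
x_trunc = x.astype(int)

# np.rint rounds to the nearest integer first; the cast is then exact.
x_round = np.rint(x).astype(int)

tol = 1e-4  # assumed absolute tolerance, chosen only for illustration
ok_trunc = np.all(np.isclose(x, x_trunc, rtol=0, atol=tol))
ok_round = np.all(np.isclose(x, x_round, rtol=0, atol=tol))
```

Here ok_trunc is False because of the 2.9999999 element, while ok_round is True.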

EDIT: Slower, but perhaps worth it depending on your use-case, is also converting integers individually if present.

x_int = x.astype(int)
safe_conversion = (x - x_int) == 0
# if we can convert the whole array to integers, do that
if np.all(safe_conversion):
    x = x_int.tolist()
else:
    x = x.tolist()
    # if there are _some_ integers, convert them
    if np.any(safe_conversion):
        for i in range(len(x)):
            if safe_conversion[i]:
                x[i] = int(x[i])

As an example of where this matters: this works out for me because I have sparse data (mostly zeros) that I convert to JSON once and reuse later on a server. ujson serializes floats as [ ...,0.0,0.0,0.0,... ] and ints as [...,0,0,0,...], saving up to half the number of characters in the string. This reduces overhead on both the server (shorter strings) and the client (shorter strings, presumably slightly faster JSON parsing).
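
The size difference is easy to see with a small sketch. Note that this uses the standard-library json module with compact separators instead of ujson, so the exact formatting of ujson's output is not verified here. The sample lists are made up.

```python
import json

floats = [0.0, 0.0, 1.5, 0.0]   # every element left as a float
mixed = [0, 0, 1.5, 0]          # whole numbers converted to int

compact = (",", ":")            # no spaces after separators
as_floats = json.dumps(floats, separators=compact)
as_mixed = json.dumps(mixed, separators=compact)

print(as_floats, len(as_floats))
print(as_mixed, len(as_mixed))
```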