Prevent strings being truncated when replacing val

Posted 2020-03-31 09:45

Question:

Let's say I have arrays a and b:

a = np.array([1,2,3])
b = np.array(['red','red','red'])

If I were to apply some fancy indexing like this to these arrays

b[a<3]="blue"

the output I get is

array(['blu', 'blu', 'red'], dtype='<U3')

I understand that this happens because numpy initially allocates space for only 3 characters per element, so the whole word "blue" can't fit into the array. What workaround can I use?

Currently I am doing

b = np.array([" "*100 for i in range(3)])
b[a>2] = "red"
b[a<3] = "blue"

but it's just a workaround. Is this a fault in my code, or is it an issue with numpy? How can I fix this?

Answer 1:

You can handle variable length strings by setting the dtype of b to be "object":

import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'], dtype="object")

b[a<3] = "blue"

print(b)

This outputs:

['blue' 'blue' 'red']

This dtype will handle strings, or other general Python objects. This also necessarily means that under the hood you'll have a numpy array of pointers, so don't expect the performance you get when using a primitive datatype.
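To make the difference concrete, here is a minimal sketch comparing the two dtypes side by side. The variable names `fixed` and `flexible` are only illustrative and are not from the question.

```python
import numpy as np

a = np.array([1, 2, 3])

# Fixed-width unicode array: every element has room for 3 characters.
fixed = np.array(['red', 'red', 'red'])
# Object array: every element is a reference to an ordinary Python str.
flexible = np.array(['red', 'red', 'red'], dtype=object)

fixed[a < 3] = "blue"      # silently truncated to fit the 3-character slots
flexible[a < 3] = "blue"   # stored in full, whatever the length

print(fixed)     # the first two entries lose their last character
print(flexible)  # the first two entries are the full word
```

The object array trades speed and compact storage for flexibility, so it tends to suit small arrays or cases where string lengths are not known in advance.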



Answer 2:

A marginal improvement on your current approach (which is potentially very wasteful in space):

import numpy as np

a = np.array([1,2,3])
b = np.array(['red','red','red'])

replacement = "blue"
# b.dtype.itemsize is in bytes; numpy stores 4 bytes per unicode character
b = b.astype('<U{}'.format(max(len(replacement), b.dtype.itemsize // 4)))
b[a<3] = replacement
print(b)

This accounts for strings already in the array, so the allocated space only increases if the replacement is longer than all existing strings in the array.
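The same idea can be wrapped in a small helper. This is only a sketch: `widen_for` is a hypothetical name, and it assumes the array already has a fixed-width unicode dtype, where the width in characters is the item size in bytes divided by the size of a single-character element.

```python
import numpy as np

def widen_for(arr, replacement):
    """Return arr cast to a unicode dtype wide enough to hold replacement.

    Hypothetical helper; assumes arr has a fixed-width unicode ('U') dtype.
    """
    # itemsize is in bytes, so divide by the size of a 1-character element.
    current_width = arr.dtype.itemsize // np.dtype('U1').itemsize
    needed = max(current_width, len(replacement))
    return arr.astype('U{}'.format(needed))

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

b = widen_for(b, "blue")   # the array now has room for 4 characters
b[a < 3] = "blue"
print(b)
```

If the array is already wide enough, the helper keeps the existing width, so repeated calls never shrink it.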



Answer 3:

If you construct such an array, its dtype looks like this:

>>> b
array(['red', 'red', 'red'], dtype='<U3')

This means that each string can hold at most 3 characters. If you assign longer strings, they are truncated.

You can change the data type to make the maximum length longer, for example:

b2 = b.astype('<U10')

So now we have an array that can store strings of up to 10 characters. Note, however, that every element reserves space for the maximum length, so making it larger increases the memory used by the whole array.
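A short sketch of that trade-off, using the `nbytes` attribute to compare the memory footprint before and after the cast (numpy stores 4 bytes per character in a unicode array):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])   # dtype '<U3'
b2 = b.astype('<U10')                 # room for 10 characters per element

print(b.nbytes)    # 3 elements * 3 characters * 4 bytes
print(b2.nbytes)   # 3 elements * 10 characters * 4 bytes

b2[a < 3] = "blue"  # fits now, so nothing is truncated
print(b2)
```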