Maintaining numpy subclass inside a container afte

2019-09-02 09:58发布

问题:

I have created a class derived from numpy's ndarray following numpy's documentation, and it looks like (reduce the number of attributes to make it more readable):

import numpy as np

class Atom3D( np.ndarray ):
    __array_priority__ = 11.0

    def __new__( cls, idnum, coordinates):

        # Cast numpy to be our class type
        assert len(coordinates) == 3
        obj = np.asarray(coordinates, dtype= np.float64).view(cls)
        # add the new attribute to the created instance
        obj._number = int(idnum)
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__( self, obj ):
        self._number = getattr(obj, '_number', None)

    def __array_wrap__( self, out_arr, context=None ):
        return np.ndarray.__array_wrap__(self, out_arr, context)

    def __repr__( self ):
        return "{0._number}: ({0[0]:8.3f}, {0[1]:8.3f}, {0[2]:9.3f})".format(self)

When I execute a test in which I apply a numpy's ufunc to the object:

a1 = Atom3D(1, [5., 5., 5.])
print type(a1), repr(a1)
m  = np.identity(3)
a2 = np.dot(a1, m)
print type(a2), repr(a2)

I obtain the expected result; that is, the dot function keeps the subclassing of the object:

<class '__main__.Atom3D'> 1: (   5.000,    5.000,     5.000)  
<class '__main__.Atom3D'> 1: (   5.000,    5.000,     5.000)

But, when I try to apply the same np.dot to an array of these objects, the subclass is lost. Thus, executing:

print "regular"
atom_list1 = [a1, a2, a3]
atom_list2 = np.dot(atom_list1, m)
for _ in atom_list2:
    print type(_), repr(_)

print "numpy array"
atom_list1 = np.array([a1, a2, a3], dtype=np.object)
atom_list2 = np.dot(atom_list1, m)
for _ in atom_list2:
    print type(_), repr(_)

Gives me this:

regular
<type 'numpy.ndarray'> array([ 5.,  5.,  5.])
<type 'numpy.ndarray'> array([ 6.,  4.,  2.])
<type 'numpy.ndarray'> array([ 8.,  6.,  8.])
numpy array
<type 'numpy.ndarray'> array([5.0, 5.0, 5.0], dtype=object)
<type 'numpy.ndarray'> array([6.0, 4.0, 2.0], dtype=object)
<type 'numpy.ndarray'> array([8.0, 6.0, 8.0], dtype=object)

The same will go for other operation such as __sub__:

print "regular"
a1 = Atom3D(1, [5., 5., 5.])
a2 = a1 - np.array([3., 2., 0.])
print type(a2), repr(a2)
print "numpy array"
a1 = Atom3D(1, [5., 5., 5.])
a2 = Atom3D(2, [6., 4., 2.])
a3 = Atom3D(3, [8., 6., 8.])
atom_list1 = np.array([a1, a2, a3], dtype=np.object)
atom_list2 = atom_list1 - np.array([3., 2., 0.])
for _ in atom_list2:
    print type(_), repr(_)

Will yield:

regular
<class '__main__.Atom3D'> 1: (   2.000,    3.000,     5.000)
numpy array
<type 'numpy.ndarray'> array([2.0, 3.0, 5.0], dtype=object)
<type 'numpy.ndarray'> array([3.0, 2.0, 2.0], dtype=object)
<type 'numpy.ndarray'> array([5.0, 4.0, 8.0], dtype=object)

I've been looking but have not found how I'm I going wrong.
Thanks!!

J.-

回答1:

There's no such thing as dtype=Atom3D. The same goes for dtype=list and dtype=np.ndarray. It creates a dtype=object array, where each element is a pointer to objects elsewhere in memory.

Creating an object array with np.array(...) can be tricky. np.array evaluates the entries and makes some of its own choices. The best bet, if you want absolute control over the elements that go into an object array is to create an 'blank' one, and assign elements yourself.

In [508]: A=np.array([np.matrix([1,2]),np.matrix([2,1])],dtype=object)

In [509]: A       # a 3d array, no matrix subarrays
Out[509]: 
array([[[1, 2]],

       [[2, 1]]], dtype=object)

In [510]: A=np.empty((2,),dtype=object)

In [511]: A
Out[511]: array([None, None], dtype=object)

In [512]: A[:]=[np.matrix([1,2]),np.matrix([2,1])]

In [513]: A
Out[513]: array([matrix([[1, 2]]), matrix([[2, 1]])], dtype=object)

Unless you really need the object array for things like reshaping and transposing, you are usually better off working with a list.

Mixing the object types also works:

In [522]: A=np.asarray([np.matrix([1,2]),np.ma.masked_array([2,1])],dtype=np.object)

In [523]: A
Out[523]: 
array([matrix([[1, 2]]),
       masked_array(data = [2 1],
             mask = False,
       fill_value = 999999)
], dtype=object)

==========================

When you do np.dot([a1,a2,a3],m), it first turns any lists into an array with np.asarray([a1,a2,a3]). The result is a 2d array, not an array of Atom3d objects. So the dot is the usual array dot.

If I create an object array as I suggested:

In [14]: A=np.empty((3,),dtype=object)
In [16]: A[:]=[a1,a2,a1+a2]

In [17]: A
Out[17]: 
array([1: (   5.000,    5.000,     5.000),
       1: (   5.000,    5.000,     5.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

In [18]: np.dot(A,m)
Out[18]: 
array([1: (   5.000,    5.000,     5.000),
       1: (   5.000,    5.000,     5.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

Atom3D type is preserved;

Same for subtraction:

In [23]: A- np.array([3.,2., 0])
Out[23]: 
array([1: (   2.000,    2.000,     2.000),
       1: (   3.000,    3.000,     3.000),
       1: (  10.000,   10.000,    10.000)], dtype=object)

Addition of that array and an Atom3D works, though there is something wrong with displaying the result:

In [39]: x = A + a2

In [40]: x
Out[40]: <repr(<__main__.Atom3D at 0xb5062294>) failed: TypeError: non-empty format string passed to object.__format__>

Calculations with object dtype arrays are iffy. Some work, apparently by iterating over the elements of the array, applying the function, and turning the result back into an object array. In effect an array version of

 [func(a, x) for a in A]

Even if it works, it isn't performing a fast compiled operation; it is iterative (timings will be similar to the list equivalent).

Other things don't work

In [41]: a1>0
Out[41]: 1: (   1.000,    1.000,     1.000)

In [42]: A>0
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

We've pointed out many times that object dtype arrays are little more than glorified lists. The elements are pointers as with lists, and thus operations will involve iterating over those pointers - in Python, not C. This is not a highly developed corner of numpy code.