Vectorized operations on cell arrays

Posted 2019-06-16 20:19

Question:

This post was triggered by a discussion on whether cell arrays are "normal arrays" and on the claim that vectorization does not work for cell arrays.

I wonder why the following vectorized syntax is not implemented in MATLAB, and what speaks against it:

>> {'hallo','matlab','world'} == 'matlab'
??? Undefined function or method 'eq' for input arguments of type 'cell'.

Internally it would be equivalent to

[{'hallo'},{'matlab'},{'world'}] == {'matlab'}

Because MATLAB knows when to cast, the following works:

[{'hallo','matlab'},'world']

A cell array is an array of pointers. If the left and right sides point to equal objects, isequal('hallo','hallo') returns true as expected. Why, then, does MATLAB still not allow the topmost example?

I know I can use strmatch or cellfun.
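For reference, the workarounds mentioned above can be sketched as follows. The variable names are only illustrative. strcmp is used here instead of strmatch, since it compares a cell array of strings against a single string element by element.

```matlab
words = {'hallo', 'matlab', 'world'};

% strcmp accepts a cell array of strings and a single string
viaStrcmp = strcmp(words, 'matlab');

% cellfun applies isequal to each cell and collects the logical results
viaCellfun = cellfun(@(s) isequal(s, 'matlab'), words);

% ismember also works on cell arrays of strings
viaIsmember = ismember(words, 'matlab');
```

All three calls return a 1-by-3 logical array that is true only in the second position.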

SUMMARY:

  • The operator == needed for vectorization in the example above is eq, not isequal (likewise, < is lt, and so on).
  • eq is built in for numeric and char arrays; for other types (such as cell arrays) MATLAB gives us the freedom to overload this and other operators.
  • Operator vectorization is therefore entirely possible for cell arrays of a known element type (such as strings), but not by default for arbitrary contents.
  • Function vectorization, as in myFun( myString ) or myFun( myCellOfStrings ), is also possible; you just have to implement it inside myFun. sin(val) and sin(array) do not work by witchcraft either: both cases are implemented internally.
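A minimal sketch of the last point, assuming a hypothetical myFun that simply reverses a string. The function body is invented for illustration; only the dispatch on the input type matters.

```matlab
function out = myFun(in)
% myFun  Hypothetical example: reverse a string, or each string in a cell.
if iscell(in)
    % "Vectorized" case: apply myFun to every cell, keep the cell shape
    out = cellfun(@myFun, in, 'UniformOutput', false);
elseif ischar(in)
    % Scalar case: a single string
    out = in(end:-1:1);
else
    error('myFun:badInput', ...
          'Expected a char array or a cell array of char arrays.');
end
end
```

With this in place, myFun('abc') returns 'cba' and myFun({'ab','cde'}) returns {'ba','edc'}.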

Answer 1:

Firstly, == is not the same as isequal. The function that gets called when you use == is eq, and the scope of each of those is different.

For example, in eq(A,B), if B is a scalar, the function compares each element of A with B and returns a logical array of the same size as A.

eq([2,5,4,2],2)

ans =

     1     0     0     1

However, isequal(A,B) checks whether A and B as a whole have the same size and the same contents, and returns a single logical value. Doing this for the above example:

isequal([2,5,4,2],2)

ans =

     0
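The same distinction shows up with strings, which are char arrays. A short sketch (variable names are illustrative) of how eq and isequal behave on them:

```matlab
% isequal compares whole arrays, so a size mismatch just gives false
wholeCompare = isequal('hallo', 'matlab');

% eq works element by element when both strings have the same length
sameLength = ('world' == 'hallo');

% eq on strings of different length (1x5 vs 1x6) raises an error
failed = false;
try
    tf = ('hallo' == 'matlab');
catch err
    failed = true;
    disp(err.message);   % the exact message depends on the MATLAB version
end
```

Here wholeCompare is false, sameLength is [0 0 0 1 0], and the last comparison fails because the sizes are incompatible.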

I think what you really intended to ask in the question, but didn't, is:

"Why is == not defined for cell arrays?"

Well, a simple reason is: Cells were not intended for such use. You can easily see how implementing such a function for cells can quickly get complicated when you start considering individual cases. For example, consider

{2,5,{4,2}}==2

What would you expect the answer to be? A reasonable guess would be

ans = {1,0,0}

which is fair. But let's say I disagree. Now I'd like the equality operation to walk down nested cells and return

ans = {1,0,{0,1}}

Can you disagree with this interpretation? Perhaps not. It's equally valid, and in some cases that's the behavior you want.

This was just a simple example. Now add a mixture of nested cells, different types, etc. within the cell, and think about handling each of those corner cases. It quickly becomes a nightmare for the developers to implement such functionality in a way that satisfies everyone.
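To make the second interpretation concrete, here is a rough sketch of a recursive comparison. The name deepEq is hypothetical and not part of MATLAB; it is just one possible way to get the nested result.

```matlab
function c = deepEq(a, b)
% deepEq  Hypothetical helper: compare every element of cell a with b,
%         descending into nested cells.
c = cell(size(a));
for n = 1:numel(a)
    if iscell(a{n})
        c{n} = deepEq(a{n}, b);    % recurse into the nested cell
    else
        c{n} = isequal(a{n}, b);   % leaf: plain comparison
    end
end
end
```

Called as deepEq({2,5,{4,2}}, 2), this returns {true, false, {false, true}}, whereas the flat interpretation, cellfun(@(x) isequal(x, 2), {2,5,{4,2}}), returns [1 0 0].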

So the solution is to overload the function, implementing only the specific functionality that you desire, for use in your application. MATLAB provides a way to do that too, by creating an @cell directory and defining an eq.m for use with cells the way you want it. Ramashalanka has demonstrated this in his answer.



Answer 2:

There are many things that would seem natural for MATLAB to do that they have chosen not to. Perhaps they don't want to consider many special cases (see below). You can do it yourself by overloading. If you make a directory @cell and put the following in a new function eq.m:

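function c = eq(a,b)
% Overloaded == for cells; save as eq.m inside an @cell directory.
if iscell(b) && ~iscell(a)
    % Only the right operand is a cell: swap so the cell comes first.
    c = eq(b,a);
else
    c = cell(size(a));
    for n = 1:numel(c)
        if iscell(a) && iscell(b)
            % Cell vs. cell: compare matching elements (sizes are not checked).
            c{n} = isequal(a{n},b{n});
        else
            % Cell vs. non-cell: compare every element with b.
            c{n} = isequal(a{n},b);
        end
    end
end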

Then you can do, e.g.:

>> {'hallo','matlab','world'} == 'matlab'
ans =     [0]    [1]    [0]

>> {'hallo','matlab','world'} == {'a','matlab','b'}
ans =     [0]    [1]    [0]

>> {'hallo','matlab','world'} == {'a','dd','matlab'}
ans =     [0]    [0]    [0]

>>  { 1, 2, 3 } == 2
ans =     [0]    [1]    [0]

But even though I considered a couple of cases in my simple function, there are lots of things I didn't consider (checking that the cells are the same size, checking a multi-element cell against a singleton, etc.).

I used isequal even though it's called with eq (i.e. ==) since it handles {'hallo','matlab','world'} == 'matlab' better, but really I should consider more cases.

(EDIT: I made the function slightly shorter, but less efficient)
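As a rough sketch of how the missing checks could look, here is a stricter variant written as an ordinary function instead of an overload. The name cellEq, the error identifiers, and the choice to return a logical array rather than a cell are all assumptions made for illustration.

```matlab
function tf = cellEq(a, b)
% cellEq  Hypothetical stricter comparison that returns a logical array.
if ~iscell(a)
    [a, b] = deal(b, a);           % make sure a is the cell operand
end
if ~iscell(a)
    error('cellEq:noCell', 'At least one argument must be a cell array.');
end
if iscell(b)
    if isscalar(b)
        b = repmat(b, size(a));    % expand a singleton cell
    elseif isscalar(a)
        a = repmat(a, size(b));
    elseif ~isequal(size(a), size(b))
        error('cellEq:sizeMismatch', 'Cell arrays must have the same size.');
    end
    tf = cellfun(@isequal, a, b);  % element against matching element
else
    tf = cellfun(@(x) isequal(x, b), a);   % every element against b
end
end
```

For example, cellEq({'hallo','matlab','world'}, 'matlab') and cellEq(2, {1,2,3}) both return [0 1 0], while cellEq({1,2,3}, {4,5}) raises an error instead of failing inside the loop.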



Answer 3:

The reason for this problem is: cell arrays can store different types of variables in different cells. Thus the operator == can't be defined well for the entire array. It is even possible for a cell to contain another cell, further exacerbating the problem.

Think of {4,'4',4.0,{4,4,4}} == '4'. What should the result be? Each type is compared in a different way.
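A small sketch of what this mixed cell looks like from MATLAB's side, using one possible comparison rule (isequal per element). The variable names are illustrative.

```matlab
c = {4, '4', 4.0, {4, 4, 4}};

% Each cell can hold a different class
classes = cellfun(@class, c, 'UniformOutput', false);

% One possible rule: an element matches only if isequal says so
matches = cellfun(@(x) isequal(x, '4'), c);
```

Here classes is {'double','char','double','cell'} and matches is [0 1 0 0]. Under this rule only the char element matches, which is just one of several defensible answers.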



Answer 4:

This is not unique to strings. Even the following does not work:

{ 1, 2, 3 } == 2

Cell arrays are not the same as "normal" arrays: they offer different syntax, different semantics, different capabilities, and are implemented differently (an extra layer of indirection).

Consider if == on cell arrays were defined in terms of isequal on an element-by-element basis. So, the above example would be no problem. But what about this?

{ [1 0 1], [1 1 0] } == 1
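As a rough illustration, the two obvious readings of that expression can be emulated with cellfun (variable names are illustrative):

```matlab
c = {[1 0 1], [1 1 0]};

% Reading 1: isequal per cell, one logical value per element
wholeElement = cellfun(@(x) isequal(x, 1), c);

% Reading 2: eq inside each cell, one logical array per element
perEntry = cellfun(@(x) x == 1, c, 'UniformOutput', false);
```

The first reading gives [0 0], because neither vector equals the scalar 1 as a whole. The second gives {[1 0 1], [1 1 0]} as logical arrays.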

The resulting behaviour wouldn't be terribly useful in most circumstances. And what about this?

1 == { 1, 2, 3 }

And how would you define this? (I can think of at least three different interpretations.)

{ 1, 2, 3 } == { 4, 5, 6 }

And what about this?

{ 1, 2, 3 } == { { 4, 5, 6 } }

Or this?

{ 1, 2, 3 } == { 4; 5; 6 }

Or this?

{ 1, 2, 3 } == { 4, 5 }

You could add all sorts of special-case handling, but that makes the language less consistent, more complex, and less predictable.