What are the semantics of 'end' in Matlab?

Posted 2019-01-09 05:32

Question:

It's common to use the end keyword as a shortcut for accessing or extending an array in Matlab, as in

>> x = [1,2,3];
>> x(1:end-1)
ans =
    1   2
>> x(end+1) = 4
x =
    1   2   3   4

However, I was surprised to find that the following also works

>> x(1:min(5, end))
ans =
    1   2   3   4

I thought that end might be a special form, like :, that can be special-cased in indexing operations, so I created a class to detect this

classdef IndexDisplayer
  methods
    function subsref(self, s)
      disp(s);
    end
  end
end

You can see how : is special cased in the following example

>> a = IndexDisplayer;
>> a(1:3)
    type: '()'
    subs: {[1 2 3]}
>> a(:)
    type: '()'
    subs: {':'}

However, when I index with end I just see

>> a(end)
    type: '()'
    subs: {[1]}

Here the end is replaced with a 1. Where does that 1 come from? My first guess was that any end inside an indexing expression x(end) would be replaced with a call to length(x), so I tried overriding length as well

classdef IndexDisplayer
  methods
    function subsref(self, s)
      disp(s);
    end
    function len = length(self)
      len = 10;
    end
  end
end

However, that gives

>> a = IndexDisplayer;
>> length(a)
ans =
    10
>> a(end)
    type: '()'
    subs: {[1]}

so that theory is out the window. Can anyone explain the semantics of end?

Answer 1:

Firstly, I think it's kind of a bug, or at least an unexpected feature, that your syntax x(1:min(5, end)) works at all. When I was at MathWorks, I remember someone pointing this out, and quite a few of the developers had to spend a while figuring out what was going on. I'm not sure if they ever really agreed whether it was a problem or not.

To explain the (intended) semantics of end: end is implemented as a function ind = end(obj, k, n), where k is the position of the subscript that contains end, and n is the total number of subscripts in the indexing expression.

So, for example, when you call a(1,end,1), k is 2, as the end is in argument 2, and n is 3 as there are 3 arguments.

ind is returned as the index that can replace end in the expression.
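
As a rough illustration of how k and n affect the result for an ordinary numeric array, here is a small sketch. The array x and the variable names are made up for the example; the comments describe the built-in behaviour as I understand it, where the last subscript absorbs any trailing dimensions.

```matlab
% A 2-by-3-by-4 array, used only for illustration.
x = reshape(1:24, 2, 3, 4);

a = x(end);         % k = 1, n = 1: end behaves like numel(x), i.e. 24
b = x(1, end);      % k = 2, n = 2: the last subscript covers dims 2 and 3, so end is 3*4 = 12
c = x(1, end, 1);   % k = 2, n = 3: end behaves like size(x, 2), i.e. 3

% Each use of end picks the same element as the explicit index.
assert(isequal(a, x(numel(x))));
assert(isequal(b, x(1, 12)));
assert(isequal(c, x(1, size(x, 2), 1)));
```

So the same keyword can stand for three different numbers on the same array, depending only on where it appears in the subscript list.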

You can overload end for your own classes (in the same way as you can overload colon, size, subsref etc).

To extend your example:

classdef IndexDisplayer
  methods
    function ind = end(self,k,n)
        disp(k)
        disp(n)
        ind = builtin('end', self, k, n);
    end
  end
end

>> a = IndexDisplayer;
>> a(1,end,1)
 2
 3
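
To tie this back to the class in the question, here is a minimal sketch that overloads both subsref and end. The return value 10 is an arbitrary number chosen for illustration, not anything MATLAB prescribes.

```matlab
classdef IndexDisplayer
  methods
    function subsref(self, s)
      % Show the subscript structure that MATLAB passes in.
      disp(s);
    end
    function ind = end(self, k, n)
      % Arbitrary illustrative value; k and n are ignored in this sketch.
      ind = 10;
    end
  end
end
```

With this version, a(end) should reach subsref with 10 in subs rather than 1. That also suggests where the original 1 came from: without an overload, the built-in end works from the size of the object itself, and a is a 1-by-1 scalar object.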

See the MathWorks documentation on overloading end for classes for more information.



Answer 2:

I find this a curiosity too. Nevertheless, I often use (exploit?) this behavior to shorten statements. For example, in this answer, to get all but the kth element(s) of a vector, a clean solution that occurred to me was,

vector(setdiff(1:end,k))

This end replaces a call to numel(vector). For a scalar k, this is an alternative to vector(1:end ~= k) or vector([1:k-1 k+1:end]). It seemed perfectly reasonable at the time, although I drew attention to the oddity of this usage. Is this really bad practice? Perhaps, but I've accepted it for what it's worth and moved on.
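
A quick sketch comparing the three forms for a scalar k; the sample values of vector and k are arbitrary and chosen only for illustration.

```matlab
vector = [10 20 30 40 50];
k = 3;

r1 = vector(setdiff(1:end, k));     % end inside a function call argument
r2 = vector(1:end ~= k);            % logical mask; parsed as (1:end) ~= k
r3 = vector([1:k-1 k+1:end]);       % explicit concatenation of the two ranges

% All three drop the kth element and keep the rest in order.
assert(isequal(r1, [10 20 40 50]));
assert(isequal(r1, r2) && isequal(r2, r3));
```

Only the setdiff form carries over directly to a vector k, since setdiff accepts several values to remove at once.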

I don't offer any insight into how this works or what the rules are, as Sam Roberts does in his answer, but conceptually, I see this as a matter of context. That is, when end occurs, I would assume it evaluates to an index (or dimension subscript) for the array with the most immediate scope, looking "up" through nested statements to make the determination. Not sure if that is the right wording, but it seems to be a useful way to interpret the operation of end.

I haven't been bitten by this interpretation yet.
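
The "most immediate scope" reading above can be checked with a small sketch; x and y here are made-up variables for illustration.

```matlab
x = [10 20 30 40];
y = [1 2 3];

% end sits inside y(...), so it refers to y: y(end) is y(3) = 3, giving x(3).
p = x(y(end));

% min is a function call, not an array being indexed, so end refers to x:
% min(5, end) is min(5, 4) = 4, giving x(4).
q = x(min(5, end));

assert(p == 30);
assert(q == 40);
```

In both cases end resolves against the innermost variable that is actually being indexed, which matches the interpretation described in this answer.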