Mathematica's pattern matching poorly optimize

2019-03-24 17:32发布

问题:

I recently inquired about why PatternTest was causing a multitude of needless evaluations: PatternTest not optimized? Leonid replied that it is necessary for what seems to me as a rather questionable method. I can accept that, though I would prefer a more efficient alternative.

I now realize, which I believe Leonid has been saying for some time, that this problem runs much deeper in Mathematica, and I am troubled. I cannot understand why this is not or cannot be better optimized.

Consider this example:

list = RandomReal[9, 20000];
Head /@ list; // Timing
MatchQ[list, {x__Integer, y__}] // Timing
{0., Null}
{1.014, False}

Checking the heads of the list is essentially instantaneous, yet checking the pattern takes over a second. Surely Mathematica could recognize that since the first element of the list is not an Integer, the pattern cannot match, and unlike the case with PatternTest I cannot see how there is any mutability in the pattern. What is the explanation for this?


There appears to be some confusion regarding packed arrays, which as far as I can tell have no bearing on this question. Rather, I am concerned with the O(n2) time complexity on all lists, packed or unpacked.

回答1:

MatchQ unpacks for these kinds of tests. The reason is that no special case for this has been implemented. In principle it could contain anything.

On["Packing"]
MatchQ[list, {x_Integer, y__}] // Timing

MatchQ[list, {x__Integer, y__}] // Timing

Improving this is very tricky - if you break the pattern matcher you have a serious problem.

Edit 1: It is true that the unpacking is not the cause for the O(n^2) complexity. It does, however, show that for the MatchQ[list, {x__Integer, y__}] part the code goes to another part of the algorithm (which needs the lists to be unpacked). Some other things to note: This complexity arises only if both patterns are __ if either one of them is _ the algorithm has a better complexity.

The algorithm then goes through all n*n potential matches and there seems no early bailout. Presumably because other patters could be constructed that would need this complexity - The issue is that the above pattern forces the matcher to a very general algorithm.

I then was hoping for MatchQ[list, {Shortest[x__Integer], __}] and friends but to no avail.

So, my two cents: either use a different pattern (and have On["Packing"] to see if it goes to the general matcher) or do a pre-check DeveloperPackedArrayQ[expr] && Head[expr[[1]]]===Integer or some such.



回答2:

@the author of the first answer. As far as I know from reverse-engeneering and reading of available information, it may be due to different ways the patterns are checked. In fact - as they say - a special hash code is used for pattern matching. This hash (basically a FNV-1 round) makes it very easy to check for particular patterns related to the type of expression involved (matter of a few xor operations). The hashing algorithm cycles inside the expression and each subpart is xorred with the output of the previous one. Special xor values are used for each atom expression - machineInts, machineReals, bigNums, Rationals and so on. Hence, for example, _Integer is easy to check because the hash of any integer is formed with integer's xor value, so all we need to do is doing the inverse op and see if matches - i.e. if we get some particular value or something like that (sorry if I'm vague on actual implementation details. It's WIP). For general or uncommon patterns the check may not take advantage of this hash stuff and require something different.

@the OP Head[] simply acts on the internal expression, taking the value of the first pointer of the expression (expressions are implemented as arrays of pointers). So doing it is as easy as copying and printing a string - very very fast. The pattern matching engine is not even called in this case.