Einsum optimize fails for basic operation

Posted 2019-07-20 02:00

Question:

With the recent update to NumPy (1.14), I found that it breaks my entire codebase. The problem appears to stem from the default value of einsum's optimize argument changing from False to True.

As a result, the following basic operation now fails:

import numpy as np

a = np.random.random([50, 2, 2])
b = np.random.random([50, 2])

np.einsum('bdc, ac -> ab', a, b, optimize=True)

with the following error trace:

ValueError                                Traceback (most recent call last)
<ipython-input-71-b0f9ce3c71a3> in <module>()
----> 1 np.einsum('bdc, ac -> ab', a, b, optimize=True)

C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\einsumfunc.py in 
einsum(*operands, **kwargs)
   1118 
   1119             # Contract!
-> 1120             new_view = tensordot(*tmp_operands, axes=(tuple(left_pos), tuple(right_pos)))
   1121 
   1122             # Build a new view if needed

C:\ProgramData\Anaconda3\lib\site-packages\numpy\core\numeric.py in 
tensordot(a, b, axes)
   1301     oldb = [bs[axis] for axis in notin]
   1302 
-> 1303     at = a.transpose(newaxes_a).reshape(newshape_a)
   1304     bt = b.transpose(newaxes_b).reshape(newshape_b)
   1305     res = dot(at, bt)

ValueError: axes don't match array

The operation I'm requesting from einsum seems very simple... so why does it fail? If I set "optimize=False", it works just fine.

I tried exploring with einsum_path but the resulting path info was the same with and without optimization.
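
For reference, a minimal sketch of how the path can be inspected (the exact report depends on the NumPy version):

import numpy as np

a = np.random.random([50, 2, 2])
b = np.random.random([50, 2])

# einsum_path returns the planned contraction order plus a readable report
path, report = np.einsum_path('bdc, ac -> ab', a, b, optimize='greedy')
print(path)
print(report)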

Answer 1:

In [40]: a=np.ones((50,2,2),int); b=np.ones((50,2),int)
In [41]: np.einsum('bdc,ac->ab', a, b)
... 
ValueError: axes don't match array

I don't see what optimization has to do with this error.

For the first argument, b,d,c correspond to 50,2,2. For the second, a,c correspond to 50,2. The 'ab' result should be 50,50. But what happened to d?

In [43]: np.einsum('bdc,ac->abd', a, b).shape
Out[43]: (50, 50, 2)

Oops:

In [49]: np.einsum('bdc,ac->ab', a, b, optimize=False).shape
Out[49]: (50, 50)

So it's summing on d.
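
A quick check (a minimal sketch using the same kind of random inputs as above) confirms that the 'ab' result is just the 'abd' result summed over d:

import numpy as np

a = np.random.random([50, 2, 2])
b = np.random.random([50, 2])

full = np.einsum('bdc,ac->abd', a, b)                    # keep d explicit
direct = np.einsum('bdc,ac->ab', a, b, optimize=False)   # d summed out by c_einsum

print(np.allclose(full.sum(axis=2), direct))             # True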


Note the error trace: with optimization, einsum uses tensordot (a transpose plus dot), rather than the original einsum nditer approach.

c_einsum is the one that can handle that missing d:

# If no optimization, run pure einsum
if optimize_arg is False:
    return c_einsum(*operands, **kwargs)
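
As a rough sketch (not NumPy's actual code path), here is what the tensordot route would still have to do to reach the 'ab' output: contract over the shared index c, and then sum out d, which appears in only one operand and not in the output:

import numpy as np

a = np.random.random([50, 2, 2])   # indices b, d, c
b = np.random.random([50, 2])      # indices a, c

# Contract over the shared index c; the result carries indices b, d, a
td = np.tensordot(a, b, axes=([2], [1]))                 # shape (50, 2, 50)

# d is absent from both the output and the second operand, so it still has
# to be summed out, and the remaining axes reordered to a, b
manual = td.sum(axis=1).T                                # shape (50, 50)

print(np.allclose(manual, np.einsum('bdc,ac->ab', a, b, optimize=False)))  # True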

Tried some timings:

Two-step calculation with the default optimize:

In [64]: timeit np.einsum('abd->ab', np.einsum('bdc,ac->abd', a, b))
288 µs ± 518 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

The desired c_einsum is faster:

In [65]: timeit np.einsum('bdc,ac->ab', a, b, optimize=False)
170 µs ± 83.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In fact, c_einsum is faster even when the tensordot version works:

In [67]: timeit np.einsum('bdc,ac->abd', a, b,optimize=False)
73.1 µs ± 46.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [68]: timeit np.einsum('bdc,ac->abd', a, b,optimize=True)
207 µs ± 6.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This example might be too small to show off the advantages of tensordot/BLAS.

It looks like this has already been raised on GitHub, covering both the failure and the slower 'optimize' speed: https://github.com/numpy/numpy/issues/10343 "einsum broadcast regression (with optimize=True)"