I am using np.einsum to multiply probability tables like:
np.einsum('ijk,jklm->ijklm', A, B)
The issue is that I am dealing with more than 26 random variables (axes) overall, so if I assign each random variable a letter I run out of letters. Is there another way I can specify the above operation to avoid this issue, without resorting to a mess of np.sum and np.dot operations?
If you are talking about the letters ijk in your example and needing more than the available alphabetic characters, then no, you can't. The numpy einsum code checks each character one by one with isalpha, and there seems to be no way to create names with more than 1 character. Maybe you can use capital letters, but the main answer to the question is that you cannot have names for the axes with more than 1 character.
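As a quick check of the capital-letter suggestion, here is a minimal sketch. The array shapes are made up for illustration; the only claim is that upper- and lower-case letters are accepted as separate single-character labels.

```python
import numpy as np

# Hypothetical shapes, chosen only so the shared axes (j, k) line up.
A = np.random.rand(2, 3, 4)
B = np.random.rand(3, 4, 5, 6)

# All lower-case labels, as in the question.
lower = np.einsum('ijk,jklm->ijklm', A, B)

# Same operation with some upper-case labels. 'I' and 'i' would be
# two different axes, so this roughly doubles the usable alphabet.
mixed = np.einsum('Ijk,jkLM->IjkLM', A, B)

print(mixed.shape)
print(np.allclose(lower, mixed))
```

Each label is still exactly one character, so this raises the limit to 52 names rather than removing it.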
You could use the einsum(op0, sublist0, op1, sublist1, ..., [sublistout]) form instead of i,j,ik->ijk, as that API is not restricted to 52 axes*. How this verbose form corresponds to the ijk form is shown in the documentation.
OP's np.einsum('ijk,jklm->ijklm', A, B) would be written as np.einsum(A, [0,1,2], B, [1,2,3,4], [0,1,2,3,4])
(* Note: The implementation is still restricted to 26 axes. See @hpaulj's answer and his bug report for explanation)
Equivalences from numpy's examples:
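A short sketch of how the two forms line up, in the spirit of the examples in the einsum documentation. The arrays a, b, A and B are illustrative stand-ins, not data from the question.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
b = np.arange(5)

# Each pair is (subscript-string form, sublist form) of the same operation.
pairs = [
    (np.einsum('ii', a),      np.einsum(a, [0, 0])),          # trace
    (np.einsum('ii->i', a),   np.einsum(a, [0, 0], [0])),     # diagonal
    (np.einsum('ji', a),      np.einsum(a, [1, 0])),          # transpose
    (np.einsum('ij,j', a, b), np.einsum(a, [0, 1], b, [1])),  # matrix-vector
]
all_equal = all(np.array_equal(x, y) for x, y in pairs)
print(all_equal)

# The operation from the question, with hypothetical shapes.
A = np.random.rand(2, 3, 4)
B = np.random.rand(3, 4, 5, 6)
op_string = np.einsum('ijk,jklm->ijklm', A, B)
op_sublist = np.einsum(A, [0, 1, 2], B, [1, 2, 3, 4], [0, 1, 2, 3, 4])
print(np.allclose(op_string, op_sublist))
```

In the sublist form each integer plays the role of one letter, so a repeated integer means a shared axis, just as a repeated letter does.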
The short answer is, you can use any of the 52 letters (upper and lower). That's all the letters in the English language. Any fancier axes names will have to be mapped on those 52, or an equivalent set of numbers. Practically speaking you will want to use a fraction of those 52 in any one einsum call.
@kennytm suggests using the alternative input syntax. A few sample runs suggest that this is not a solution. 26 is still the practical limit (despite the suspicious error messages).
I'm not entirely sure why you need more than 52 letters (upper and lower case), but I'm sure you need to do some sort of mapping. You don't want to write an einsum string using more than 52 axes all at once. The resulting iterator would be too large (for memory or time).
I'm picturing some sort of mapping function that can be used as:
https://github.com/hpaulj/numpy-einsum/blob/master/einsum_py.py is a Python version of einsum. Crudely speaking, einsum parses the subscripts string, creating an op_axes list that can be used in np.nditer to set up the required sum-of-products calculation. With this code I can look at how the translation is done:
From an example in the __name__ block:
Your example, with full diagnostic output is
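To give a feel for what an op_axes list does, here is a hand-written sketch for the 'ijk,jklm->ijklm' case. It is not the output of einsum_py.py; the op_axes values below are derived by hand on the assumption that the iterator axes are ordered i, j, k, l, m, and the array shapes are made up.

```python
import numpy as np

A = np.random.rand(2, 3, 4)       # axes i, j, k
B = np.random.rand(3, 4, 5, 6)    # axes j, k, l, m

# One row per operand, one entry per iterator axis (i, j, k, l, m).
# Each entry is the operand's own axis number, or -1 where the operand
# lacks that axis and must be broadcast along it.
op_axes = [
    [0, 1, 2, -1, -1],   # A supplies i, j, k
    [-1, 0, 1, 2, 3],    # B supplies j, k, l, m
    [0, 1, 2, 3, 4],     # output has all five axes
]

# Passing None as the third operand lets nditer allocate the output.
it = np.nditer([A, B, None], flags=['external_loop'], op_axes=op_axes)
with it:
    for x, y, z in it:
        z[...] = x * y   # no axis is summed here, so a plain product suffices
    result = it.operands[2]

print(result.shape)
print(np.allclose(result, np.einsum('ijk,jklm->ijklm', A, B)))
```

Because op_axes only records axis positions, renaming the letters in the subscripts string leaves it unchanged.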
Using 'ajk,jkzZ->ajkzZ' changes labels, but results in the same op_axes.
Here is a first draft of a translation function. It should work for any list of lists (of hashable items):
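A minimal sketch of what such a translation function could look like. The name translate, the 52-label check and the sample variable names are assumptions made for illustration; the last sublist is treated as the output (RHS).

```python
import string
import numpy as np

def translate(sublists):
    """Map lists of hashable axis names to an einsum subscripts string.

    The last sublist is taken to be the output. Hypothetical helper,
    not part of numpy.
    """
    labels = set()
    for sub in sublists:
        labels.update(sub)
    letters = string.ascii_letters          # 'a'..'z' then 'A'..'Z'
    if len(labels) > len(letters):
        raise ValueError('more than 52 distinct axis names in one call')
    # A set has no guaranteed order, so which letter a name gets may vary.
    mapping = {label: letters[n] for n, label in enumerate(labels)}
    terms = [''.join(mapping[label] for label in sub) for sub in sublists]
    return ','.join(terms[:-1]) + '->' + terms[-1]

# Usage with made-up random-variable names and shapes.
A = np.random.rand(2, 3, 4)
B = np.random.rand(3, 4, 5, 6)
subscripts = translate([
    ['rain', 'wind', 'temp'],
    ['wind', 'temp', 'season', 'region'],
    ['rain', 'wind', 'temp', 'season', 'region'],
])
out = np.einsum(subscripts, A, B)
print(subscripts, out.shape)
```

The global set of variables can be as large as needed; only the names that appear in a single call have to fit into the 52 letters.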
The use of set to map index objects means that the final indexing characters are unordered. As long as you specify the RHS that shouldn't be an issue. Also I ignored ellipsis.
=================
The list version of einsum input is converted to the subscript string version in einsum_list_to_subscripts() (in numpy/core/src/multiarray/multiarraymodule.c). It replaces ELLIPSIS with '...'. It raises the [0,52] error message if ( s < 0 || s > 2*26), where s is a number in one of those sublists. And it converts s to a character with:
But it looks like the 2nd case is not working; I get errors for values like 26.
That 'a'+s is wrong if s>26:
That 'a'+s is wrong; it should be:
I submitted https://github.com/numpy/numpy/issues/7741
The existence of this bug after all this time indicates that the sublist format is not common, and that using large numbers in that list is even less frequent.
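The arithmetic behind the bug can be illustrated in Python. This is a sketch of the character mapping, not the C source: the 'a'+s branch comes from the description above, while the assumption that the first 26 values map to 'A'+s, and the function names, are mine.

```python
def buggy_label(s):
    # Assumed first case: values 0..25 map to 'A'..'Z'.
    if s < 26:
        return chr(ord('A') + s)
    # Second case as described: 'a' + s overshoots the alphabet.
    return chr(ord('a') + s)

def fixed_label(s):
    if s < 26:
        return chr(ord('A') + s)
    # Subtract the 26 values already used so that 26 maps to 'a'.
    return chr(ord('a') + s - 26)

print(repr(buggy_label(26)), buggy_label(26).isalpha())
print(repr(fixed_label(26)), repr(fixed_label(51)))
```

With the unshifted offset, 26 lands on the character just past 'z', which is not a letter, so the generated subscripts string is rejected.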