Suppose we take np.dot of two 'float32' 2D arrays:
res = np.dot(a, b) # see CASE 1
print(list(res[0])) # list shows more digits
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
Numbers. Except, they can change:
CASE 1: slice a
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')
for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))  # full shape: (i, 6)
[-0.9044868, -1.1708502, 0.90713596, 3.5594249, 1.1374012, -1.3826287]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.9071359, 3.5594249, 1.1374011, -1.3826288]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
[-0.90448684, -1.1708503, 0.907136, 3.5594249, 1.1374011, -1.3826287]
The results differ, even though the printed row is computed from exactly the same numbers each time.
CASE 2: flatten a, take a 1D version of b, then slice a:
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(1, 6).astype('float32')
for i in range(1, len(a)):
    a_flat = np.expand_dims(a[:i].flatten(), -1)  # keep 2D
    print(list(np.dot(a_flat, b)[0]))  # full shape: (i*6, 6)
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
[-0.3393164, 0.9528787, 1.3627989, 1.5124314, 0.46389243, 1.437775]
CASE 3: for stronger control, set all non-involved entries to zero: add a[1:] = 0 to the CASE 1 code. Result: the discrepancies persist.
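For reference, my reading of the CASE 3 setup - CASE 1's code with the zeroing line added (the exact placement of a[1:] = 0 is my assumption):

```python
import numpy as np

np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')
a[1:] = 0  # zero every row of `a` not involved in computing res[0]

for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))
</antml>```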
CASE 4: check indices other than [0]; as for [0], results begin to stabilize a fixed # of array enlargements from their point of creation. Output
np.random.seed(1)
a = np.random.randn(9, 6).astype('float32')
b = np.random.randn(6, 6).astype('float32')
for j in range(len(a) - 2):
    for i in range(1, len(a)):
        res = np.dot(a[:i], b)
        try: print(list(res[j]))
        except IndexError: pass
    print()
Hence, for the 2D * 2D case, results differ - but are consistent for the 1D * 1D case. From some of my reading, this appears to stem from 1D-1D using simple addition, whereas 2D-2D uses 'fancier', performance-boosting addition that may be less precise (e.g. pairwise addition does the opposite). Nonetheless, I'm unable to understand why the discrepancies vanish in CASE 1 once a is sliced past a set 'threshold'; the larger a and b, the later this threshold seems to lie, but it always exists.
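That summation order alone can change a float32 result is easy to show with a minimal standalone example (this illustrates the general phenomenon, not np.dot's actual internals):

```python
import numpy as np

x = np.float32([1e8, 1.0, -1e8])

# Left-to-right: 1e8 + 1 rounds back to 1e8 (float32 spacing at 1e8 is 8),
# so the 1.0 is lost before -1e8 cancels.
left_to_right = (x[0] + x[1]) + x[2]

# Regrouped: the large terms cancel first, so the 1.0 survives.
regrouped = (x[0] + x[2]) + x[1]

print(left_to_right, regrouped)  # 0.0 1.0
</antml>```

Floating-point addition is not associative, so any library that regroups or blocks the accumulation for speed can legitimately return a (slightly) different answer.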
All said: why is np.dot imprecise (and inconsistent) for ND-ND arrays? Relevant Git
Additional info:
- Environment: Win-10 OS, Python 3.7.4, Spyder 3.3.6 IDE, Anaconda 3.0 2019/10
- CPU: i7-7700HQ 2.8 GHz
- Numpy v1.16.5
Possible culprit library: Numpy MKL - also BLAS libraries; thanks to Bi Rico for noting
Stress-test code: as noted, the discrepancies grow in frequency with larger arrays; if the above isn't reproducible, the below should be (if not, try larger dims). My output
np.random.seed(1)
a = (0.01*np.random.randn(9, 9999)).astype('float32') # first multiply then type-cast
b = (0.01*np.random.randn(9999, 6)).astype('float32') # *0.01 to bound mults to < 1
for i in range(1, len(a)):
    print(list(np.dot(a[:i], b)[0]))
Problem severity: the discrepancies shown are 'small', but no longer so when operating on a neural network with billions of numbers multiplied over a few seconds, and trillions over the entire runtime; reported model accuracy can differ by tens of percent, per this thread.
Below is a gif of arrays resulting from feeding to a model what's basically a[0], w/ len(a)==1 vs. len(a)==32:
OTHER PLATFORMS: results, with thanks to Paul's testing:
Case 1 reproduced (partly):
- Google Colab VM -- Intel Xeon 2.3 GHz -- Jupyter -- Python 3.6.8
- Win-10 Pro Docker Desktop -- Intel i7-8700K -- jupyter/scipy-notebook -- Python 3.7.3
- Ubuntu 18.04.2 LTS + Docker -- AMD FX-8150 -- jupyter/scipy-notebook -- Python 3.7.3
Note: these yield a much lower error than shown above; two entries in the first row are off by 1 in the least significant digit from the corresponding entries in the other rows.
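For scale: an off-by-1 in the last printed digit of a float32 corresponds to roughly one ULP (unit in the last place), whose size at the magnitudes printed above can be checked with np.spacing:

```python
import numpy as np

# ULP size at the magnitudes appearing in CASE 1's printed output
for v in [0.90448684, 1.1708503, 3.5594249]:
    print(v, float(np.spacing(np.float32(v))))  # ~6e-8 to ~2.4e-7
</antml>```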
Case 1 not reproduced:
- Ubuntu 18.04.3 LTS -- Intel i7-8700K -- IPython 5.5.0 -- Python 2.7.15+ and 3.6.8 (2 tests)
- Ubuntu 18.04.3 LTS -- Intel i5-3320M -- IPython 5.5.0 -- Python 2.7.15+
- Ubuntu 18.04.2 LTS -- AMD FX-8150 -- IPython 5.5.0 -- Python 2.7.15rc1
Notes:
- The linked Colab notebook and Jupyter environments show a far smaller discrepancy (and only for the first two rows) than is observed on my system. Also, CASE 2 never (yet) showed imprecision.
- Within this very limited sample, the current (Dockerized) Jupyter environment is more susceptible than the IPython environment.
np.show_config() output is too long to post, but in summary: the IPython envs are BLAS/LAPACK-based; Colab is OpenBLAS-based. In the IPython Linux envs, the BLAS libraries are system-installed -- in Jupyter and Colab, they come from /opt/conda/lib
UPDATE: the accepted answer is accurate, but broad and incomplete. The question remains open for anyone who can explain the behavior at the code level - namely, the exact algorithm used by np.dot, and how it explains the 'consistent inconsistencies' observed in the above results (also see comments). Here are some direct implementations beyond my deciphering: sdot.c -- arraytypes.c.src