Dot product in an SQL table with many columns

2019-06-28 00:51发布

问题:

I would like to multiply each row by a single specified row, and then sum that product for each row (a dot product.) My SQL table is a list of names and associated high-dimensional vectors. The table has 1 string + 1000 numerical columns. There are usually a few million rows. Each float in the vectors/arrays is in a new column:

+--------+------+-------+------+---+--------+
|  NAME  | COL0 | COL1  | COL2 | … | COL999 |
+--------+------+-------+------+---+--------+
| TOPAZ  | 0.73 | 0.77  | 0.15 |   | 0.55   |
| GARDEN | 0.41 | 0.57  | 0.61 |   | 0.00   |
| HAVE   | 0.40 | 0.32  | 0.23 |   | 0.52   |
+--------+------+-------+------+---+--------+

In array math, this would simply be sum(table * row, axis=1). It's a large array, and takes a few seconds to execute when it's written in C and lives in pure memory, so speed is very important. But it's gotten too big to be in local memory now, and the data needs to live in a SQL server. How do I calculate table * GARDEN in an efficient way such that I can get the dot product of every row with GARDEN (example output below)?

+--------+------+-------+------+---+--------+------+
|  NAME  | COL0 | COL1  | COL2 | … | COL999 | DOT  |
+--------+------+-------+------+---+--------+------+
| TOPAZ  | 0.57 | 0.24  | 0.34 |   | 0.21   | 0.46 |
| GARDEN | 0.42 | 0.43  | 0.12 |   | 0.66   | 0.48 |
| HAVE   | 0.31 | 0.37  | 0.30 |   | 0.47   | 0.40 |
+--------+------+-------+------+---+--------+------+

Many thanks!

回答1:

The structure is:

select t.*,
       (t.col0 * garden.col0 +
        t.col1 * garden.col1 + . . .
        t.col999 * garden.col999
       ) as DOT
from t cross join
     (select t.*
      from t
      where name = 'GARDEN'
     ) garden;

This still won't be particularly fast. "A few seconds" in C# might be many minutes, unless you have parallel hardware and a good SQL database that can take advantage of it.