I would like to multiply each row by a single specified row, and then sum that product for each row (a dot product.) My SQL table is a list of names and associated high-dimensional vectors. The table has 1 string + 1000 numerical columns. There are usually a few million rows. Each float in the vectors/arrays is in a new column:
+--------+------+-------+------+---+--------+
| NAME | COL0 | COL1 | COL2 | … | COL999 |
+--------+------+-------+------+---+--------+
| TOPAZ | 0.73 | 0.77 | 0.15 | | 0.55 |
| GARDEN | 0.41 | 0.57 | 0.61 | | 0.00 |
| HAVE | 0.40 | 0.32 | 0.23 | | 0.52 |
+--------+------+-------+------+---+--------+
In array math, this would simply be sum(table * row, axis=1)
. It's a large array, and takes a few seconds to execute when it's written in C and lives in pure memory, so speed is very important. But it's gotten too big to be in local memory now, and the data needs to live in a SQL server. How do I calculate table * GARDEN in an efficient way such that I can get the dot product of every row with GARDEN (example output below)?
+--------+------+-------+------+---+--------+------+
| NAME | COL0 | COL1 | COL2 | … | COL999 | DOT |
+--------+------+-------+------+---+--------+------+
| TOPAZ | 0.57 | 0.24 | 0.34 | | 0.21 | 0.46 |
| GARDEN | 0.42 | 0.43 | 0.12 | | 0.66 | 0.48 |
| HAVE | 0.31 | 0.37 | 0.30 | | 0.47 | 0.40 |
+--------+------+-------+------+---+--------+------+
Many thanks!
The structure is:
This still won't be particularly fast. "A few seconds" in C# might be many minutes, unless you have parallel hardware and a good SQL database that can take advantage of it.