I have a situation where I'm performing a join between two tables, and I need a value from one table to be used as the LIMIT for a subquery in the join. Assume I have the following (extremely simplified) tables:
data:
experiment_id | value
--------------|-------
1             | 2.5
1             | 2.6
1             | 4.5
1             | 2.3
1             | 3.5
1             | 2.8
2             | 2.3
2             | 1.2
2             | 1.1
2             | 3.6
2             | 3.8
2             | 4.1
2             | 7.9
2             | 4.2
2             | 1.0
data_clip:
experiment_id | clip_index
--------------|-----------
1             | 3
2             | 5
I need to sum each experiment's sorted values up to a certain clip_index, which varies between experiments. So, my result table will ideally look like this:
results:
experiment_id | sum
--------------|------
1             | 7.4   # => 2.3 + 2.5 + 2.6
2             | 9.2   # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6
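For comparison, the computation as stated (sum each experiment's clip_index smallest values) takes only a few lines client-side. A Python sketch over hypothetical copies of the tables above:

```python
# Client-side sketch of the clipped-sum computation, using
# hypothetical in-memory copies of the two tables above.
from collections import defaultdict

data = [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
]
data_clip = {1: 3, 2: 5}

def clipped_sums(rows, clips):
    """Sum the clip_index smallest values of each experiment."""
    by_exp = defaultdict(list)
    for exp_id, value in rows:
        by_exp[exp_id].append(value)
    # Sort each experiment's values and sum the first clip_index of them.
    return {exp_id: round(sum(sorted(vals)[:clips[exp_id]]), 1)
            for exp_id, vals in by_exp.items()}

print(clipped_sums(data, data_clip))
```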
Typically, I'd do this calculation with some client-side scripting (Ruby, Python, etc.), but I wanted to try doing it at the db level. Some imaginary SQL might look like this (there's all sorts of wrong with this query, I know, but hopefully you get the idea):
SELECT
  t0.experiment_id AS `id`,
  (SELECT SUM(x.value)
   FROM (SELECT value
         FROM data
         WHERE experiment_id = t0.experiment_id
         ORDER BY value
         LIMIT t0.clip_index) AS x) AS `sum`
FROM data_clip AS t0
Several problems:
- LIMIT must be defined with a constant (1000, 10, etc.), not a column.
- The WHERE condition in the subquery fails because it doesn't recognize the t0 table, which is external to the subquery.
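One pure-SQL workaround for both problems is to replace LIMIT with a rank test: a row belongs to its experiment's clipped set when fewer than clip_index rows of the same experiment have a smaller value. The sketch below uses Python's sqlite3 module only to make the query runnable; the SQL itself is standard and should translate to MySQL. It assumes values are distinct within an experiment (otherwise, add a tie-breaker on a unique id column):

```python
import sqlite3

# In-memory database mirroring the simplified tables above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (experiment_id INTEGER, value REAL);
    CREATE TABLE data_clip (experiment_id INTEGER, clip_index INTEGER);
""")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
])
conn.executemany("INSERT INTO data_clip VALUES (?, ?)", [(1, 3), (2, 5)])

# Keep a row when fewer than clip_index rows of the same experiment
# have a smaller value -- i.e. its 0-based rank is below clip_index.
rows = conn.execute("""
    SELECT d.experiment_id, ROUND(SUM(d.value), 1)
    FROM data AS d
    JOIN data_clip AS c ON c.experiment_id = d.experiment_id
    WHERE (SELECT COUNT(*)
           FROM data AS d2
           WHERE d2.experiment_id = d.experiment_id
             AND d2.value < d.value) < c.clip_index
    GROUP BY d.experiment_id
    ORDER BY d.experiment_id
""").fetchall()
print(rows)
```

Note the correlated COUNT(*) is roughly O(n²) per experiment, so at ~100k values per experiment an index on (experiment_id, value) would be important, and it may still be slow.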
My question is basically how to accomplish the variable limit and sum between the two tables using mostly SQL. I thought about using group_concat and substring_index to isolate the values up to clip_index for each row, but then there's the issue of summing up the numbers in the resulting string ("1.2,2.3,3.2"), plus the server limitation on the size of the group_concat buffer (configurable, but values can run to ~100k per experiment). Any thoughts? Thanks.
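If window functions are available (MySQL added them in 8.0), the problem becomes direct: rank each experiment's values with ROW_NUMBER() and keep rows whose rank is within clip_index, with no string concatenation involved. Again a sketch using sqlite3 (which needs SQLite 3.25+ for window functions) over the same hypothetical data:

```python
import sqlite3

# Same hypothetical tables as above, loaded into an in-memory db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (experiment_id INTEGER, value REAL);
    CREATE TABLE data_clip (experiment_id INTEGER, clip_index INTEGER);
""")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, 2.5), (1, 2.6), (1, 4.5), (1, 2.3), (1, 3.5), (1, 2.8),
    (2, 2.3), (2, 1.2), (2, 1.1), (2, 3.6), (2, 3.8), (2, 4.1),
    (2, 7.9), (2, 4.2), (2, 1.0),
])
conn.executemany("INSERT INTO data_clip VALUES (?, ?)", [(1, 3), (2, 5)])

# Rank values within each experiment, then keep ranks <= clip_index.
rows = conn.execute("""
    SELECT r.experiment_id, ROUND(SUM(r.value), 1)
    FROM (SELECT experiment_id, value,
                 ROW_NUMBER() OVER (PARTITION BY experiment_id
                                    ORDER BY value) AS rn
          FROM data) AS r
    JOIN data_clip AS c ON c.experiment_id = r.experiment_id
    WHERE r.rn <= c.clip_index
    GROUP BY r.experiment_id
    ORDER BY r.experiment_id
""").fetchall()
print(rows)
```

Unlike the correlated-count version, the ranking here is a single sort per partition, so it should scale to the ~100k-values-per-experiment case.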