Use table column value for LIMIT when performing j

2019-08-15 17:10发布

问题:

I have a situation where I'm performing a join between two tables, and I need a value from one table to be used as a LIMIT factor for a subquery in the join. Assume I have the following [extremely simplified] tables -

data:

experiment_id | value
--------------|--------
       1      |  2.5
       1      |  2.6
       1      |  4.5
       1      |  2.3
       1      |  3.5
       1      |  2.8
       2      |  2.3
       2      |  1.2
       2      |  1.1
       2      |  3.6
       2      |  3.8
       2      |  4.1
       2      |  7.9
       2      |  4.2
       2      |  1.0


data_clip:

experiment_id | clip_index
--------------|------------
       1      |     3
       2      |     5

I need to sum each experiment's sorted values up to a certain clip_index, that varies between experiments. So, my result table will ideally look like this:

results:
experiment_id |  sum
--------------|-------
       1      |  7.6    # => 2.3 + 2.5 + 2.8
       2      | 13.0    # => 1.0 + 1.1 + 1.2 + 2.3 + 3.6 + 3.8 

Typically, I'd do this calculation with some client side scripting (ruby, python, etc), but I wanted to try doing this on the db level. Some imaginary SQL might look like this (there's all sorts of wrong with this query, I know, but hopefully you get the idea):

SELECT 
  T0.experiment_id as `id`,
  (SELECT SUM(x.value) from
       (SELECT   value 
        FROM     data 
        WHERE    experiment_id = t0.experiment_id
        ORDER BY value 
        LIMIT    t0.clip_index ) as x) AS `sum`
FROM data_clip AS t0

Several problems:

  1. LIMIT must be defined with a constant (1000, 10, etc.), not a column.
  2. the WHERE condition in the subquery fails for not recognizing t0 table, which is external to the sub-query.

My question is basically how to accomplish the variable limit and sum between the two tables using mostly SQL. I thought about using group_concat and substring_index to isolate the values up to clip_index for each row, but then there's the issue of summing up the numbered strings ("1.2,2.3,3.2") and the server limitation on the size of group_concat buffer (configurable, but values can be around ~100k per experiment). Any thoughts? Thanks.

回答1:

I guess you just need to include a row number with each value selected and limit the results by the number of rows something like this: (not not tested)

SELECT T0.experiment_id as `id`,   
(SELECT SUM(x.value) from       
 (SELECT value,@rownum := @rownum + 1 AS rownum           
  FROM data         
  JOIN (SELECT @rownum := 0) r
  WHERE experiment_id = t0.experiment_id         
  ORDER BY value             
 ) AS x
WHERE x,rownum < t0.clip_index
) AS `sum`
 FROM data_clip AS t0

see: MySQL - Get row number on select



回答2:

I think this will work when all values are positive. If there are negative values, one more level is needed.

SELECT experiment_id
     , MIN(sumValue) - (MIN(cnt)-clip_id) * MIN(maxValue)
       AS sumValue
FROM
  ( SELECT e.experiment_id
         , e.clip_id
         , COUNT(*)         AS cnt
         , SUM(d2.value)    AS sumValue
         , d.value          AS maxValue
    FROM experiment AS e  
      JOIN data AS d
        ON d.experiment_id = e.experiment_id
      JOIN data AS d2
        ON d2.experiment_id = e.experiment_id
        AND d2.value <= d.value
    GROUP BY e.experiment_id
           , d.id                               --- table's `data` Primary Key
    HAVING COUNT(*) >= e.clip_id
  ) AS grp
GROUP BY experiment_id