Need help with a complex Join statement in SQL

2019-04-02 01:32发布

问题:

How can you join between a table with a sparse number of dates and another table with an exhaustive number of dates such that the gaps between the sparse dates take the values of the previous sparse date?

Illustrative example:

PRICE table (sparse dates):
date        itemid  price
2008-12-04  1       $1
2008-12-11  1       $3
2008-12-15  1       $7


VOLUME table (exhaustive dates):
date         itemid  volume_amt
2008-12-04   1       12345
2008-12-05   1       23456
2008-12-08   1       34567
2008-12-09   1       ...
2008-12-10   1
2008-12-11   1
2008-12-12   1
2008-12-15   1
2008-12-16   1
2008-12-17   1
2008-12-18   1

Desired result:

date       price  volume_amt
2008-12-04 $1     12345
2008-12-05 $1     23456
2008-12-08 $1     34567
2008-12-09 $1     ...
2008-12-10 $1
2008-12-11 $3
2008-12-12 $3
2008-12-15 $7
2008-12-16 $7
2008-12-17 $7
2008-12-18 $7

Update:

A couple people have suggested a correlated subquery that accomplishes the desired result. (Correlated subquery = a subquery that contains a reference to the outer query.)

This will work; however, I should have noted that the platform I'm using is MySQL, for which correlated subqueries are poorly optimized. Any way to do it without using a correlated subquery?

回答1:

This isn't as simple as a single LEFT OUTER JOIN to the sparse table, because you want the NULLs left by the outer join to be filled with the most recent price.

EXPLAIN SELECT v.`date`, v.volume_amt, p1.item_id, p1.price
FROM Volume v JOIN Price p1
  ON (v.`date` >= p1.`date` AND v.item_id = p1.item_id)
LEFT OUTER JOIN Price p2
  ON (v.`date` >= p2.`date` AND v.item_id = p2.item_id
    AND p1.`date` < p2.`date`)
WHERE p2.item_id IS NULL;

This query matches Volume to all rows in Price that are earlier, and then uses another join to make sure we find only the most recent price.

I tested this on MySQL 5.0.51. It uses neither correlated subqueries nor group by.

edit: Updated the query to match to item_id as well as date. This seems to work too. I created an index on (date) and an index on (date, item_id) and the EXPLAIN plan was identical. An index on (item_id, date) may be better in this case. Here's the EXPLAIN output for that:

+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref             | rows | Extra                                |
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ALL  | item_id       | NULL    | NULL    | NULL            |    6 |                                      | 
|  1 | SIMPLE      | v     | ref  | item_id       | item_id | 22      | test.p1.item_id |    3 | Using where                          | 
|  1 | SIMPLE      | p2    | ref  | item_id       | item_id | 22      | test.v.item_id  |    1 | Using where; Using index; Not exists | 
+----+-------------+-------+------+---------------+---------+---------+-----------------+------+--------------------------------------+

But I have a very small data set, and the optimization may depend on larger data sets. You should experiment, analyzing the optimization using a larger data set.

edit: I pasted the wrong EXPLAIN output before. The one above is corrected, and shows better use of the (item_id, date) index.



回答2:

Assuming there is only 1 price per date/itemid:

select v.date, v.itemid, p.price
from volume v
join price p on p.itemid = v.item_id
where p.date = (select max(p2.date) from price p2
                where p2.itemid = v.itemid
                and p2.date <= v.date);


回答3:

SELECT v.date, p.price, v.volume
FROM volume v
LEFT JOIN Price p ON p.itemID=v.itemID
    AND p.[date] = (
                    SELECT MAX([date] )
                    FROM price p2 
                    WHERE p2.[date] <= v.[date] AND p2.itemid= v.itemid
                    GROUP BY p2.[date]
                   )


回答4:

 SELECT  Volume.date, volume.itemid, price.price, volume.volume_amt
 FROM Volume 
 LEFT OUTER JOIN Price
 ON Volume.date = Price.date

Probably. My SQL-fu is weak



回答5:

This method works in Oracle. Don't know about other databases, and you didn't specify. If this exact syntax doesn't work in your database, I would guess there are similar techniques.

dev> select * from price;

AS_OF               ID     AMOUNT
----------- ---------- ----------
04-Dec-2008          1          1
11-Dec-2008          1          2
15-Dec-2008          1          3

dev> select * from volume;

DAY                 ID     VOLUME
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          1          2
07-Dec-2008          1          3
08-Dec-2008          1          4
09-Dec-2008          1          5
10-Dec-2008          1          6
11-Dec-2008          1          7
12-Dec-2008          1          8
13-Dec-2008          1          9
14-Dec-2008          1         10
15-Dec-2008          1         11
16-Dec-2008          1         12
17-Dec-2008          1         13
18-Dec-2008          1         14
19-Dec-2008          1         15
20-Dec-2008          1         16
21-Dec-2008          1         17
22-Dec-2008          1         18
23-Dec-2008          1         19

dev> select day, volume, amount from (
  2    select day, volume, (select max(as_of) from price p where p.id = v.id and as_of <= day) price_as_of
  3      from volume v
  4  )
  5  join price on as_of = price_as_of
  6  order by day;

DAY             VOLUME     AMOUNT
----------- ---------- ----------
05-Dec-2008          1          1
06-Dec-2008          2          1
07-Dec-2008          3          1
08-Dec-2008          4          1
09-Dec-2008          5          1
10-Dec-2008          6          1
11-Dec-2008          7          2
12-Dec-2008          8          2
13-Dec-2008          9          2
14-Dec-2008         10          2
15-Dec-2008         11          3
16-Dec-2008         12          3
17-Dec-2008         13          3
18-Dec-2008         14          3
19-Dec-2008         15          3
20-Dec-2008         16          3
21-Dec-2008         17          3
22-Dec-2008         18          3
23-Dec-2008         19          3