SQL Self-join with data comparison for different d

2019-09-20 06:16发布

问题:

I need to compare data on two different days in SQL. And I really need that in a single query since I need to use the results in pagination. Problem is, when I'm doing a self join it's results in duplicate columns since INNER JOIN is a cartesian product.

Here's the code on sql fiddle

E.g.

SELECT * FROM `my_table` as t1 
INNER JOIN my_table t2 ON t1.quality = t2.quality
WHERE (
  t1.day = '2015-01-08' OR t1.day = '2015-01-09' OR 
  t2.day = '2015-01-08' OR t2.day = '2015-01-09'
)

Two questions:

  1. How do I rewrite that go get one row per JOIN on quality?
  2. Will it still work if some corresponding row (on quality) is missing from one of the days?

EDIT

Here's the input:

INSERT INTO my_table
(quality, quantity, day)
VALUES
('A', 1, '2015-01-15'),
('B', 2, '2015-01-15'),
('B', 0, '2015-01-10');

And the desired output is:

QUALITY  | QUANTITY  | T1.QUANTITY - T2.QUANTITY
A           1          0 (or NULL?)
B           2          2

Explanation:

There are unique values on each date. So we see only A and B quality. For B- there is appropriate B on another date. For A - there isn't. Therefore the difference for B is 2 (higher date minus lower). The difference for A is 0 because there's nothing to subtract from.

回答1:

You are getting duplicated columns because as you have it, you are querying from T1 AND T2. So unless you explicitly say show me only T1.* it will grab columns from BOTH table alias references.

Since your query is doing an OR on the dates, you are probably going to get a Cartesian result too.

Now knowing your table structure, but you might be better with an explicit query something like...

SELECT
      t1.day,
      t2.day as OtherDay,
      t1.quality,
      t1.anotherColumn,
      t2.OtherAnotherColumn,
      t1.thirdColumn,
      t2.OtherThirdColumn
   FROM 
      my_table t1
         join my_table t2
            on t1.quality = t2.quality
           AND t2.day = '2015-01-09'
   where
      t1.day = '2015-01-08' 

Have an index on your "my_table" based on (day, quality) to optimize the query. And you can just keep adding in pairs, the columns you are trying to compare between day1 and day2. T1 will only return those associated with the first day, and the T2 alias will only show for matching entries for the second date.

Now, if there are only entries on the T1 side with no corresponding T2 entry for the quality and date in question, but you still want to see those, then just change the JOIN to a LEFT JOIN.