-->

Postgres Left Join with where condition

2019-06-27 11:13发布

问题:

I need to left join two tables with a where condition:

time_table
id     rid        start_date                end_date
1       2     2017-07-01 00:00:00     2018-11-01 00:00:00
2       5     2017-01-01 00:00:00     2017-06-01 00:00:00
3       2     2018-07-01 00:00:00     2020-11-01 00:00:00
record_table
id      name                 date
1      record1       2017-10-01 00:00:00
2      record2       2017-02-01 00:00:00
3      record3       2017-10-01 00:00:00

I need to get all those records which are present under given date range. In the above example, I need those records that lie under range for rid = 2 only. Hence the output for the above query needs to be:

1      record1       2017-10-01 00:00:00    
3      record3       2017-10-01 00:00:00

回答1:

left join two tables with a where condition

There is a trap waiting for you here. When using a LEFT [OUTER] JOIN, it's typically wrong to filter with a WHERE condition, thereby voiding the special feature of a LEFT JOIN to include all rows from the left table unconditionally. Detailed explanation:

  • Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

So, put conditions into the WHERE clause that are supposed to filter all rows (rid = 2), but make conditions to left-join rows from record_table out to be actual join conditions:

SELECT t.start_date, t.end_date  -- adding those
     , r.id, r.name, r.date 
FROM   time_table t
LEFT   JOIN record_table r ON r.date >= t.start_date
                          AND r.date <  t.end_date
WHERE  t.rid = 2;

As commented, it makes sense to include columns from time_table in the result, but that's my optional addition.

You also need to be very clear about upper and lower bounds. The general convention is to include lower and exclude the upper bound in time (timestamp) ranges. Hence my use of >= and < above.

Related:

  • SQL query on a time series to calculate the average
  • Selecting an average of records grouped by 5 minute periods

Performance should be no problem at all with the right indexes. You need an index (or PK) on time_table on (rid) and another on record_table on (date).



回答2:

SELECT time_tbl.name,record_tbl.date 
FROM dbo.time_table AS time_tbl
     INNER JOIN record_table AS record_tbl
           ON time_tbl.id=record_tbl.id
WHERE(time_tbl.rid=2)       


回答3:

I'm not exactly sure if this is what you want, but if you are saying you want the dates where the record_table date is between the dates in the time_table, then this would do the job:

select
  rt.id, rt.name, rt.date
from
  time_table tt
  join record_table rt on
    rt.date between tt.start_date and tt.end_date
where
  tt.rid = 2

That said, this will be horribly inefficient for large datasets. If your data is relatively small (< 10k records in each table, post-filters), then it probably won't matter much, but if you would need to scale this concept, it would warrant knowing more about your data -- for example, do the dates, always round to the first of each month?

Again, from your example, I wasn't sure if this is what you meant by "get all those records which are present under given date range."