SQL - Find if column dates include at least partia

2019-08-21 13:30发布

问题:

I need to create a report and I am struggling with the SQL script. The table I want to query is a company_status_history table which has entries like the following (the ones that I can't figure out)

Table company_status_history

Columns:

| id | company_id | status_id | effective_date |

Data:

| 1  | 10 | 1 | 2016-12-30 00:00:00.000 |

| 2  | 10 | 5 | 2017-02-04 00:00:00.000 |

| 3  | 11 | 5 | 2017-06-05 00:00:00.000 |

| 4  | 11 | 1 | 2018-04-30 00:00:00.000 |

I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"

Above are the cases that I don't know how to handle since I need to add some logic of type :

  • "If this row is status 1 and it's date is before the date range check the next row if it has a date inside the date range."
  • "If this row is status 1 and it's date is after the date range check the row before if it has a date inside the date range."

回答1:

I think this can be handled as a gaps and islands problem. Consider the following input data: (same as sample data of OP plus two additional rows)

id  company_id  status_id   effective_date
-------------------------------------------
1   10          1           2016-12-15
2   10          1           2016-12-30 
3   10          5           2017-02-04
4   10          4           2017-02-08
5   11          5           2017-06-05
6   11          1           2018-04-30

You can use the following query:

SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
FROM company_status_history AS t
OUTER APPLY 
(
   SELECT COUNT(*) AS cnt
   FROM company_status_history AS c
   WHERE c.status_id = 1 
         AND c.company_id  = t.company_id 
         AND c.effective_date < t.effective_date
) AS x
ORDER BY company_id, effective_date

to get:

id  company_id  status_id   effective_date  grp
-----------------------------------------------
1   10          1           2016-12-15      0
2   10          1           2016-12-30      1
3   10          5           2017-02-04      2
4   10          4           2017-02-08      2
5   11          5           2017-06-05      0
6   11          1           2018-04-30      0

Now you can identify status = 1 islands using:

;WITH CTE AS 
(
    SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
    FROM company_status_history AS t
    OUTER APPLY 
    (
       SELECT COUNT(*) AS cnt
       FROM company_status_history AS c
       WHERE c.status_id = 1 
             AND c.company_id  = t.company_id 
             AND c.effective_date < t.effective_date
    ) AS x
)
SELECT id, company_id, status_id, effective_date,
       ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) - 
       cnt AS grp
FROM CTE 

Output:

id  company_id  status_id   effective_date  grp
-----------------------------------------------
1   10          1           2016-12-15      1
2   10          1           2016-12-30      1
3   10          5           2017-02-04      1
4   10          4           2017-02-08      2
5   11          5           2017-06-05      1
6   11          1           2018-04-30      2

Calculated field grp will help us identify those islands:

;WITH CTE AS 
(
    SELECT t.id, t.company_id, t.status_id, t.effective_date, x.cnt
    FROM company_status_history AS t
    OUTER APPLY 
    (
       SELECT COUNT(*) AS cnt
       FROM company_status_history AS c
       WHERE c.status_id = 1 
             AND c.company_id  = t.company_id 
             AND c.effective_date < t.effective_date
    ) AS x
), CTE2 AS 
(
   SELECT id, company_id, status_id, effective_date,
          ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) - 
          cnt AS grp
   FROM CTE
)
SELECT company_id, 
       MIN(effective_date) AS start_date, 
       CASE 
          WHEN COUNT(*) > 1 THEN DATEADD(DAY, -1, MAX(effective_date))
          ELSE MIN(effective_date)
       END AS end_date
FROM CTE2 
GROUP BY company_id, grp
HAVING COUNT(CASE WHEN status_id = 1 THEN 1 END) > 0

Output:

company_id  start_date  end_date
-----------------------------------
10          2016-12-15  2017-02-03 
11          2018-04-30  2018-04-30 

All you want know is those records from above that overlap with the specified interval.

Demo here with somewhat more complicated use case.



回答2:

Maybe this is what you are looking for? For these kind of questions, you need to join two instance of your table, in this case I am just joining with next record by Id, which probably is not totally correct. To do it better, you can create a new Id using a windowed function like row_number, ordering the table by your requirement criteria

If this row is status 1 and it's date is before the date range check the next row if it has a date inside the date range

declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'

select 
  case
    when csh1.status_id=1 and csh1.effective_date<@range_st 
      then 
        case
          when csh2.effective_date between @range_st and @range_en then true
          else false
        end
    else NULL
  end
from company_status_history csh1
  left join company_status_history csh2
    on csh1.id=csh2.id+1

Implementing second criteria:

"If this row is status 1 and it's date is after the date range check the row before if it has a date inside the date range."

declare @range_st date = '2017-01-01'
declare @range_en date = '2017-12-31'

select 
  case
    when csh1.status_id=1 and csh1.effective_date<@range_st 
      then 
        case
          when csh2.effective_date between @range_st and @range_en then true
          else false
        end
    when csh1.status_id=1 and csh1.effective_date>@range_en 
      then 
        case
          when csh3.effective_date between @range_st and @range_en then true
          else false
        end
    else null -- ¿?
  end
from company_status_history csh1
  left join company_status_history csh2
    on csh1.id=csh2.id+1
  left join company_status_history csh3
    on csh1.id=csh3.id-1


回答3:

I would suggest the use of a cte and the window functions ROW_NUMBER. With this you can find the desired records. An example:

DECLARE @t TABLE(
  id INT
  ,company_id INT
  ,status_id INT
  ,effective_date DATETIME
)

INSERT INTO @t VALUES
(1, 10, 1, '2016-12-30 00:00:00.000')
,(2, 10, 5, '2017-02-04 00:00:00.000')
,(3, 11, 5, '2017-06-05 00:00:00.000')
,(4, 11, 1, '2018-04-30 00:00:00.000')


DECLARE @StartDate DATETIME = '2017-01-01';
DECLARE @EndDate DATETIME = '2017-12-31';

WITH cte AS(
SELECT *
      ,ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY effective_date) AS rn
  FROM @t
),
cteLeadLag AS(
SELECT c.*, ISNULL(c2.effective_date, c.effective_date) LagEffective, ISNULL(c3.effective_date, c.effective_date)LeadEffective
  FROM cte c
  LEFT JOIN cte c2 ON c2.company_id = c.company_id AND c2.rn = c.rn-1
  LEFT JOIN cte c3 ON c3.company_id = c.company_id AND c3.rn = c.rn+1
)
SELECT 'Included' AS RangeStatus, *
  FROM cteLeadLag
  WHERE status_id = 1
    AND effective_date BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Following' AS RangeStatus, *
  FROM cteLeadLag
  WHERE status_id = 1
    AND effective_date > @EndDate
    AND LagEffective BETWEEN @StartDate AND @EndDate
UNION ALL
SELECT 'Trailing' AS RangeStatus, *
  FROM cteLeadLag
  WHERE status_id = 1
    AND effective_date < @EndDate
    AND LeadEffective BETWEEN @StartDate AND @EndDate

I first select all records with their leading and lagging Dates and then I perform your checks on the inclusion in the desired timespan.



回答4:

Try with this, self-explanatory. Responds to this part of your question:

I want to answer to the question "Get all companies that have been at least for some point in status 1 inside the time period 01/01/2017 - 31/12/2017"

  1. Case that you want to find those id's that have been in any moment in status 1 and have records in the period requested:
      SELECT *
FROM company_status_history
WHERE id IN
    ( SELECT Id
     FROM company_status_history
     WHERE status_id=1 )
  AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'
  1. Case that you want to find id's in status 1 and inside the period:
     SELECT *
FROM company_status_history
WHERE status_id=1
  AND effective_date BETWEEN '2017-01-01' AND '2017-12-31'