Finding Median between TWO dates SQL Server 2008

2019-09-04 06:09发布

问题:

I am looking of a way to take the MEDIAN of a bunch of start and end dates (LOTS AND LOTS of dates). However, it would be specific to various "invoice numbers." See sample data below.

 invoice_no    invoice start date      invoice end date
 4006            11/14/2001               12/15/2004
 20071           11/29/2001               02/01/2003
 19893           11/30/2001               12/02/2001
 19894           11/30/2001               12/04/2001
 004             10/22/2002               10/31/2002
 004             12/02/2002               10/31/2002
 004             01/19/2002               10/31/2002
 004             05/10/2002               10/31/2002

Find median between start and end date.

For an invoice that only displays once, the median would just be whatever is between start and end date for that specific invoice_no. However, MANY cases will be the way that invoice '004' are shown. It will repeat many times for different dates - but still the same concept here. Need to find the median between two dates, but still need it to display based on that invoice number.

In an effort to filter data as much as possible. I realized I can also do WHERE STATUS <> 'REJECTED' and it should also help keep a lot of uncessary dates out. Also, I only wanted to filter between a few months so I added the BETWEEN DATETIME in as well.

Code so far (but not working... this sort of logic seems to work if there was 1 date column, but now we're working with two dates so I'm not sure):

 WITH
          tmp AS
    (
        SELECT invoice_no,
        invoice_start_date, invoice_end_date, check_date, status_code,
        cast(count(*) OVER (PARTITION BY invoice_no) as float) AS total,
        row_number() OVER (PARTITION BY invoice_no ORDER BY 
        invoice_start_date, invoice_end_date, check_date) AS rn

FROM INVOICE_HEADER INNER JOIN INVOICE_HEADER_CUSTOM ON INVOICE_HEADER.invoice_id = INVOICE_HEADER_CUSTOM.invoice_id WHERE status_code <> 'REJECTED' AND Check_Date BETWEEN CONVERT(DATETIME, '2014-12-01 00:00:00', 102) AND CONVERT(DATETIME, '2014-12-31 00:00:00', 102) ) SELECT * FROM tmp WHERE (total / 2.0 - 1) < rn and rn < (total / 2.0 + 1)

回答1:

If you mean the set of start dates and end dates, then put them in one column:

WITH t AS (
       SELECT invoice_no, invoice_start_date, invoice_end_date, check_date, status_code,
       FROM INVOICE_HEADER INNER JOIN
            INVOICE_HEADER_CUSTOM
            ON INVOICE_HEADER.invoice_id = INVOICE_HEADER_CUSTOM.invoice_id
       WHERE status_code <> 'REJECTED' AND 
             Check_Date BETWEEN CONVERT(DATETIME, '2014-12-01 00:00:00', 102) AND
             CONVERT(DATETIME, '2014-12-31 00:00:00', 102)
     ), 
     t2 as (
      select d, row_number() over (order by d) as seqnum,
             count(*) over () as cnt
      from (select invoice_start_date as d from t
            union all
            select invoice_end_date as d from t
           ) t
     )
select dateadd(day, datediff(hour, min(d), max(d)) / 2.0, min(d))
from t2
where 2 * seqnum in (cnt, cnt + 1, cnt + 2);


回答2:

Ok... try something like the query on this page:

SELECT @Median = AVG(1.0 * val)
FROM 
(
  SELECT val, 
     c  = COUNT(*) OVER (),
     rn = ROW_NUMBER() OVER (ORDER BY val)
  FROM dbo.EvenRows
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2);