SQL for Dates with no ClosePrice for all companies

2019-09-19 21:25发布

问题:

I have a fact table where I have ClosePrice for Company Tickers and TradingDates. I also have a dimCalendar table where all dates are there and a flag is maintained for Trading Holidays.

Somehow I need to have a new fact table or materialized view or something for reporting purposes where I need to have ClosePrice for every day including TradingHolidays.

ClosePrice for trading holidays should be previous trading day's close price for that CompanyTicker. Then I need to have another column for '5 day average Close Price'. For average calculation close price for trading days should only be included.

So, let's assume this is Current State.State1

And below is After joining with Calendar.

I guess 1st step is to Left Join Calendar Table. That gives me 0 Stock price for missing Days.

select a.date as tdate, a.datekey, b.ticker, coalesce(b.ClosePrice,0)
from dimdates a left join 
    factStockDividendCommodity b
    on a.DateKey = b.datekey --and b.ticker = 'BP'
where (a.Datekey between 20180101 and 20181231 )
order by a.Date

Not Sure how I can get the Ticker and then Previous Trading Day's close price.

回答1:

The key idea is to generate the rows using a cross join and then fill in the values. In your case, you probably want to take into account that stocks may not exist at all points in the past, so you only want this for the minimum observed date.

To fill in the date,you can use lag(ignore nulls) in standard SQL:

select d.date as tdate, d.datekey, t.ticker, 
       coalesce(fsdc.ClosePrice,
                lag(fsdc.ClosePrice ignore nulls) over (partition by t.ticker order by d.date) as ClosePrice
from dimdates d join
     (select ticker, min(datekey) as min_date
      from factStockDividendCommodity fsdc
      group by ticker
     ) t
     on d.datekey >= t.min_datekey left join
     factStockDividendCommodity fsdc
     on fsdc.ticker = t.ticker and
        fsdc.datekey = d.datekey
where d.Datekey between 20180101 and 20181231
order by d.Date;

Alas, many databases -- even those that support lag() -- do not support the ignore nulls option. The best approach then depends on the database. A correlated subquery is the most general method, but perhaps not the best from a performance perspective.

EDIT:

SQL Server does not support the IGNORE NULLS option. This is probably most easily handled using OUTER APPLY:

select d.date as tdate, d.datekey, t.ticker, 
       fsdc.ClosePrice as ClosePrice
from dimdates d join
     (select ticker, min(datekey) as min_date
      from factStockDividendCommodity fsdc
      group by ticker
     ) t
     on d.datekey >= t.min_datekey outer apply
     (select top (1) fsdc.*
      from factStockDividendCommodity fsdc
      where fsdc.ticker = t.ticker and
            fsdc.datekey <= d.datekey
      order by fsdc.datekey desc
     ) fsdc
where d.Datekey between 20180101 and 20181231
order by d.Date;

However, because there are probably never more than 3 or 4 days without values in a row, a series of lag()s might be more efficient:

select d.date as tdate, d.datekey, t.ticker, 
       coalesce(fsdc.ClosePrice,
                lag(fsdc.ClosePrice, 1) over (partition by t.ticker order by d.date),
                lag(fsdc.ClosePrice, 2) over (partition by t.ticker order by d.date),
                lag(fsdc.ClosePrice, 3) over (partition by t.ticker order by d.date)
               ) as ClosePrice
from dimdates d join
     (select ticker, min(datekey) as min_date
      from factStockDividendCommodity fsdc
      group by ticker
     ) t
     on d.datekey >= t.min_datekey left join
     factStockDividendCommodity fsdc
     on fsdc.ticker = t.ticker and
        fsdc.datekey = d.datekey
where d.Datekey between 20180101 and 20181231
order by d.Date;