How to return value based on the last available ti

2019-07-11 03:31发布

问题:

I am trying to return data in fifteen minute intervals. The first thing I thought to do was this:

select * from myTable where DATEPART(minute, Timestamp) % 15 = 0

But there are two problems with this approach. The first is that there will not necessarily always be data with a timestamp at a given minute, the other is that sometimes there are multiple data points at a given minute with different second values. I want to have exactly one row for each fifteen minute group, at :00, :15, :30, etc.

This data is only recorded when something changes, so if I don't have a data point at 12:30, for example, I could take the closest data point before that and use that value for 12:30 and it would be correct.

So basically I need to be able to return timestamps at exactly :00, :30, etc along with the data from the record closest to that time.

The data could span years but is more likely to be a shorter amount of time, days or weeks. This is what the expected output would look like:

Timestamp               Value
1/1/2015 12:30:00       25
1/1/2015 12:45:00       41
1/1/2015 1:00:00        45

I'm having trouble thinking of a way to do this in SQL. Is it possible?

回答1:

Given a fixed start time, all you would need is a table of numbers to add your intervals to. If you don't already have a table of numbers (which are useful) then a quick way to generate one on the fly is

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)
SELECT *
FROM Numbers;

This simply generates a sequence from 1 to 10,000. For more reading on this see the following series:

  • Generate a set or sequence without loops – part 1
  • Generate a set or sequence without loops – part 2
  • Generate a set or sequence without loops – part 3

Then once you have your numbers you can generate your intervals:

DECLARE @StartDateTime SMALLDATETIME = '20150714 14:00',
        @EndDateTime SMALLDATETIME = '20150715 15:00';

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)

SELECT  Interval = DATEADD(MINUTE, 15 * (N - 1), @StartDateTime)
FROM    Numbers
WHERE   DATEADD(MINUTE, 15 * (N - 1), @StartDateTime) <= @EndDateTime

Which gives something like:

Interval
----------------------
2015-07-14 14:00:00
2015-07-14 14:15:00
2015-07-14 14:30:00
2015-07-14 14:45:00
2015-07-14 15:00:00
2015-07-14 15:15:00
2015-07-14 15:30:00

Then you just need to find the closest value on or before each interval using APPLY and TOP:'

/*****************************************************************
SAMPLE DATA
*****************************************************************/

DECLARE @T TABLE ([Timestamp] DATETIME, Value INT);
INSERT @T ([Timestamp], Value)
SELECT  DATEADD(SECOND, RAND(CHECKSUM(NEWID())) * -100000, GETDATE()),
        CEILING(RAND(CHECKSUM(NEWID())) * 100)
FROM sys.all_objects;

/*****************************************************************
QUERY
*****************************************************************/

DECLARE @StartDateTime SMALLDATETIME = '20150714 14:00',
        @EndDateTime SMALLDATETIME = '20150715 15:00';

WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
Intervals AS
(   SELECT  Interval = DATEADD(MINUTE, 15 * (N - 1), @StartDateTime)
    FROM    Numbers
    WHERE   DATEADD(MINUTE, 15 * (N - 1), @StartDateTime) <= @EndDateTime
)
SELECT  i.Interval, t.[Timestamp], t.Value
FROM    Intervals AS i
        OUTER APPLY
        (   SELECT  TOP 1 t.[Timestamp], t.Value
            FROM    @T AS t
            WHERE   t.[Timestamp] <= i.Interval
            ORDER BY t.[Timestamp] DESC, t.Value
        ) AS t
ORDER BY i.Interval;

Edit

One point to note is that in the case of having two equal timestamps that are both on or closest to an interval, I have applied a secondary level of ordering by Value:

SELECT  i.Interval, t.[Timestamp], t.Value
FROM    Intervals AS i
        OUTER APPLY
        (   SELECT  TOP 1 t.[Timestamp], t.Value
            FROM    @T AS t
            WHERE   t.[Timestamp] <= i.Interval
            ORDER BY t.[Timestamp] DESC, t.Value --- ORDERING HERE
        ) AS t
ORDER BY i.Interval;

This is arbitrary and could be anything you chose, it would be advisable to ensure that you order by enough items to ensure the results are deterministic, that is to say, if you ran the query on the same data many times the same results would be returned because there is only one row that satisfies the criteria. If you had two rows like this:

    Timestamp    |  Value  | Field1
-----------------+---------+--------
2015-07-14 14:00 |   100   |   1
2015-07-14 14:00 |   100   |   2
2015-07-14 14:00 |   50    |   2

If you just order by timestamp, for the interval 2015-07-14 14:00, you don't know whether you will get a value of 50 or 100, and it could be different between executions depending on statistics and the execution plan. Similarly if you order by Timestamp and Value, then you don't know whether Field1 will be 1 or 2.



回答2:

Like Shnugo mention, you can use a tally table to get your data in an interval of 15 minutes, something like this.

I am creating a dynamic tally table using CTE however you can even use a physical calendar table as per your needs.

DECLARE @StartTime DATETIME = '2015-01-01 00:00:00',@EndTime DATETIME = '2015-01-01 14:00:00'

DECLARE @TimeData TABLE ([Timestamp] datetime, [Value] int);

INSERT INTO @TimeData([Timestamp], [Value])
VALUES ('2015-01-01 12:30:00', 25),
    ('2015-01-01 12:45:00', 41),
    ('2015-01-01 01:00:00', 45);

;WITH CTE(rn) AS
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), CTE2 as 
(
    SELECT C1.rn
    FROM CTE C1 CROSS JOIN CTE C2
), CTE3 as 
(
    SELECT TOP (CEILING(DATEDIFF(minute,@StartTime,@EndTime)/15)) ROW_NUMBER()OVER(ORDER BY C1.rn) - 1 rn 
    FROM CTE2 C1 CROSS JOIN CTE2 C2
)
SELECT DATEADD(minute,rn*15,@StartTime) CurrTime,T.Value
FROM CTE3
CROSS APPLY (SELECT TOP 1 Value FROM @TimeData WHERE [Timestamp] <= DATEADD(minute,rn*15,@StartTime) ORDER BY [Timestamp] DESC) T;

OUTPUT

CurrTime    Value
2015-01-01 01:00:00.000 45
2015-01-01 01:15:00.000 45
.
.
.
2015-01-01 12:00:00.000 45
2015-01-01 12:15:00.000 45
2015-01-01 12:30:00.000 25
2015-01-01 12:45:00.000 41
2015-01-01 13:00:00.000 41
2015-01-01 13:15:00.000 41
2015-01-01 13:30:00.000 41
2015-01-01 13:45:00.000 41


回答3:

Now you really have enough ways to create your tally table :-)

DECLARE @startdate DATETIME={ts'2015-06-01 00:00:00'};
WITH JumpsOf15 AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY object_id) * 15 AS Step
    FROM sys.objects --take any large table here (should have many rows...)
)
SELECT Step,steppedDate.steppedDate
FROM JumpsOf15
CROSS APPLY(SELECT DATEADD(MINUTE,Step,@startdate) AS steppedDate ) AS steppedDate
WHERE GETDATE()>steppedDate.steppedDate;


回答4:

The question is missing original data and schema information, so I'll address the question mainly in general form.

You're looking for results in a range that won't have any missing records, covering data that can have missing records. Given that requirement, the normal solution is to create a projection for just the values you need on the left hand side, using a source like a Numbers table that has nothing to do with your actual data. The Numbers table will be guaranteed not to be missing any records in your range. For date projections, you just add the appropriate number of days or minutes to your starting value, for the number of records you expect in the results.

Once you have the projection, you make an OUTER JOIN from the projection against your actual data. In this case, the JOIN is complicated by the fact that you have some date values extra records. I know of two ways to address this problem. One way is to GROUP BY the values in the projection. The other is to use an OUTER APPLY instead of a join. With an OUTER APPLY, you can just use TOP 1 filter on the applied query to limit results to one item.

In summary, here is some psuedo-code that should help you get to where you need to be:

WITH Numbers AS 
( 
   --select numbers here
),
DateProjection As
(
   SELECT DATEADD(minute, 15*Numbers.Number, '2015-01-01') As RangeStart,
          DATEADD(minute, 15*(Numbers.Number+1), '2015-01-01') AS RangeEnd
   FROM Numbers
)
SELECT dp.RangeStart as TimeStamp, oa.Value
FROM DateProjection dp
OUTER APPLY (SELECT TOP 1 Value FROM [myTable] WHERE myTable.TimeStamp >= dp.RangeStart AND myTable.TimeStamp < dp.RangeEnd) oa


回答5:

Very tricky, but something along these lines may work:

select * from mytable where TimeStamp in (
  select max(TimeStamp) from (
    select date(TimeStamp) dt, hour(TimeStamp) as hr, 
      case when minute(TimeStamp) < 15 then 15 else 
      case when minute(TimeStamp) < 30 then 30 else 
      case when minute(TimeStamp) < 45 then 45 else 60 end end end as mint 
    from mytable where TimeStamp between <some TS> and <some other TS>
  ) t group by dt, hr, mint
)

Of course this will not work if there are two readings with the exact same timestamp, in that case you need yet another group by. Messy querying no matter what.



回答6:

I would use an OVER clause to partition the rows by the timestamp, rounded to the nearest quarter hour. Then order each partition by the difference between the timestamp and the rounded timestamp, ascending, and grab the first row of each partition. I think that would do what you want. This will give you the nearest rows to the 15 minute mark. However, it will not add extrapolated values where there are no rows within a 15 minute period.

SELECT ROW_NUMBER() OVER(PARTITION BY [Timestamp Moded to 15 minutes] ORDER BY [Diff timestamp - timestamp moded to 15 minutes] ASC) AS RowNum, *
FROM MyTable where RowNum = 1


回答7:

You can use next query to grouping data by 15 min intervals:

select *, CASE DATEPART(minute, timestamp) /15 
  WHEN 0 THEN '0-15' WHEN 1 THEN '15-30' WHEN 2 THEN '30-45' WHEN 3 THEN '45-60' END 
AS [Time Group] 
from myTable where
DATEPART(minute, timestamp) /15 = 2 /* for group 30-45 min*/

With account of date and hour:

select *, 
    CAST(CAST(timestamp as date) AS VARCHAR(MAX))+ ' ' + 
    CAST(DATEPART(hour, timestamp) AS  VARCHAR(MAX))  + ':' + 
    CAST(
        CASE DATEPART(minute, timestamp) /15 
            WHEN 0 THEN '0-15' 
            WHEN 1 THEN '15-30' 
            WHEN 2 THEN '30-45' 
            WHEN 3 THEN '45-60' END 
            AS VARCHAR(MAX)) AS [Interval] 
from myTable
order by [Interval]