Query Records and Group by a block of time

Posted 2020-02-09 22:06

Question:

I have an application that may be run several times a day. Each run results in data that is written to a table to report on events that occurred. The main report table looks something like this:

Id    SourceId    SourceType    DateCreated
5048  433         FILE          5/17/2011 9:14:12 AM
5049  346         FILE          5/17/2011 9:14:22 AM
5050  444         FILE          5/17/2011 9:14:51 AM
5051  279         FILE          5/17/2011 9:15:02 AM
5052  433         FILE          5/17/2011 12:34:12 PM
5053  346         FILE          5/17/2011 12:34:22 PM
5054  444         FILE          5/17/2011 12:34:51 PM
5055  279         FILE          5/17/2011 12:35:02 PM

I can tell that there were two runs, but I would like a way to query, for a date range, the number of times the process was run. Ideally the query would return the time each run started and the number of files in that run. The query below gets me part of the way: I can see the day, the hour, and how many files were processed. But it is not exactly what I want. It would split a run that lasted from 8:58 to 9:04, for example, and it would merge separate runs that started at 9:02 and 9:15.

Select dateadd(day,0,datediff(day,0,DateCreated)) as [Date], datepart(hour, DateCreated) as [Hour], Count(*) [File Count]
From   MyReportTable
Where DateCreated between '5/4/2011' and '5/18/2011'
    and SourceType = 'File'
Group By dateadd(day,0,datediff(day,0,DateCreated)), datepart(hour, DateCreated)
Order By dateadd(day,0,datediff(day,0,DateCreated)), datepart(hour, DateCreated)

I understand that any runs that are close together will likely get grouped together, and I'm fine with that. I only expect to get a rough grouping.

Thanks!

Answer 1:

If you're certain the runs are contiguous and don't overlap, you can use the Id field to split the rows into groups. Look for pairs of rows whose Ids differ by exactly 1 but whose DateCreated values are more than some threshold apart. From your data, it looks like records within a run are written within a minute of each other, so a threshold of a minute or more should be safe. Note that this relies on the Ids having no gaps.

This would get you your start times:

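SELECT mrtB.Id, mrtB.DateCreated
FROM MyReportTable AS mrtA
INNER JOIN MyReportTable AS mrtB
    ON (mrtA.Id + 1) = mrtB.Id
-- Compare in seconds: DateDiff(mi, ...) counts minute boundaries crossed,
-- so 9:14:51 -> 9:15:02 would count as 1 minute and look like a new run.
WHERE DateDiff(ss, mrtA.DateCreated, mrtB.DateCreated) >= 60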

I'll call that DataRunStarts
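
As a rough illustration of the same idea outside the database, here is a minimal Python sketch that flags a row as a run start when it follows the previous Id by at least a minute of elapsed time. The function name `find_run_starts` and the one-minute default are assumptions for illustration, and the sample rows assume the second run happened at 12:34 PM so that times increase with Id.

```python
from datetime import datetime, timedelta

# (Id, DateCreated) pairs from the question's sample table.
# Assumption: the second run happened at 12:34 PM, so times increase with Id.
rows = [
    (5048, datetime(2011, 5, 17, 9, 14, 12)),
    (5049, datetime(2011, 5, 17, 9, 14, 22)),
    (5050, datetime(2011, 5, 17, 9, 14, 51)),
    (5051, datetime(2011, 5, 17, 9, 15, 2)),
    (5052, datetime(2011, 5, 17, 12, 34, 12)),
    (5053, datetime(2011, 5, 17, 12, 34, 22)),
    (5054, datetime(2011, 5, 17, 12, 34, 51)),
    (5055, datetime(2011, 5, 17, 12, 35, 2)),
]


def find_run_starts(rows, threshold=timedelta(minutes=1)):
    """Return rows whose Id is exactly one more than the previous row's Id
    and whose timestamp is at least `threshold` later (hypothetical helper)."""
    ordered = sorted(rows)  # tuples sort by Id first
    return [
        cur
        for prev, cur in zip(ordered, ordered[1:])
        if cur[0] == prev[0] + 1 and cur[1] - prev[1] >= threshold
    ]


run_starts = find_run_starts(rows)
print(run_starts)
```

With this sample only Id 5052 is flagged. The 11-second step from 9:14:51 to 9:15:02 is not, because the comparison uses elapsed time rather than the minute part of the clock. The first run (Id 5048) is not flagged either, since it has no predecessor.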

Now you can use DataRunStarts to find where each group starts and ends:

SELECT drsA.Id AS StartID, drsA.DateCreated, Min(drsB.Id) AS ExcludedEndId
FROM DataRunStarts AS drsA, DataRunStarts AS drsB
WHERE drsB.Id > drsA.Id
GROUP BY drsA.Id, drsA.DateCreated

I'll call that DataRunGroups. I named the last field ExcludedEndId because the Id it holds only marks the exclusive end boundary for the set of Ids that belong to the group.

Now we can use DataRunGroups and MyReportTable to get the counts:

SELECT DataRunGroups.StartID, Count(MyReportTable.Id) AS CountOfRecords
FROM DataRunGroups, MyReportTable
WHERE MyReportTable.Id >= DataRunGroups.StartID AND MyReportTable.Id < DataRunGroups.ExcludedEndId
GROUP BY DataRunGroups.StartID;

I'll call that DataRunCounts
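
To make the boundary logic concrete, here is a small Python sketch of the two steps above: pairing each start Id with the next start Id, then counting the Ids in the half-open range between them. The Ids and the helper names `build_run_groups` and `count_per_group` are hypothetical and chosen only for illustration.

```python
# Hypothetical data: Ids 5048..5063, with run starts detected at these Ids.
all_ids = list(range(5048, 5064))
start_ids = [5052, 5056, 5060]


def build_run_groups(start_ids):
    """Pair each start Id with the next start Id (the excluded end).
    The last start has no successor, so it produces no group."""
    ordered = sorted(start_ids)
    return list(zip(ordered, ordered[1:]))


def count_per_group(groups, ids):
    """Count the Ids that fall in [start, excluded_end) for each group."""
    return {
        start: sum(1 for i in ids if start <= i < excluded_end)
        for start, excluded_end in groups
    }


groups = build_run_groups(start_ids)
counts = count_per_group(groups, all_ids)
print(groups)
print(counts)
```

Here the groups come out as (5052, 5056) and (5056, 5060), each with four Ids. The run starting at 5060 gets no group because nothing follows it.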

Now we can put DataRunGroups and DataRunCounts together to get start times and counts.

SELECT DataRunGroups.DateCreated, DataRunCounts.CountOfRecords
FROM DataRunGroups
INNER JOIN DataRunCounts
    ON DataRunGroups.StartID = DataRunCounts.StartID;

Depending on your setup, you may need to do all of this in a single query, but you get the idea. Also, the very first and very last runs won't be included: the first run has no preceding row to mark its start, and the last run has no later start Id to mark its end. To include them, write queries for just those two ranges and union them with the original DataRunGroups query to create a new DataRunGroups. The other queries that use DataRunGroups will then work as described above.
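
For comparison, a single pass over the timestamps can produce the start time and count for every run, including the first and last ones. This is only a sketch of the gap-threshold idea in Python, not a translation of the queries above. The function name `group_runs` is an assumption, and the sample times assume the second run was at 12:34 PM.

```python
from datetime import datetime, timedelta


def group_runs(timestamps, threshold=timedelta(minutes=1)):
    """Split timestamps into runs; a gap of at least `threshold` starts a
    new run. Returns a list of (run_start, file_count) tuples."""
    runs = []  # each entry is [run_start, file_count]
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous >= threshold:
            runs.append([ts, 0])  # first row overall, or first row after a gap
        runs[-1][1] += 1
        previous = ts
    return [tuple(run) for run in runs]


# Sample times from the question (second run assumed to be 12:34 PM).
day = datetime(2011, 5, 17)
clock_times = [
    (9, 14, 12), (9, 14, 22), (9, 14, 51), (9, 15, 2),
    (12, 34, 12), (12, 34, 22), (12, 34, 51), (12, 35, 2),
]
created = [day + timedelta(hours=h, minutes=m, seconds=s) for h, m, s in clock_times]

runs = group_runs(created)
for start, count in runs:
    print(start, count)
```

Because the first row always opens a run and the last run is closed by the end of the data, no separate union step is needed in this form.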



Answer 2:

Take it a few steps further and group all the way down to the minute:

SELECT
    Count(Id) AS [File Count],
    DATEPART(year, DateCreated) As yr, 
    DATEPART(month, DateCreated) As mth, 
    DATEPART(day, DateCreated) As day, 
    DATEPART(Hour, DateCreated) as hr, 
    DATEPART(minute, DateCreated) as mnt
FROM 
    MyReportTable
WHERE DateCreated between '5/4/2011' and '5/18/2011'
    and SourceType = 'File'
GROUP BY 
    DATEPART(year, DateCreated), 
    DATEPART(month, DateCreated), 
    DATEPART(day, DateCreated), 
    DATEPART(Hour, DateCreated),
    DATEPART(minute, DateCreated)
ORDER BY 
    DATEPART(year, DateCreated),
    DATEPART(month, DateCreated), 
    DATEPART(day, DateCreated), 
    DATEPART(Hour, DateCreated),
    DATEPART(minute, DateCreated)

Edit

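To get a 15-minute resolution, replace the minute expression (in the SELECT, GROUP BY, and ORDER BY clauses) with

(DATEPART(minute, DateCreated)/15)

Integer division gives 0, 1, 2, or 3 for the quarter of the hour; add +1 in the SELECT to get 1, 2, 3, 4 instead.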
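
The bucket arithmetic can be checked quickly in Python. This is a minimal sketch in which the function name `quarter_of_hour` is an assumption and the sample times are taken from the question (with the second run assumed to be at 12:34 PM).

```python
from datetime import datetime


def quarter_of_hour(ts, one_based=False):
    """Return the 15-minute bucket of `ts` within its hour:
    0..3 by default, or 1..4 when one_based is True."""
    quarter = ts.minute // 15  # integer division, like minute/15 on an int in SQL
    return quarter + 1 if one_based else quarter


samples = [
    datetime(2011, 5, 17, 9, 14, 51),
    datetime(2011, 5, 17, 9, 15, 2),
    datetime(2011, 5, 17, 12, 34, 12),
]
quarters = [quarter_of_hour(ts) for ts in samples]
print(quarters)
```

Note that 9:14:51 and 9:15:02 land in different buckets even though they are only 11 seconds apart, so fixed buckets give only the rough grouping the question says is acceptable.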