Select top x of each id

Posted 2019-08-06 10:03

I've gotten a bit rusty with SQL.

Let's say I have tblMachineLogs with MachineLogID, MachineID, LogTime (date+time).

This table is filled with logs from 10 machines with MachineID 1 to 10 and has lots of rows in it.

I want to select, for example, the last 5 log events for each machine.

Thanks in advance

5 answers
太酷不给撩
#2 · 2019-08-06 10:07

Use a window function to find the last 5 log events in each group (MachineID):

SELECT MachineLogID,
       MachineID,
       LogTime
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY LogTime DESC) AS rn,
               MachineLogID,
               MachineID,
               LogTime
        FROM   tblMachineLogs) a
WHERE  rn <= 5
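
One thing to watch for: if two log rows for the same machine share the same LogTime, ROW_NUMBER picks between them arbitrarily, so the fifth row may differ between runs. A minimal sketch of a deterministic variant, assuming MachineLogID is unique and can serve as a tie-breaker (the question does not state this):

```sql
-- Assumption: MachineLogID is unique, so it can break ties between equal LogTime values.
SELECT MachineLogID,
       MachineID,
       LogTime
FROM   (SELECT ROW_NUMBER() OVER (PARTITION BY MachineID
                                  ORDER BY LogTime DESC, MachineLogID DESC) AS rn,
               MachineLogID,
               MachineID,
               LogTime
        FROM   tblMachineLogs) AS a
WHERE  rn <= 5
ORDER  BY MachineID, LogTime DESC, MachineLogID DESC;
```

The outer ORDER BY is there only to make the output order predictable; without it the rows can come back in any order.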
萌系小妹纸
#3 · 2019-08-06 10:12

Solution for SQL Server. I tested it on SQL Server 2008.

Imagine that MachineLogs has millions or billions of rows and an index on (MachineID, LogTime DESC). The ROW_NUMBER solution would scan the whole table (or just the index, but it would still be a full scan). If the index is on (MachineID, LogTime ASC), it would do an extra, expensive sort as well.

On the other hand, if we have a tiny table Machines with 10 rows, one for each MachineID, then it is possible to write a query that does 10 seeks on the index instead of scanning the whole big table.

I'll create a big table MachineLogs with 1 million rows and a small table Machines with 10 rows, and test the two solutions.

Table Machines will have 10 rows:

CREATE TABLE [dbo].[Machines](
    [ID] [int] NOT NULL,
CONSTRAINT [PK_Machines] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

INSERT INTO [dbo].[Machines]
([ID])
VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
;

Big table with index on ([MachineID] ASC, [LogTime] DESC):

CREATE TABLE [dbo].[MachineLogs](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [MachineID] [int] NOT NULL,
    [LogTime] [datetime] NOT NULL,
 CONSTRAINT [PK_MachineLogs] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [IX_MachineID_LogTime] ON [dbo].[MachineLogs]
(
    [MachineID] ASC,
    [LogTime] DESC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

ALTER TABLE [dbo].[MachineLogs]  WITH CHECK ADD  CONSTRAINT [FK_MachineLogs_Machines] FOREIGN KEY([MachineID])
REFERENCES [dbo].[Machines] ([ID])
GO

ALTER TABLE [dbo].[MachineLogs] CHECK CONSTRAINT [FK_MachineLogs_Machines]
GO

Generate 1M rows:

WITH
CTE_Times
AS
(
    -- generate 100,000 rows with random datetimes between 2001-01-01 and ~2004-03-01 (100,000,000 seconds)
    SELECT TOP(100000)
        DATEADD(second, 100000000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5), '20010101') AS LogTime
    FROM
        sys.all_objects AS X1
        CROSS JOIN sys.all_objects AS X2
)
-- generate 1M rows
INSERT INTO dbo.MachineLogs
    (MachineID
    ,LogTime)
SELECT
    dbo.Machines.ID
    ,CTE_Times.LogTime
FROM
    dbo.Machines
    CROSS JOIN CTE_Times
;

Solution with ROW_NUMBER

WITH
CTE_rn
AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY LogTime DESC) AS rn
        ,ID
        ,MachineID
        ,LogTime
    FROM MachineLogs
)
SELECT
    ID
    ,MachineID
    ,LogTime
FROM CTE_rn
WHERE rn <= 5
;

Solution with CROSS APPLY

SELECT
    CA.ID
    ,CA.MachineID
    ,CA.LogTime
FROM
    Machines
    CROSS APPLY
    (
        SELECT TOP(5)
            MachineLogs.ID
            ,MachineLogs.MachineID
            ,MachineLogs.LogTime
        FROM MachineLogs
        WHERE
            MachineLogs.MachineID = Machines.ID
        ORDER BY LogTime DESC
    ) AS CA
;

Execution plans

[Image: execution plans of the two queries]

You can see that the ROW_NUMBER solution does an index scan, while the CROSS APPLY solution does index seeks.

IO statistics

SET STATISTICS IO ON;

Solution with ROW_NUMBER:

(50 row(s) affected)
Table 'MachineLogs'. Scan count 1, logical reads 2365, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Solution with CROSS APPLY:

(50 row(s) affected)
Table 'MachineLogs'. Scan count 10, logical reads 30, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Machines'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
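
If there is no Machines table but the set of machine IDs is known up front, the same seek-per-machine plan shape can probably be had by supplying the IDs inline. A sketch, assuming the IDs are exactly 1 to 10 as in the question; the derived table name M is just a hypothetical alias, and I have not measured this variant:

```sql
-- Assumption: the machine IDs are known and fixed (1..10 here).
-- A table value constructor stands in for the small Machines table.
SELECT
    CA.ID
    ,CA.MachineID
    ,CA.LogTime
FROM
    (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS M(ID)
    CROSS APPLY
    (
        SELECT TOP(5)
            MachineLogs.ID
            ,MachineLogs.MachineID
            ,MachineLogs.LogTime
        FROM MachineLogs
        WHERE
            MachineLogs.MachineID = M.ID
        ORDER BY MachineLogs.LogTime DESC
    ) AS CA
;
```

The hard-coded list has to be maintained by hand, so a real Machines table is the better choice whenever one exists.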
我命由我不由天
#4 · 2019-08-06 10:23

To keep it simple, I'd do it with an individual query per machine.

If you are using MySQL:

SELECT MachineLogID, MachineID, LogTime FROM tblMachineLogs WHERE MachineID='str_machineid' ORDER BY LogTime DESC LIMIT 5;

That returns the last 5 log events for the machine whose ID is given by str_machineid. Remove the quotes if the machine ID is a numeric field (as it should be).
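
If running one query per machine becomes inconvenient, MySQL 8.0 and later support window functions, so all machines can be fetched in a single statement. A sketch, assuming MySQL 8.0+ and the table and column names from the question; the alias ranked is arbitrary:

```sql
-- Requires MySQL 8.0 or later; earlier versions have no window functions.
SELECT MachineLogID, MachineID, LogTime
FROM (
    SELECT MachineLogID, MachineID, LogTime,
           ROW_NUMBER() OVER (PARTITION BY MachineID ORDER BY LogTime DESC) AS rn
    FROM tblMachineLogs
) AS ranked
WHERE rn <= 5
ORDER BY MachineID, LogTime DESC;
```

On older MySQL versions the per-machine query with LIMIT remains the simplest option.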

叼着烟拽天下
#5 · 2019-08-06 10:27

Create a query for each machine that selects its top 5 rows ordered by log time descending (to get the last 5), then union the results. Here is an example for two machines; just fill in the missing 8.

--drop table #tmp
SELECT  *
into #tmp
FROM    
(
select 1 as MachineLogID, 1 as MachineID , GETDATE() - 0.1 LogTime
    UNION
select 2 as MachineLogID, 1 as MachineID , GETDATE()- 0.2 LogTime
    UNION
select 3 as MachineLogID, 1 as MachineID , GETDATE()- 0.3 LogTime
    UNION
select 4 as MachineLogID, 1 as MachineID , GETDATE()- 0.4 LogTime
    UNION
select 5 as MachineLogID, 1 as MachineID , GETDATE()- 0.5 LogTime
    UNION
select 6 as MachineLogID, 1 as MachineID , GETDATE() - 0.6 LogTime
    UNION
select 7 as MachineLogID, 2 as MachineID , GETDATE()- 0.7 LogTime
    UNION
select 8 as MachineLogID, 2 as MachineID , GETDATE() - 0.8 LogTime
    UNION
select 9 as MachineLogID, 2 as MachineID , GETDATE() - 0.9 LogTime
    UNION
select 10 as MachineLogID, 2 as MachineID , GETDATE() - 0.10 LogTime
    UNION
select 11 as MachineLogID, 2 as MachineID , GETDATE() - 0.11 LogTime
    UNION
select 12 as MachineLogID, 2 as MachineID , GETDATE() - 0.12 LogTime
) a

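-- ORDER BY is not allowed directly on a branch of a UNION,
-- so each TOP 5 ... ORDER BY is wrapped in its own derived table.
SELECT  *
FROM
(
    SELECT  *
    FROM    (SELECT TOP 5 *
             FROM   #tmp
             WHERE  MachineID = 1
             ORDER BY LogTime DESC) m1
        UNION ALL
    SELECT  *
    FROM    (SELECT TOP 5 *
             FROM   #tmp
             WHERE  MachineID = 2
             ORDER BY LogTime DESC) m2
) a
ORDER BY a.MachineID, a.LogTime DESC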
倾城 Initia
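#6 · 2019-08-06 10:33
-- Without ORDER BY, TOP 5 returns arbitrary rows, so each branch sorts by LogTime first.
SELECT * FROM (SELECT TOP 5 * FROM yourTable WHERE MachineID = 1 ORDER BY LogTime DESC) m1
UNION ALL
SELECT * FROM (SELECT TOP 5 * FROM yourTable WHERE MachineID = 2 ORDER BY LogTime DESC) m2
UNION ALL
.
.
.
.
SELECT * FROM (SELECT TOP 5 * FROM yourTable WHERE MachineID = 10 ORDER BY LogTime DESC) m10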