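I have a question about T-SQL and SQL Server.
Let's say I have an Orders table with the columns ProductId, CustomerId and Date.
I want the date of the first order for each product, so I run this type of query: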
SELECT ProductId, MIN(Date) AS FirstOrder
FROM Orders
GROUP BY ProductId
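I have an index on ProductId that includes the columns CustomerId and Date to speed up the query (IX_Orders). The query plan looks like a nonclustered index scan on IX_Orders, followed by a Stream Aggregate (no sort needed, thanks to the index).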
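The question does not show any DDL, so the following is only a sketch of what the index described above might look like. It assumes the table is simply named Orders and that CustomerId and Date are included (non-key) columns, as the wording suggests.

```sql
-- Sketch of the index described in the question (DDL is assumed, not quoted).
-- ProductId is the only key column; CustomerId and Date are carried as
-- included columns so the index covers the query without key lookups.
CREATE NONCLUSTERED INDEX IX_Orders
    ON Orders (ProductId)
    INCLUDE (CustomerId, Date);
```

With this shape the index rows are ordered by ProductId only, which is enough for a Stream Aggregate on ProductId but says nothing about the order of Date within a product.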
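Now my problem is that I also want to retrieve the CustomerId associated with the first order for each product (for example: product 26 was first ordered on Tuesday the 25th, by customer 12). The tricky part is that I don't want any inner loop in the execution plan, because that would mean an additional read of the table for every ProductId, which is highly inefficient.
This should be possible using just the same nonclustered index scan followed by a Stream Aggregate, but I can't seem to find a query that does it. Any ideas?
Thanks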
Answer 1:
This will handle products that have duplicate dates:
DECLARE @Orders table (ProductId int
,CustomerId int
,Date datetime
)
INSERT INTO @Orders VALUES (1,1,'20090701')
INSERT INTO @Orders VALUES (2,1,'20090703')
INSERT INTO @Orders VALUES (3,1,'20090702')
INSERT INTO @Orders VALUES (1,2,'20090704')
INSERT INTO @Orders VALUES (4,2,'20090701')
INSERT INTO @Orders VALUES (1,3,'20090706')
INSERT INTO @Orders VALUES (2,3,'20090704')
INSERT INTO @Orders VALUES (4,3,'20090702')
INSERT INTO @Orders VALUES (5,5,'20090703') --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,1,'20090703') --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,5,'20090703') --duplicate dates for product #5
;WITH MinOrders AS
(SELECT
o.ProductId, o.CustomerId, o.Date
,row_number() over(partition by o.ProductId order by o.ProductId,o.CustomerId) AS RankValue
FROM @Orders o
INNER JOIN (SELECT
ProductId
,MIN(Date) MinDate
FROM @Orders
GROUP BY ProductId
) dt ON o.ProductId=dt.ProductId AND o.Date=dt.MinDate
)
SELECT
m.ProductId, m.CustomerId, m.Date
FROM MinOrders m
WHERE m.RankValue=1
ORDER BY m.ProductId, m.CustomerId
This will return the same results; just use the same DECLARE and INSERT statements as in the code above:
;WITH MinOrders AS
(SELECT
o.ProductId, o.CustomerId, o.Date
,row_number() over(partition by o.ProductId order by o.Date,o.CustomerId) AS RankValue --order by Date first so RankValue=1 is the earliest order
FROM @Orders o
)
SELECT
m.ProductId, m.CustomerId, m.Date
FROM MinOrders m
WHERE m.RankValue=1
ORDER BY m.ProductId, m.CustomerId
You can try each version and see which one runs faster...
Answer 2:
declare @Orders table (
ProductId int,
CustomerId int,
Date datetime
)
insert into @Orders values (1,1,'20090701')
insert into @Orders values (2,1,'20090703')
insert into @Orders values (3,1,'20090702')
insert into @Orders values (1,2,'20090704')
insert into @Orders values (4,2,'20090701')
insert into @Orders values (1,3,'20090706')
insert into @Orders values (2,3,'20090704')
insert into @Orders values (4,3,'20090702')
insert into @Orders values (5,5,'20090703')
select O.* from @Orders O inner join
(
select ProductId,
MIN(Date) MinDate
from @Orders
group by ProductId
) FO
on FO.ProductId = O.ProductId and FO.MinDate = O.Date
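The estimated query plan for this is useless because I'm mocking the table with a table variable, but the join against the anonymous derived table should optimize better than a correlated subselect.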
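Table variables carry no statistics, which is one reason the estimated plan is not informative here. A minimal sketch of the same test against a temporary table follows; the name #Orders and the index definition are assumptions for illustration, and with only a handful of rows the plan may still differ from what a large table would produce.

```sql
-- Sketch: same sample data in a temp table, which does get statistics.
CREATE TABLE #Orders (
    ProductId  int      NOT NULL,
    CustomerId int      NOT NULL,
    Date       datetime NOT NULL
);

-- Assumed index, mirroring the IX_Orders described in the question.
CREATE NONCLUSTERED INDEX IX_Orders
    ON #Orders (ProductId) INCLUDE (CustomerId, Date);

INSERT INTO #Orders VALUES (1, 1, '20090701');
INSERT INTO #Orders VALUES (2, 1, '20090703');
INSERT INTO #Orders VALUES (1, 2, '20090704');
INSERT INTO #Orders VALUES (2, 3, '20090704');

-- Same join as above: one row per product at its earliest date.
SELECT O.*
FROM #Orders O
INNER JOIN (
    SELECT ProductId, MIN(Date) AS MinDate
    FROM #Orders
    GROUP BY ProductId
) FO
    ON FO.ProductId = O.ProductId AND FO.MinDate = O.Date;

DROP TABLE #Orders;
```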
Answer 3:
In SQL Server 2005+:
SELECT oo.*
FROM (
SELECT DISTINCT ProductId
FROM Orders
) od
CROSS APPLY
(
SELECT TOP 1 ProductID, Date, CustomerID
FROM Orders oi
WHERE oi.ProductID = od.ProductID
ORDER BY
Date -- ascending, so TOP 1 is the first order
) oo
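Nominally, the plan for this query contains Nested Loops.
However, the outer loop will use an Index Scan with a Stream Aggregate, while the inner loop will contain an Index Seek on ProductID with a Top.
In fact, the second operation is almost free, because the index page used in the inner loop will most probably already be in the cache, having just been read by the outer loop.
Here are the test results on 1,000,000 rows (with 100 distinct ProductID values):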
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(100 row(s) affected)
Table 'Orders'. Scan count 103, logical reads 6020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 125 ms.
And this is the result for just the SELECT DISTINCT query:
SELECT od.*
FROM (
SELECT DISTINCT ProductId
FROM Orders
) od
And the statistics:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
(100 row(s) affected)
Table 'Orders'. Scan count 3, logical reads 5648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 125 ms.
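As we can see, the performance is the same, and the CROSS APPLY costs only about 400 extra logical reads (6020 versus 5648), which will most probably never become physical reads.
I don't see how this query could be improved any further.
Another benefit of this query is that it parallelizes nicely. You may notice that the CPU time is about twice the elapsed time: that is because the query ran in parallel on my old Core Duo.
A 4-core CPU would complete this query in half that time.
Solutions that use window functions do not parallelize: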
SELECT od.*
FROM (
SELECT ProductId, Date, CustomerID, ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Date) AS rn
FROM Orders
) od
WHERE rn = 1
and here are the statistics:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(100 row(s) affected)
Table 'Orders'. Scan count 1, logical reads 5123, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 406 ms, elapsed time = 415 ms.
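The figures quoted in this answer have the shape of SET STATISTICS IO and SET STATISTICS TIME output. Assuming that is how they were collected, a minimal sketch for repeating the comparison on your own Orders table could look like this; swap in whichever candidate query you want to measure.

```sql
-- Sketch: enable per-statement I/O and timing messages for this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate query under test (here: the window-function variant).
SELECT od.ProductId, od.Date, od.CustomerId
FROM (
    SELECT ProductId, Date, CustomerId,
           ROW_NUMBER() OVER (PARTITION BY ProductId ORDER BY Date) AS rn
    FROM Orders
) od
WHERE od.rn = 1;

-- Switch the messages off again when done.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Logical reads are the more stable number to compare between runs; elapsed time depends on caching and on whatever else the server is doing.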
Answer 4:
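I don't see a nice way of doing this without a subquery or a windowing function (such as ROW_NUMBER or RANK), because MIN and MAX only look at one column.
However, you can do it in a not-so-nice way: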
SELECT
productid,
min(date),
cast(
substring(
min(convert(varchar(23),date,21) + cast(customerid as varchar(20)))
, 24, 44)
as int) customerid
from
orders
group by
productid
This only works if your customer ID has fewer than 20 digits.
EDIT: added the GROUP BY clause.
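The offsets in the query above rely on CONVERT style 21 producing a fixed-width yyyy-mm-dd hh:mi:ss.mmm string of 23 characters, so the customer ID always starts at position 24 and the strings sort chronologically. The sketch below checks that on a single made-up value; the date and the customer ID 12 are illustrative only.

```sql
-- Sketch: pack one date and one customer id the same way the query does,
-- then unpack the customer id again.
DECLARE @d datetime;
SET @d = '20090701';

DECLARE @packed varchar(43);
SET @packed = CONVERT(varchar(23), @d, 21) + CAST(12 AS varchar(20));

SELECT
    @packed                                  AS Packed,      -- date text followed by the id digits
    LEN(CONVERT(varchar(23), @d, 21))        AS DateLength,  -- width of the date prefix
    CAST(SUBSTRING(@packed, 24, 44) AS int)  AS CustomerId;  -- everything after the date prefix
```

Because MIN compares the packed strings as text, ties on the date are resolved by the text order of the customer ID, not its numeric order.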
Answer 5:
SELECT
o1.productid,
o1.date,
o1.customerid
FROM
Orders o1
JOIN
(select productid, min(date) as orderDate
from Orders
group by productid
) firstOrder
ON o1.productid = firstOrder.productid
AND o1.date = firstOrder.orderDate
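This is the best I can come up with, although to be honest I don't know what the performance characteristics of this query are. If it's no good, I'd probably suggest running two queries to get the information you want.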
Answer 6:
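Is IX_Orders sorted by ProductId, then CustomerId, then Date, or is it ProductId, then Date, then CustomerId? If it is the former, change it to the latter.
In other words, don't use this: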
create index IX_Orders on Orders (ProductId, CustomerId, Date)
Use this instead:
create index IX_Orders on Orders (ProductId, Date, CustomerId)
Then, if you do this:
SELECT o1.*
FROM Orders o1
JOIN
(
SELECT ProductID, Min(Date) as Date
FROM Orders
GROUP BY ProductID
) o2
ON o1.ProductID = o2.ProductID AND o1.Date = o2.Date
ORDER BY o1.ProductID
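You end up with just one index scan on IX_Orders. However, if two customers can order the same product at the same time, you could get multiple rows for each product. You can get around this with the following query, but it is less efficient than the first: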
WITH cte AS
(
SELECT ProductID, CustomerID, Date,
ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY Date ASC) AS row
FROM Orders
)
SELECT ProductID, CustomerId, Date
FROM cte
WHERE row = 1
ORDER BY ProductID
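One possible refinement, offered as a sketch rather than a tested result: if the index really is keyed on (ProductId, Date, CustomerId) as suggested above, adding CustomerId as a tie-breaker makes the choice between simultaneous orders deterministic and keeps the window ordering aligned with the index key order, which should let the optimizer avoid a separate sort. The alias rn is just an illustrative name.

```sql
-- Sketch: assumes IX_Orders on Orders (ProductId, Date, CustomerId).
WITH cte AS
(
    SELECT ProductId, CustomerId, Date,
           ROW_NUMBER() OVER (PARTITION BY ProductId
                              ORDER BY Date, CustomerId) AS rn  -- CustomerId breaks ties on Date
    FROM Orders
)
SELECT ProductId, CustomerId, Date
FROM cte
WHERE rn = 1
ORDER BY ProductId;
```

Check the actual execution plan on your own data before relying on this; whether a sort appears depends on the index that exists and on the optimizer's choices.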
Source: Find the row associated with a Min/Max, without inner loop