I'll use a concrete, but hypothetical, example.
Each Order normally has only one line item:
Orders:
OrderGUID OrderNumber
========= ============
{FFB2...} STL-7442-1
{3EC6...} MPT-9931-8A
LineItems:
LineItemGUID Order ID Quantity Description
============ ======== ======== =================================
{098FBE3...} 1 7 prefabulated amulite
{1609B09...} 2 32 spurving bearing
But occasionally there will be an order with two line items:
LineItemID Order ID Quantity Description
========== ======== ======== =================================
{A58A1...} 6,784,329 5 pentametric fan
{0E9BC...} 6,784,329 5 differential girdlespring
Normally when showing the orders to the user:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN LineItems
ON Orders.OrderID = LineItems.OrderID
I want to show the single item on the order. But with this occasional order containing two (or more) items, the orders would appear be duplicated:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 spurving bearing
KSG-0619-81 5 panametric fan
KSG-0619-81 5 differential girdlespring
What I really want is to have SQL Server just pick one, as it will be good enough:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan
If I get adventurous, I might show the user, an ellipsis to indicate that there's more than one:
OrderNumber Quantity Description
=========== ======== ====================
STL-7442-1 7 prefabulated amulite
MPT-9931-8A 32 differential girdlespring
KSG-0619-81 5 panametric fan, ...
So the question is how to either
- eliminate "duplicate" rows
- only join to one of the rows, to avoid duplication
First attempt
My first naive attempt was to only join to the "TOP 1" line items:
SELECT Orders.OrderNumber, LineItems.Quantity, LineItems.Description
FROM Orders
INNER JOIN (
SELECT TOP 1 LineItems.Quantity, LineItems.Description
FROM LineItems
WHERE LineItems.OrderID = Orders.OrderID) LineItems2
ON 1=1
But that gives the error:
The column or prefix 'Orders' does not match with a table name or alias name used in the query.
Presumably because the inner select doesn't see the outer table.
In
SQL Server 2005
and above, you could just replaceINNER JOIN
withCROSS APPLY
:Please note that
TOP 1
withoutORDER BY
is not deterministic: this query you will get you one line item per order, but it is not defined which one will it be.Multiple invocations of the query can give you different line items for the same order, even if the underlying did not change.
If you want deterministic order, you should add an
ORDER BY
clause to the innermost query.I know this question was answered a while ago, but when dealing with large data sets, nested queries can be costly. Here is a different solution where the nested query will only be ran once, instead of for each row returned.
@Quassnoi answer is good, in some cases (especially if the outer table is big), a more efficient query might be with using windowed functions, like this:
Sometimes you just need to test which query gives better performance.
My favorite way to run this query is with a not exists clause. I believe this is the most efficient way to run this sort of query:
But I have not tested this method against other methods suggested here.
You could do:
This requires an index (or primary key) on
LineItems.LineItemID
and an index onLineItems.OrderID
or it will be slow.