SQL Statement Help - Select latest Order for each

2019-01-17 08:42发布

问题:

Say I have 2 tables: Customers and Orders. A Customer can have many Orders.

Now, I need to show any Customers with his latest Order. This means if a Customer has more than one Orders, show only the Order with the latest Entry Time.

This is how far I managed on my own:

SELECT a.*, b.Id
FROM Customer a INNER JOIN Order b ON b.CustomerID = a.Id
ORDER BY b.EntryTime DESC

This of course returns all Customers with one or more Orders, showing the latest Order first for each Customer, which is not what I wanted. My mind was stuck in a rut at this point, so I hope someone can point me in the right direction.

For some reason, I think I need to use the MAX syntax somewhere, but it just escapes me right now.

UPDATE: After going through a few answers here (there's a lot!), I realized I made a mistake: I meant any Customer with his latest record. That means if he does not have an Order, then I do not need to list him.

UPDATE2: Fixed my own SQL statement, which probably caused no end of confusion to others.

回答1:

I don't think you do want to use MAX() as you don't want to group the OrderID. What you need is a ordered sub query with a SELECT TOP 1.

select * 
from Customers inner join Orders 
on Customers.CustomerID = Orders.CustomerID
and OrderID = (SELECT TOP 1 subOrders.OrderID 
                    FROM Orders subOrders 
                    WHERE subOrders.CustomerID = Orders.CustomerID 
                    ORDER BY subOrders.OrderDate DESC)


回答2:

While I see that you've already accepted an answer, I think this one is a bit more intuitive:

select      a.*
           ,b.Id

from       customer a

inner join Order b
on         b.CustomerID = a.Id

where      b.EntryTime = ( select max(EntryTime)
                           from   Order
                           where  Id = b.Id
                         );

I would have to run something like this through an execution plan to see the difference in execution, but where the TOP function is done after-the-fact and that using "order by" can be expensive, I believe that using max(EntryTime) would be the best way to run this.



回答3:

Something like this should do it:

SELECT X.*, Y.LatestOrderId
FROM Customer X
LEFT JOIN (
  SELECT A.Customer, MAX(A.OrderID) LatestOrderId
  FROM Order A
  JOIN (
    SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
  ) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
  GROUP BY Customer
) Y ON X.Customer = Y.Customer

This assumes that two orders for the same customer may have the same EntryTime, which is why MAX(OrderID) is used in subquery Y to ensure that it only occurs once per customer. The LEFT JOIN is used because you stated you wanted to show all customers - if they haven't got any orders, then the LatestOrderId will be NULL.

Hope this helps!

--

UPDATE :-) This shows only customers with orders:

SELECT A.Customer, MAX(A.OrderID) LatestOrderId
FROM Order A
JOIN (
  SELECT Customer, MAX(EntryTime) MaxEntryTime FROM Order GROUP BY Customer
) B ON A.Customer = B.Customer AND A.EntryTime = B.MaxEntryTime
GROUP BY Customer


回答4:

You can use a window function.

SELECT *
  FROM (SELECT a.*, b.*,
               ROW_NUMBER () OVER (PARTITION BY a.ID ORDER BY b.orderdate DESC,
                b.ID DESC) rn
          FROM customer a, ORDER b
         WHERE a.ID = b.custid)
 WHERE rn = 1

For each customer (a.id) it sorts all orders and discards everything but the latest. ORDER BY clause includes both order date and entry id, in case there are multiple orders on the same date.

Generally, window functions are much faster than any look-ups using MAX() on large number of records.



回答5:

SELECT Cust.*, Ord.*
FROM Customers cust INNER JOIN Orders ord ON cust.ID = ord.CustID
WHERE ord.OrderID = 
    (SELECT MAX(OrderID) FROM Orders WHERE Orders.CustID = cust.ID)


回答6:

Something like:

SELECT
  a.*
FROM
  Customer a
    INNER JOIN Order b
      ON a.OrderID = b.Id
        INNER JOIN (SELECT Id, max(EntryTime) as EntryTime FROM Order b GROUP BY Id) met
          ON
            b.EntryTime = met.EntryTime and b.Id = met.Id


回答7:

One approach that I haven't seen above yet:

SELECT
     C.*,
     O1.ID
FROM
     dbo.Customers C
INNER JOIN dbo.Orders O1 ON
     O1.CustomerID = C.ID
LEFT OUTER JOIN dbo.Orders O2 ON
     O2.CustomerID = C.ID AND
     O2.EntryTime > O1.EntryTime
WHERE
     O2.ID IS NULL

This (as well as the other solutions I believe) assumes that no two orders for the same customer can have the exact same entry time. If that's a concern then you would have to make a choice as to what determines which one is the "latest". If that's a concern post a comment and I can expand the query if needed to account for that.

The general approach of the query is to find the order for a customer where there is not another order for the same customer with a later date. It is then the latest order by definition. This approach often gives better performance then the use of derived tables or subqueries.



回答8:

This query is much faster than the accepted answer :

SELECT c.id as customer_id, 
    (SELECT co.id FROM customer_order co WHERE 
    co.customer_id=c.id 
    ORDER BY some_date_column DESC limit 1) as last_order_id
    FROM customer c