Linq to SQL: Left Join to MAX aggregate

2020-06-28 09:06发布

I am trying to write a Linq to SQL statement which displays all customer records and only the matching max(InvoiceId) of the invoice table; basically the newest invoice for the customer. The left join is required because a customer may not have any invoices but need to be in result set.

Two basic tables with a foreign key of Customer.CustomerID = Invoice.CustomerId

CREATE TABLE [dbo].[Customer](
    [CusomerId] [int] IDENTITY(1,1) NOT NULL,
    [CustomerName] [int] NOT NULL
    CONSTRAINT [PK_Customer] PRIMARY KEY CLUSTERED 
(
    [CustomerId] ASC
)
) ON [PRIMARY]

CREATE TABLE [dbo].[Invoice](
    [InvoiceId] [int] IDENTITY(1,1) NOT NULL,
    [CustomerId] [int] NOT NULL,
    [InvoiceTotal] [float] NOT NULL
    CONSTRAINT [PK_Invoice] PRIMARY KEY CLUSTERED 
(
    [InvoiceId] ASC
)
) ON [PRIMARY]

The SQL of the desired result set is as follows:

SELECT *
FROM Customers c
  LEFT JOIN 
    (Invoice i 
      INNER JOIN (SELECT CustomerId, MAX(InvoiceId) as InvId FROM Invoice GROUP BY CustomerId) as InvList 
      ON i.InvoiceNo = InvList.InvoiceNo) ON c.CustomerId = i.CustomerId

From what I have discovered, I don't think this can be done in a single statement; that the MAX(InvoiceId) product needs to be created first and used in the main statement. Since I can't get it to work, perhaps I am wrong about that too.

标签: linq-to-sql
2条回答
家丑人穷心不美
2楼-- · 2020-06-28 09:11

I wasn't able to get Ben M's example to work, but I was able to work it into the following:

var latestInvoicesPerCustomerQuery = 
    from inv in ctx.Invoice
    group inv by inv.CustomerId into g
    join invj in ctx.Invoice on g.Max(inv => inv.InvoiceId) equals invj.InvoiceId
    select invj;

var customersAndLatestInvoicesQuery = 
    from customer in ctx.Customer
    join linv in latestInvoicesPerCustomer 
         on customer.CustomerId equals linv.CustomerId
         into latestInvoiceJoin
    from latestInvoice in latestInvoiceJoin.DefaultIfEmpty() // left join
    select new
    {
        Customer = customer,
        LatestInvoice = invoice
    };

In the first statement I joined the invoice table back into the result and purposely did not use select new.

//If I had done:
select new {LatestInvoice=invj}
// then I would have included the name LatestInvoice in the second statement:
join linv in latestInvoicesPerCustomer 
             on customer.CustomerId equals linv.LatestInvoice.CustomerId
             into latestInvoiceJoin
// Not desirable to me, and it seems it may be troublesome when used.

This actually simplifies the second statement a bit only having to do a simple left join to the first statements object. I am now getting a result set of all customers and latest invoices for customers who have them.

I am not sure why Ben M's solution does not work, but I only get a left join product when the following line is removed:

join invoice in ctx.Invoice 
         on latestInvoice.InvoiceId equals invoice.InvoiceId

With this line included the product is an inner join.

查看更多
做个烂人
3楼-- · 2020-06-28 09:23

You could write this particular query in LINQ as follows--although this will result in a correlated subquery:

var query = 
    from c in ctx.Customer
    select new
    {
        Customer = c,
        LatestInvoice = ctx.Invoice
            .Where(i => i.CustomerId == c.CustomerId)
            .OrderByDescending(i => i.InvoiceId)
            .FirstOrDefault();
    };

If you want to do it the other way, the LINQ syntax is less readable, but you can split the query up a bit thanks to deferred execution:

var latestInvoicesPerCustomerQuery = 
    from inv in ctx.Invoice
    group inv by inv.CustomerId into g
    select new { CustomerId = g.Key, InvoiceId = g.Max(inv => inv.InvoiceId) };

var customersAndLatestInvoicesQuery = 
    from customer in ctx.Customer
    join linv in latestInvoicesPerCustomer 
         on customer.CustomerId equals linv.CustomerId
         into latestInvoiceJoin
    from latestInvoice in latestInvoiceJoin.DefaultIfEmpty() // left join
    join invoice in ctx.Invoice 
         on latestInvoice.InvoiceId equals invoice.InvoiceId
    select new
    {
        Customer = customer,
        LatestInvoice = invoice
    };

The first query (latestInvoicesPerCustomerQuery) does not execute until you enumerate over it, or over the second query, which references the first. As far as the runtime is concerned, the final query is one expression tree--so you can think of the first query as having been absorbed into the second.

If you really want it all in one query, you can do that too:

var customersAndLatestInvoicesQuery = 
    from customer in ctx.Customer
    join linv in (
            from inv in ctx.Invoice
            group inv by inv.CustomerId into g
            select new 
            { 
                CustomerId = g.Key, 
                InvoiceId = g.Max(inv => inv.InvoiceId) 
            }
        ) 
        on customer.CustomerId equals linv.CustomerId
        into latestInvoiceJoin
    from latestInvoice in latestInvoiceJoin.DefaultIfEmpty() // left join
    join invoice in ctx.Invoice 
         on latestInvoice.InvoiceId equals invoice.InvoiceId
    select new
    {
        Customer = customer,
        LatestInvoice = invoice
    };

Either variant of the customersAndLatestInvoicesQuery should translate roughly into the SQL you list in your post.

查看更多
登录 后发表回答