LINQ Query Help

2019-07-13 02:54发布

问题:

I have this query which runs a join on Books, TradingDesks and ProductInfos. The data is huge in each of this collection.

var queryJoin = from b in books.Values
                                    join d in tradingDesks.Values
                                        on b.TradingDeskId equals d.Id
                                    join p in ProductInfos.Values
                                        **on b.Id equals p.RiskBookId** 
                                    select new { p, Book = b.Name, TradingDeskName = d.Name };

In the highlighted line (on b.Id equals p.RiskBookId), I also want to add another condition like, (on b.Id equals p.RiskBookId || p.RiskBookId == 0) . How do I do this in this linq syntax.

I tried querying like this

var queryJoin = from b in books.Values
                from d in tradingDesks.Values.Where(x => x.Id == b.TradingDeskId)
                from p in cachedProductInfos.Values.Where(y => y.RiskBookId == b.Id)
                select new { p, Book = b.Name, TradingDeskName = d.Name };

But in this case, the query runs forever and I run out of memory. So I guess framing it in this fashion does something crazy :(

Any help is appreciated.

Thanks Mani

回答1:

In the original query, those calls to Enumerable.Join are using a hashtable behind the scenes to make things fast. If you switch to .Where, you don't get those hash benefits. You can use hashing explicitly to get the same effect.

ILookup<int, string> deskNameLookup = tradingDesks.Values
  .ToLookup(
    d => d.Id,
    d => d.Name
  );

ILookup<int, ProductInfo> infoLookup = ProductInfos.Values
  .ToLookup(p.RiskBookId);

foreach(b in books.Values)
{
  foreach(dName in deskNameLookup[b.TradingDeskId])
  {
    foreach(p in infoLookup[b.Id].Concat(infoLookup[0]))
    {
      var x = new {p, Book = b.Name, TradingDeskName = dName};
    }
  }
}


回答2:

you could try framing it as a union instead of a single join:

var baseQuery = 
    from book in books.Values
    join desk in tradingDesks.Values on book.TradingDeskId equals desk.Id
    select new {book, desk};

var conditionOne = 
    from baseQ in baseQuery
    join productInfo in ProductInfos.Values on baseQ.book.Id equals productInfo.RiskBookId
    select new 
    { 
        productInfo, 
        Book = baseQ.book.Name, 
        TradingDeskName = baseQ.desk.Name
    };

var conditionTwo = 
    from baseQ in baseQuery
    join productInfo in ProductInfos.Values on book.Id equals 0
    select new 
    {
        productInfo, 
        Book = baseQ.book.Name, 
        TradingDeskName = baseQ.desk.Name
    };

var result = conditionOne.Union(conditionTwo);


回答3:

equals can only be used to do a single comparison.

Someone else may be able to give you an optimised approach to the query for your circumstance, but adding to the equals condition isn't the solution here. What might be best though is to approach it in a different fashion - rather than yanking at all the data from within the body of a single query - by some sort of bespoke loading and caching mechanism, since the size of the data is obviously problematic.



回答4:

try this, a little verbatim but should work as you described it:

var queryJoin =
    from b in books.Values
    join d in tradingDesks.Values on b.TradingDeskId equals d.Id
    let p =
        from pi in ProductInfos.Values
        where (b.Id == pi.RiskBookId) || (pi.RiskBookId == 0)
        select pi
    where p.Any()
    select new
    {
        p,
        Book = b.Name,
        TradingDeskName = d.Name
    };


回答5:

Actually :

join d in tradingDesks.Values
    on (b.Id equals p.RiskBookId || p.RiskBookId equals 0)

or

on (b.Id == p.RiskBookId || p.RiskBookId == 0)

Should work perfectly fine.

The join requires a true condition:

on <expression>

(b.Id equals p.RiskBookId || p.RiskBookId == 0) 

will return true or false, so the join SHOULD evaluate that just so.