I have this query which runs a join on Books, TradingDesks and ProductInfos. The data is huge in each of this collection.
var queryJoin = from b in books.Values
join d in tradingDesks.Values
on b.TradingDeskId equals d.Id
join p in ProductInfos.Values
**on b.Id equals p.RiskBookId**
select new { p, Book = b.Name, TradingDeskName = d.Name };
In the highlighted line (on b.Id equals p.RiskBookId)
, I also want to add another condition like, (on b.Id equals p.RiskBookId || p.RiskBookId == 0)
. How do I do this in this linq syntax.
I tried querying like this
var queryJoin = from b in books.Values
from d in tradingDesks.Values.Where(x => x.Id == b.TradingDeskId)
from p in cachedProductInfos.Values.Where(y => y.RiskBookId == b.Id)
select new { p, Book = b.Name, TradingDeskName = d.Name };
But in this case, the query runs forever and I run out of memory. So I guess framing it in this fashion does something crazy :(
Any help is appreciated.
Thanks
Mani
In the original query, those calls to Enumerable.Join are using a hashtable behind the scenes to make things fast. If you switch to .Where, you don't get those hash benefits. You can use hashing explicitly to get the same effect.
ILookup<int, string> deskNameLookup = tradingDesks.Values
.ToLookup(
d => d.Id,
d => d.Name
);
ILookup<int, ProductInfo> infoLookup = ProductInfos.Values
.ToLookup(p.RiskBookId);
foreach(b in books.Values)
{
foreach(dName in deskNameLookup[b.TradingDeskId])
{
foreach(p in infoLookup[b.Id].Concat(infoLookup[0]))
{
var x = new {p, Book = b.Name, TradingDeskName = dName};
}
}
}
you could try framing it as a union instead of a single join:
var baseQuery =
from book in books.Values
join desk in tradingDesks.Values on book.TradingDeskId equals desk.Id
select new {book, desk};
var conditionOne =
from baseQ in baseQuery
join productInfo in ProductInfos.Values on baseQ.book.Id equals productInfo.RiskBookId
select new
{
productInfo,
Book = baseQ.book.Name,
TradingDeskName = baseQ.desk.Name
};
var conditionTwo =
from baseQ in baseQuery
join productInfo in ProductInfos.Values on book.Id equals 0
select new
{
productInfo,
Book = baseQ.book.Name,
TradingDeskName = baseQ.desk.Name
};
var result = conditionOne.Union(conditionTwo);
equals
can only be used to do a single comparison.
Someone else may be able to give you an optimised approach to the query for your circumstance, but adding to the equals
condition isn't the solution here. What might be best though is to approach it in a different fashion - rather than yanking at all the data from within the body of a single query - by some sort of bespoke loading and caching mechanism, since the size of the data is obviously problematic.
try this, a little verbatim but should work as you described it:
var queryJoin =
from b in books.Values
join d in tradingDesks.Values on b.TradingDeskId equals d.Id
let p =
from pi in ProductInfos.Values
where (b.Id == pi.RiskBookId) || (pi.RiskBookId == 0)
select pi
where p.Any()
select new
{
p,
Book = b.Name,
TradingDeskName = d.Name
};
Actually :
join d in tradingDesks.Values
on (b.Id equals p.RiskBookId || p.RiskBookId equals 0)
or
on (b.Id == p.RiskBookId || p.RiskBookId == 0)
Should work perfectly fine.
The join requires a true condition:
on <expression>
(b.Id equals p.RiskBookId || p.RiskBookId == 0)
will return true or false, so the join SHOULD evaluate that just so.