I have a list of people's ID and their first name, and a list of people's ID and their surname. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.
So the following lists:
ID FirstName
-- ---------
1 John
2 Sue
ID LastName
-- --------
1 Doe
3 Smith
Should produce:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
3 Smith
I'm new to LINQ (so forgive me if I'm being lame) and have found quite a few solutions for 'LINQ Outer Joins' which all look quite similar, but really seem to be left outer joins.
My attempts so far go something like this:
private void OuterJoinTest()
{
List<FirstName> firstNames = new List<FirstName>();
firstNames.Add(new FirstName { ID = 1, Name = "John" });
firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
List<LastName> lastNames = new List<LastName>();
lastNames.Add(new LastName { ID = 1, Name = "Doe" });
lastNames.Add(new LastName { ID = 3, Name = "Smith" });
var outerJoin = from first in firstNames
join last in lastNames
on first.ID equals last.ID
into temp
from last in temp.DefaultIfEmpty()
select new
{
id = first != null ? first.ID : last.ID,
firstname = first != null ? first.Name : string.Empty,
surname = last != null ? last.Name : string.Empty
};
}
}
public class FirstName
{
public int ID;
public string Name;
}
public class LastName
{
public int ID;
public string Name;
}
But this returns:
ID FirstName LastName
-- --------- --------
1 John Doe
2 Sue
What am I doing wrong?
As you've found, Linq doesn't have an "outer join" construct. The closest you can get is a left outer join using the query you stated. To this, you can add any elements of the lastname list that aren't represented in the join:
I think there are problems with most of these, including the accepted answer, because they don't work well with Linq over IQueryable either due to doing too many server round trips and too much data returns, or doing too much client execution.
For IEnumerable I don't like Sehe's answer or similar because it has excessive memory use (a simple 10000000 two list test ran Linqpad out of memory on my 32GB machine).
Also, most of the others don't actually implement a proper Full Outer Join because they are using a Union with a Right Join instead of Concat with a Right Anti Semi Join, which not only eliminates the duplicate inner join rows from the result, but any proper duplicates that existed originally in the left or right data.
So here are my extensions that handle all of these issues, generate SQL as good as implementing the join in Linq directly, executing on the server, and is faster and with less memory than others on Enumerables:
The difference between a Right Anti-Semi-Join is mostly moot with Linq to Objects or in the source, but makes a difference on the server (SQL) side in the final answer, removing an unnecessary
JOIN
.The hand coding of
Expression
to handle merging anExpression<Func<>>
into a lambda could be improved with LinqKit, but it would be nice if the language/compiler had added some help for that. TheFullOuterJoinDistinct
andRightOuterJoin
functions are included for completeness, but I did not re-implementFullOuterGroupJoin
yet.I wrote another version of a full outer join for
IEnumerable
for cases where the key is orderable, which is about 50% faster than combining the left outer join with the right anti semi join, at least on small collections. It goes through each collection after sorting just once.I've written this extensions class for an app perhaps 6 years ago, and have been using it ever since in many solutions without issues. Hope it helps.
I don't know if this covers all cases, logically it seems correct. The idea is to take a left outer join and right outer join and combine them together (as it should be).
This works as written since it is in LINQ to Objects. If LINQ to SQL or other, the overload of
DefaultIfEmpty()
that takes in a default may not work. Then you'd have to use the conditional operator to conditionally get the values.i.e.,
Update 1: providing a truly generalized extension method
FullOuterJoin
Update 2: optionally accepting a custom
IEqualityComparer
for the key typeUpdate 3: this implementation has recently become part of
MoreLinq
- Thanks guys!Edit Added
FullOuterGroupJoin
(ideone). I reused theGetOuter<>
implementation, making this a fraction less performant than it could be, but I'm aiming for 'highlevel' code, not bleeding-edge optimized, right now.See it live on http://ideone.com/O36nWc
Prints the output:
You could also supply defaults: http://ideone.com/kG4kqO
Printing:
Explanation of terms used:
Joining is a term borrowed from relational database design:
a
as many times as there are elements inb
with corresponding key (i.e.: nothing ifb
were empty). Database lingo calls thisinner (equi)join
.a
for which no corresponding element exists inb
. (i.e.: even results ifb
were empty). This is usually referred to asleft join
.a
as well asb
if no corresponding element exists in the other. (i.e. even results ifa
were empty)Something not usually seen in RDBMS is a group join[1]:
a
for multiple correspondingb
, it groups the records with corresponding keys. This is often more convenient when you wish to enumerate through 'joined' records, based on a common key.See also GroupJoin which contains some general background explanations as well.
[1] (I believe Oracle and MSSQL have proprietary extensions for this)
Full code
A generalized 'drop-in' Extension class for this
Full outer join for two or more tables: First extract the column that you want to join on.
Then use left outer join between the extracted column and main tables.