I have a list of people's IDs and first names, and a list of people's IDs and surnames. Some people don't have a first name and some don't have a surname; I'd like to do a full outer join on the two lists.
So the following lists:
ID  FirstName
--  ---------
1   John
2   Sue

ID  LastName
--  --------
1   Doe
3   Smith
Should produce:
ID  FirstName  LastName
--  ---------  --------
1   John       Doe
2   Sue
3              Smith
I'm new to LINQ (so forgive me if I'm being lame) and have found quite a few solutions for 'LINQ Outer Joins' which all look quite similar, but really seem to be left outer joins.
My attempts so far go something like this:
private void OuterJoinTest()
{
    List<FirstName> firstNames = new List<FirstName>();
    firstNames.Add(new FirstName { ID = 1, Name = "John" });
    firstNames.Add(new FirstName { ID = 2, Name = "Sue" });

    List<LastName> lastNames = new List<LastName>();
    lastNames.Add(new LastName { ID = 1, Name = "Doe" });
    lastNames.Add(new LastName { ID = 3, Name = "Smith" });

    var outerJoin = from first in firstNames
                    join last in lastNames
                    on first.ID equals last.ID
                    into temp
                    from last in temp.DefaultIfEmpty()
                    select new
                    {
                        id = first != null ? first.ID : last.ID,
                        firstname = first != null ? first.Name : string.Empty,
                        surname = last != null ? last.Name : string.Empty
                    };
}

public class FirstName
{
    public int ID;
    public string Name;
}

public class LastName
{
    public int ID;
    public string Name;
}
But this returns:
ID  FirstName  LastName
--  ---------  --------
1   John       Doe
2   Sue
What am I doing wrong?
I'm guessing @sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of @MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method as described in Microsoft's documentation. I appended the "distinct" suffix with respect to @cadrell0's comment under @JeffMercado's solution.
In the example, you would use it like this:
In the future, as I learn more, I have a feeling I'll be migrating to @sehe's logic given its popularity. But even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:
I'm still new with generics, extensions, Func statements, and other features, so feedback is certainly welcome.
EDIT: It didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension I got an error: "'System.Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.
EDIT: Added "TResult" to the return type for the function. Missed that when reading the Microsoft article, and of course it makes sense. With this fix, it now seems the return behavior is in line with my goals after all.
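For readers following along, here is a minimal sketch of what an extension with the same parameter order as Enumerable.Join() and an IEnumerable<TResult> return type could look like. It is not the code from this answer: the class name FullOuterJoinSketch and the method name FullOuterJoinDistinct are placeholders, and it assumes TResult has value equality (anonymous types do), because Union() relies on that to drop the matched rows that both halves produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FullOuterJoinSketch
{
    // Hypothetical extension; the parameter order mirrors Enumerable.Join().
    public static IEnumerable<TResult> FullOuterJoinDistinct<TOuter, TInner, TKey, TResult>(
        this IEnumerable<TOuter> outer,
        IEnumerable<TInner> inner,
        Func<TOuter, TKey> outerKeySelector,
        Func<TInner, TKey> innerKeySelector,
        Func<TOuter, TInner, TResult> resultSelector)
    {
        // Left outer join: every outer element, paired with default(TInner) when unmatched.
        var left = outer
            .GroupJoin(inner, outerKeySelector, innerKeySelector, (o, matches) => new { o, matches })
            .SelectMany(x => x.matches.DefaultIfEmpty(), (x, i) => resultSelector(x.o, i));

        // Right outer join: every inner element, paired with default(TOuter) when unmatched.
        var right = inner
            .GroupJoin(outer, innerKeySelector, outerKeySelector, (i, matches) => new { i, matches })
            .SelectMany(x => x.matches.DefaultIfEmpty(), (x, o) => resultSelector(o, x.i));

        // Matched pairs appear in both halves; Union removes the duplicates
        // using TResult's default equality.
        return left.Union(right);
    }
}
```

Because the return type is the generic IEnumerable<TResult>, calls such as .Where() and .Select() can be chained onto the result. Note that "distinct" here also collapses genuinely duplicated rows in the inputs, which may or may not be what you want.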
Performs an in-memory streaming enumeration over both inputs and invokes the selector for each row. If there is no correlation at the current iteration, one of the selector arguments will be null.
Example:
Requires an IComparer for the correlation type; uses Comparer.Default if none is provided.
Requires that 'OrderBy' is applied to the input enumerables.
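The description above amounts to a sorted merge join. A rough sketch of the idea follows, under assumptions that are mine rather than the answer's: both inputs are already sorted by key, keys are unique on each side, and the element types are reference types so that null can stand in for "no match". The names MergeJoinSketch and FullOuterMergeJoin are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MergeJoinSketch
{
    // Hypothetical helper: inputs must be pre-sorted by key, with unique keys per side.
    public static IEnumerable<TResult> FullOuterMergeJoin<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> selector,
        IComparer<TKey> comparer = null)
        where TLeft : class
        where TRight : class
    {
        comparer = comparer ?? Comparer<TKey>.Default;
        using (var l = left.GetEnumerator())
        using (var r = right.GetEnumerator())
        {
            bool hasL = l.MoveNext(), hasR = r.MoveNext();
            while (hasL || hasR)
            {
                // An exhausted side counts as "greater", so the other side drains.
                int cmp = !hasL ? 1
                        : !hasR ? -1
                        : comparer.Compare(leftKey(l.Current), rightKey(r.Current));

                if (cmp < 0)
                {
                    // Left key has no partner on the right.
                    yield return selector(l.Current, null);
                    hasL = l.MoveNext();
                }
                else if (cmp > 0)
                {
                    // Right key has no partner on the left.
                    yield return selector(null, r.Current);
                    hasR = r.MoveNext();
                }
                else
                {
                    // Keys match: emit the pair and advance both sides.
                    yield return selector(l.Current, r.Current);
                    hasL = l.MoveNext();
                    hasR = r.MoveNext();
                }
            }
        }
    }
}
```

Each input is walked exactly once and nothing is buffered, which is the appeal of this approach; the price is the sorting precondition.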
I like sehe's answer, but it does not use deferred execution (the input sequences are eagerly enumerated by the calls to ToLookup). So after looking at the .NET sources for LINQ-to-objects, I came up with this:
This implementation has the following important properties:
These properties are important, because they are what someone new to FullOuterJoin but experienced with LINQ will expect.
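To illustrate the deferred-execution point only, here is a small sketch of one way to get it. It is not this answer's implementation, and the names DeferredJoinSketch and FullOuterJoinDeferred are made up. The idea is to validate arguments eagerly but move the ToLookup call into an iterator, so neither input is touched until the caller starts enumerating.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredJoinSketch
{
    // Hypothetical extension: arguments are checked immediately, the work is deferred.
    public static IEnumerable<TResult> FullOuterJoinDeferred<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> selector)
    {
        if (left == null) throw new ArgumentNullException(nameof(left));
        if (right == null) throw new ArgumentNullException(nameof(right));
        if (leftKey == null) throw new ArgumentNullException(nameof(leftKey));
        if (rightKey == null) throw new ArgumentNullException(nameof(rightKey));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return Iterator(left, right, leftKey, rightKey, selector);
    }

    private static IEnumerable<TResult> Iterator<TLeft, TRight, TKey, TResult>(
        IEnumerable<TLeft> left,
        IEnumerable<TRight> right,
        Func<TLeft, TKey> leftKey,
        Func<TRight, TKey> rightKey,
        Func<TLeft, TRight, TResult> selector)
    {
        // Nothing in this method runs until the first MoveNext() on the result.
        var rightLookup = right.ToLookup(rightKey);
        var seenKeys = new HashSet<TKey>();

        // Stream the left side in its original order.
        foreach (var l in left)
        {
            var key = leftKey(l);
            seenKeys.Add(key);
            foreach (var r in rightLookup[key].DefaultIfEmpty())
                yield return selector(l, r);
        }

        // Then emit right-side elements whose key never appeared on the left.
        foreach (var group in rightLookup)
            if (!seenKeys.Contains(group.Key))
                foreach (var r in group)
                    yield return selector(default(TLeft), r);
    }
}
```

In this sketch the right side is still buffered into a lookup once enumeration begins; only the left side is streamed.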
I decided to add this as a separate answer as I am not positive it is tested enough. This is a re-implementation of the FullOuterJoin method using essentially a simplified, customized version of LINQKit Invoke/Expand for Expression so that it should work with Entity Framework. There's not much explanation as it is pretty much the same as my previous answer.

My clean solution for the situation where the key is unique in both enumerables:
so
outputs:
I really hate these LINQ expressions; this is why SQL exists:
Create this as a SQL view in the database and import it as an entity.
Of course, a (distinct) union of left and right joins will do it too, but it is stupid.
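For completeness, the union-of-joins route mentioned above can be sketched in query syntax against the question's data. The FirstName and LastName classes are copied from the question; the UnionOfJoins class, the Run method, and the "ID|FirstName|LastName" output format are only there to make the sketch self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public class FirstName { public int ID; public string Name; }
public class LastName { public int ID; public string Name; }

public static class UnionOfJoins
{
    public static List<string> Run()
    {
        var firstNames = new List<FirstName>
        {
            new FirstName { ID = 1, Name = "John" },
            new FirstName { ID = 2, Name = "Sue" }
        };
        var lastNames = new List<LastName>
        {
            new LastName { ID = 1, Name = "Doe" },
            new LastName { ID = 3, Name = "Smith" }
        };

        // Left outer join: 'first' is never null here, only 'last' can be.
        var leftJoin = from first in firstNames
                       join last in lastNames on first.ID equals last.ID into temp
                       from last in temp.DefaultIfEmpty()
                       select new
                       {
                           first.ID,
                           FirstName = first.Name,
                           LastName = last != null ? last.Name : string.Empty
                       };

        // Right outer join: the same query with the roles swapped.
        var rightJoin = from last in lastNames
                        join first in firstNames on last.ID equals first.ID into temp
                        from first in temp.DefaultIfEmpty()
                        select new
                        {
                            last.ID,
                            FirstName = first != null ? first.Name : string.Empty,
                            LastName = last.Name
                        };

        // Both projections share one anonymous type, so Union can compare rows
        // by value and drop the matched row that appears in both halves.
        return leftJoin.Union(rightJoin)
                       .OrderBy(r => r.ID)
                       .Select(r => r.ID + "|" + r.FirstName + "|" + r.LastName)
                       .ToList();
    }
}
```

This also shows why the query in the question behaves as a left join: it iterates firstNames, so first is never null and ID 3 never gets a row.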