Why are these two Slick queries not equivalent?

2019-06-21 05:33发布

问题:

As a result of trying to make a Slick query more readable, I have this query constructor, which works

val q = Users.filter(_.id === userId) join People on {
  case (u, p) => u.personId === p.id
} joinLeft Addresses on {
  case ((u, p), a) => p.addressId === a.id
} joinLeft Businesses on {
  case (((u, p), a), b) => p.businessId === b.id
} joinLeft Addresses on {
  case ((((u, p), a), b), ba) => b.flatMap(_.addressId) === ba.id
} map {
  case ((((u, p), a), b), ba) => (p, a, b, ba)
}

And this one I thought would be equivalent, but doesn't work:

val q = Users.filter(_.id === userId) join People joinLeft Addresses joinLeft Businesses joinLeft Addresses on {
  case ((((u, p), a), b), ba) =>
    u.personId === p.id &&
    p.addressId === a.flatMap(_.id) &&
    p.businessId === b.flatMap(_.id) &&
    b.flatMap(_.addressId) === ba.id
} map {
  case ((((u, p), a), b), ba) => (p, a, b, ba)
} 

The second one appears to be returning a list of profiles that includes more than the target one.

Why aren't they the same?


The "equivalent" SQL (IE the goal of this construction) is:

select p.*, a1.*, b.*, a2.* from Users u 
innerJoin People p on (u.personId == p.id) 
leftJoin Addresses a1 on (p.addressId == a1.id) 
leftJoin Businesses b on (p.businessId == b.id) 
leftJoin Addresses a2 on ( b.addressId == a2.id)

回答1:

The problem here is, slick uses a LiteralNode(true) as the default join condition. So the second query will results in something looks like the following:

   select p.*, a1.*, b.*, a2.*
     from Users u 
     join People p on 1 = 1
left join Addresses a1 on 1 = 1
left join Businesses b on 1 = 1 
left join Addresses a2 on u.personId = p.id
                      and p.addressId = a1.id
                      and p.businessId = b.id
                      and b.addressId = a2.id

As you can see, all the conditions that are expected to be a join condition for each of the joined tables, are actually a part of the join condition of the last join.

To understand how this will affect the final result, let's simplify the problem as follows:

create temporary table A (id int primary key);

insert into A values (1), (2);

   select a1.id, a2.id, a3.id
     from A a1
     join A a2 on 1 = 1
left join A a3 on a1.id = a2.id
              and a2.id = a3.id;

On the first join, a1 and a2 are joined by a condition that's always true, resulting in a temporary result of:

(1, 1)
(1, 2)
(2, 1)
(2, 2)

Now let's consider the second join. We have a1.id = a2.id as part of join condition, but remember the join condition is used to decide how to retrieve rows from table a3, not to filter the intermediate result of the first join. And we are doing a left join here, so an extra a3 row of NULL is generated, even if the join condition is not satisfied. The final result will be:

(1, 1, 1)
(1, 2, NULL)
(2, 1, NULL)
(2, 2, 2)

So, you're expected to see a lot more unexpected results with columns of the last left joined table being NULL.