ERROR: subquery in FROM cannot refer to other rela

Posted 2019-03-16 18:42

I'm working with PostgreSQL 9 and I want to find the nearest neighbor inside table RP for all tuples in RQ, comparing the dates (t), but I get this error:

ERROR: subquery in FROM cannot refer to other relations of same query level

using this query:

SELECT *
FROM RQ, (SELECT * FROM RP ORDER BY ABS(RP.t - RQ.t) LIMIT 1) AS RA

The reference to RQ.t in the subquery seems to be the problem. How can I avoid this error? How can I reference RQ from within the subquery?

2 answers
趁早两清
Reply #2 · 2019-03-16 18:56

Update:

LATERAL joins allow that and were introduced with Postgres 9.3.
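
A minimal sketch of the LATERAL form, assuming Postgres 9.3 or later and the tables rq and rp from the question. The alias ra is taken from the question; nothing else is assumed about the columns.

```sql
-- For every row of rq, the subquery is evaluated once and may refer to rq.t.
SELECT rq.*, ra.*
FROM   rq
CROSS  JOIN LATERAL (
    SELECT *
    FROM   rp
    ORDER  BY abs(rp.t - rq.t)   -- nearest neighbor by date distance
    LIMIT  1
    ) ra;
```

With CROSS JOIN, rows of rq are dropped if rp is empty. LEFT JOIN LATERAL (...) ra ON true would keep them, with NULL values for the rp columns.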


The reason is in the error message. One element of the FROM list cannot refer to another element of the FROM list on the same level. It is not visible for a peer on the same level. You could solve this with a correlated subquery:

SELECT *, (SELECT t FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
FROM   rq

Obviously, you don't care which row from RP you pick from a set of equally close rows, so I do the same.

However, a subquery expression in the SELECT list can only return one column. If you want more than one column, or all columns, from the table RP, select the whole row as a composite value, as in the following construct. I assume the existence of a primary key id in both tables:

SELECT id, t, (ra).*
FROM (
    SELECT *, (SELECT rp FROM rp ORDER BY abs(rp.t - rq.t) LIMIT 1) AS ra
    FROM   rq
    ) x;

Correlated subqueries are infamous for bad performance. This kind of query, while obviously computing what you want, will perform particularly badly, because the ORDER BY expression abs(rp.t - rq.t) cannot use an index. Performance will deteriorate drastically with bigger tables.


This rewritten query should be able to utilize an index on RP.t, which should perform much faster with big tables.

WITH x AS (
    SELECT * 
         ,(SELECT t
           FROM   rp
           WHERE  rp.t <  rq.t
           ORDER  BY rp.t DESC
           LIMIT  1) AS t_pre

         ,(SELECT t
           FROM   rp
           WHERE  rp.t >= rq.t
           ORDER  BY rp.t
           LIMIT  1) AS t_post
    FROM   rq
    )
SELECT id, t
      ,CASE WHEN (t_post - t) < (t - t_pre)
            THEN t_post
            ELSE COALESCE(t_pre, t_post) END AS ra
FROM   x;
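
The query above only benefits if an index on rp.t actually exists. A sketch of how that could be set up and checked, assuming no such index is in place yet; the index name rp_t_idx and the literal date are hypothetical, and t is assumed to be a date as in the question:

```sql
-- Plain b-tree index on the column used in WHERE and ORDER BY.
CREATE INDEX rp_t_idx ON rp (t);   -- rp_t_idx is a made-up name
ANALYZE rp;                        -- refresh planner statistics

-- Inspect the plan of one of the two lookups for a sample value.
EXPLAIN
SELECT t
FROM   rp
WHERE  rp.t < DATE '2019-03-16'    -- sample value, stands in for rq.t
ORDER  BY rp.t DESC
LIMIT  1;
```

Whether the planner picks an index scan depends on table size and statistics, so the EXPLAIN output is worth checking on real data.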

Again, if you want the whole row:

WITH x AS (
    SELECT * 
         ,(SELECT rp
           FROM   rp
           WHERE  rp.t <  rq.t
           ORDER  BY rp.t DESC
           LIMIT  1) AS t_pre

         ,(SELECT rp
           FROM   rp
           WHERE  rp.t >= rq.t
           ORDER  BY rp.t
           LIMIT  1) AS t_post
    FROM   rq
    ), y AS (
    SELECT id, t
          ,CASE WHEN ((t_post).t - t) < (t - (t_pre).t)
                THEN t_post
                ELSE COALESCE(t_pre, t_post) END AS ra
    FROM   x
    )
SELECT id AS rq_id, t AS rq_t, (ra).*
FROM   y 
ORDER  BY 2;

Note the use of parentheses with composite types! No paren is redundant here. More about that in the manual.
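
To illustrate the point about parentheses, a small sketch against the same rp table. The aliases x, ra and t_only are only for illustration:

```sql
SELECT (x.ra).t AS t_only   -- one field of the composite column ra
     , (x.ra).*             -- all fields, expanded into separate columns
FROM  (SELECT rp AS ra FROM rp LIMIT 1) x;   -- rp without a column list = whole row

-- Without parentheses, ra.t would be parsed as column t of a table named ra,
-- and the query would fail because no such table is in the FROM list.
```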

Tested with PostgreSQL 9.1. Demo on sqlfiddle.

走好不送
Reply #3 · 2019-03-16 18:57

The correlated subqueries, without an index, are going to do a cross join anyway. So, another way of expressing the query is:

select rp.*, min(abs(rp.t - rq.t))
from rp cross join
     rq
group by <rp.*> -- <== need to replace with all columns
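
If the nearest rp row itself is wanted for each rq row (the direction asked in the question), one possible variant of the cross join uses the Postgres-specific DISTINCT ON. This is a sketch, assuming rq has a unique column id as the other answer does; it still compares every pair of rows:

```sql
-- Keep, per rq.id, only the first row in the given sort order,
-- i.e. the rp row with the smallest date distance.
SELECT DISTINCT ON (rq.id)
       rq.id AS rq_id, rq.t AS rq_t, rp.*
FROM   rq
CROSS  JOIN rp
ORDER  BY rq.id, abs(rp.t - rq.t);
```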

There is another method, which is a bit more complicated. This requires using the cumulative sum.

Here is the idea. Combine all the rp and rq values together. Now, enumerate them by the closest rp value. That is, create a flag for rp and take the cumulative sum. As a result, all the rq values between two rp values have the same rp index.

The closest value to a given rq value has an rp index the same as the rq value or one more. Calculating the rp_index uses the cumulative sum.
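
The numbering step on its own can be sketched like this. The sample values are made up, and integers stand in for the dates to keep the example self-contained:

```sql
WITH rq(t) AS (VALUES (3), (8)),
     rp(t) AS (VALUES (1), (5), (10))
SELECT t, isRP,
       sum(isRP) OVER (ORDER BY t) AS rp_index   -- running count of rp rows
FROM  (SELECT t, 0 AS isRP FROM rq
       UNION ALL
       SELECT t, 1 AS isRP FROM rp) u
ORDER  BY t;

--  t  | isrp | rp_index
--  1  |  1   |    1
--  3  |  0   |    1      <- rq value, neighbors are rp_index 1 and 2
--  5  |  1   |    2
--  8  |  0   |    2      <- rq value, neighbors are rp_index 2 and 3
--  10 |  1   |    3
```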

The following query puts this together:

with t as (select u.*, sum(isRP) over (order by t) as rp_index  -- running count of rp rows
           from (select rq.t, 0 as isRP, <NULL for each other rp column>
                 from rq
                 union all
                 select rp.t, 1 as isRP, <each other rp column>
                 from rp
                ) u
          )
select rq.t,
       (case when abs(rpprev.t - rq.t) < abs(rpnext.t - rq.t)
             then abs(rpprev.t - rq.t)
             else abs(rpnext.t - rq.t)
        end) as closest_value
-- inner joins: rq rows before the first or after the last rp value drop out
from (select *
      from t
      where isRP = 0
     ) rq join
     (select *
      from t
      where isRP = 1
     ) rpprev
     on rq.rp_index = rpprev.rp_index join
     (select *
      from t
      where isRP = 1
     ) rpnext
     on rq.rp_index + 1 = rpnext.rp_index

The advantage of this approach is that there is no cross join and no correlated subqueries.
