-->

Why does PostgreSQL combine series in a wrong way?

2019-08-08 00:24发布

问题:

I got some strange behavior of combined generate_series. In 2 different polygons that I tried to fill with a grid, one grid was much rarer:

The query was like this:

SELECT
    osm_id ,
        generate_series(floor(st_xmin(way))::int, ceiling(st_xmax(way))::int, 150) x,
        generate_series(floor(st_ymin(way))::int, ceiling(st_ymax(way))::int, 150) y 
from osm_polygon
order by osm_id, x, y;

I tried tracing the problem, and just entered min/max coordinates. Generate series from min/max values create correct number of values: 9 and 12 rows respectively.

  => select generate_series(9237195, 9238873, 150) x;
      x    
  ---------
   9237195
   9237345
   9237495
   9237645
   9237795
   9237945
   9238095
   9238245
   9238395
   9238545
   9238695
   9238845
  (12 rows)

  => select generate_series(7371701, 7372922, 150) y order by y;
      y    
  ---------
   7371701
   7371851
   7372001
   7372151
   7372301
   7372451
   7372601
   7372751
   7372901
  (9 rows)

Combined, they should make 108 rows, right? No, only 36 rows:

=> select generate_series(9237195, 9238873, 150) x, generate_series(7371701, 7372922, 150) y order by x, y;
      x    |    y    
  ---------+---------
   9237195 | 7371701
   9237195 | 7372151
   9237195 | 7372601
   9237345 | 7371851
   9237345 | 7372301
   9237345 | 7372751
   9237495 | 7372001
   9237495 | 7372451
   9237495 | 7372901
   9237645 | 7371701
   9237645 | 7372151
   9237645 | 7372601
   9237795 | 7371851
   9237795 | 7372301
   9237795 | 7372751
   9237945 | 7372001
   9237945 | 7372451
   9237945 | 7372901
   9238095 | 7371701
   9238095 | 7372151
   9238095 | 7372601
   9238245 | 7371851
   9238245 | 7372301
   9238245 | 7372751
   9238395 | 7372001
   9238395 | 7372451
   9238395 | 7372901
   9238545 | 7371701
   9238545 | 7372151
   9238545 | 7372601
   9238695 | 7371851
   9238695 | 7372301
   9238695 | 7372751
   9238845 | 7372001
   9238845 | 7372451
   9238845 | 7372901
(36 rows)

Brute force actually does work here:

with a as (select generate_series(9237195, 9238873, 150) x),
    b as (select generate_series(7371701, 7372922, 150) y)
select x, y from a, b;

Makes 108 rows. Fine, except queries get more complex.

Why so?

Postgres version is 9.1

回答1:

It's a quirk of how multiple set-returning functions are executed when invoked in the SELECT-list. You expect the result to be the cross product of the two, but that's not how it works. It's actually the lowest common multiple of the rowcounts of the two.

See:

  • Why do these join differently based on size?
  • How to rewrite a SELECT ... CROSS JOIN LATERAL ... statement for older PostgreSQL versions?

Compare:

test=>     SELECT generate_series(1,3) aval, generate_series(1,4) bval;
 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
    1 |    4
    2 |    1
    3 |    2
    1 |    3
    2 |    4
    3 |    1
    1 |    2
    2 |    3
    3 |    4
(12 rows)

test=>     SELECT generate_series(1,3) aval, generate_series(1,3) bval;
 aval | bval 
------+------
    1 |    1
    2 |    2
    3 |    3
(3 rows)

For this reason, on PostgreSQL 9.2 and above you should use LATERAL queries where you invoke set-returning functions in the FROM clause:

test=>     SELECT aval, bval FROM generate_series(1,3) aval CROSS JOIN LATERAL generate_series(1,3) bval;
 aval | bval 
------+------
    1 |    1
    1 |    2
    1 |    3
    2 |    1
    2 |    2
    2 |    3
    3 |    1
    3 |    2
    3 |    3
(9 rows)

test=>     SELECT aval, bval FROM generate_series(1,3) aval CROSS JOIN LATERAL generate_series(1,4) bval;
 aval | bval 
------+------
    1 |    1
    1 |    2
    1 |    3
    1 |    4
    2 |    1
    2 |    2
    2 |    3
    2 |    4
    3 |    1
    3 |    2
    3 |    3
    3 |    4
(12 rows)

On older versions you can use a subquery-in-FROM to avoid having multiple SRFs in a SELECT term:

test=> SELECT generate_series(1,3) aval, bval FROM (SELECT generate_series(1,4)) AS x(bval);
 aval | bval 
------+------
    1 |    1
    2 |    1
    3 |    1
    1 |    2
    2 |    2
    3 |    2
    1 |    3
    2 |    3
    3 |    3
    1 |    4
    2 |    4
    3 |    4
(12 rows)


回答2:

From postgresql documentation

"Note: The key problem with using set-returning functions in the select list, rather than the FROM clause, is that putting more than one set-returning function in the same select list does not behave very sensibly. (What you actually get if you do so is a number of output rows equal to the least common multiple of the numbers of rows produced by each set-returning function.)



回答3:

I use this format:

SELECT i,j FROM
generate_series(1,3) i, generate_series(1,3) j

It works very fine.