I am working on queries against a large table in Postgres 9.3.9. It is a spatial dataset and it is spatially indexed. Say I need to find 3 types of objects: A, B and C. The criterion is that B and C are both within a certain distance of A, say 500 meters.
My query looks like this:
select
    school.osm_id as school_osm_id,
    school.name as school_name,
    school.way as school_way,
    restaurant.osm_id as restaurant_osm_id,
    restaurant.name as restaurant_name,
    restaurant.way as restaurant_way,
    bar.osm_id as bar_osm_id,
    bar.name as bar_name,
    bar.way as bar_way
from (
    select osm_id, name, amenity, way, way_geo
    from planet_osm_point
    where amenity = 'school') as school,
   (select osm_id, name, amenity, way, way_geo
    from planet_osm_point
    where amenity = 'restaurant') as restaurant,
   (select osm_id, name, amenity, way, way_geo
    from planet_osm_point
    where amenity = 'bar') as bar
where ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
and ST_DWithin(school.way_geo, bar.way_geo, 500, false);
This query gives me what I want, but it takes a really long time, around 13 seconds, to execute. I'm wondering if there is another way to write the query to make it more efficient.
Query plan:
Nested Loop (cost=74.43..28618.65 rows=1 width=177) (actual time=33.513..11235.212 rows=10591 loops=1)
  Buffers: shared hit=530967 read=8733
  -> Nested Loop (cost=46.52..28586.46 rows=1 width=174) (actual time=31.998..9595.212 rows=4235 loops=1)
        Buffers: shared hit=389863 read=8707
        -> Bitmap Heap Scan on planet_osm_point (cost=18.61..2897.83 rows=798 width=115) (actual time=7.862..150.607 rows=8811 loops=1)
              Recheck Cond: (amenity = 'school'::text)
              Buffers: shared hit=859 read=5204
              -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=5.416..5.416 rows=8811 loops=1)
                    Index Cond: (amenity = 'school'::text)
                    Buffers: shared hit=3 read=24
        -> Bitmap Heap Scan on planet_osm_point planet_osm_point_1 (cost=27.91..32.18 rows=1 width=115) (actual time=1.064..1.069 rows=0 loops=8811)
              Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'restaurant'::text))
              Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
              Rows Removed by Filter: 0
              Buffers: shared hit=389004 read=3503
              -> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=1.058..1.058 rows=0 loops=8811)
                    Buffers: shared hit=384528 read=2841
                    -> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.193..0.193 rows=64 loops=8811)
                          Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                          Buffers: shared hit=146631 read=2841
                    -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.843..0.843 rows=6291 loops=8811)
                          Index Cond: (amenity = 'restaurant'::text)
                          Buffers: shared hit=237897
  -> Bitmap Heap Scan on planet_osm_point planet_osm_point_2 (cost=27.91..32.18 rows=1 width=115) (actual time=0.375..0.383 rows=3 loops=4235)
        Recheck Cond: ((way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision)) AND (amenity = 'bar'::text))
        Filter: ((planet_osm_point.way_geo && _st_expand(way_geo, 500::double precision)) AND _st_dwithin(planet_osm_point.way_geo, way_geo, 500::double precision, false))
        Rows Removed by Filter: 1
        Buffers: shared hit=141104 read=26
        -> BitmapAnd (cost=27.91..27.91 rows=1 width=0) (actual time=0.368..0.368 rows=0 loops=4235)
              Buffers: shared hit=127019
              -> Bitmap Index Scan on idx_planet_osm_point_waygeo (cost=0.00..9.05 rows=137 width=0) (actual time=0.252..0.252 rows=363 loops=4235)
                    Index Cond: (way_geo && _st_expand(planet_osm_point.way_geo, 500::double precision))
                    Buffers: shared hit=101609
              -> Bitmap Index Scan on idx_planet_osm_point_amenity (cost=0.00..18.41 rows=798 width=0) (actual time=0.104..0.104 rows=779 loops=4235)
                    Index Cond: (amenity = 'bar'::text)
                    Buffers: shared hit=25410
Total runtime: 11238.605 ms
I'm only using one table at the moment with 1,372,711 rows. It has 73 columns:
Column | Type | Modifiers
--------------------+----------------------+---------------------------
osm_id | bigint |
access | text |
addr:housename | text |
addr:housenumber | text |
addr:interpolation | text |
admin_level | text |
aerialway | text |
aeroway | text |
amenity | text |
area | text |
barrier | text |
bicycle | text |
brand | text |
bridge | text |
boundary | text |
building | text |
capital | text |
construction | text |
covered | text |
culvert | text |
cutting | text |
denomination | text |
disused | text |
ele | text |
embankment | text |
foot | text |
generator:source | text |
harbour | text |
highway | text |
historic | text |
horse | text |
intermittent | text |
junction | text |
landuse | text |
layer | text |
leisure | text |
lock | text |
man_made | text |
military | text |
motorcar | text |
name | text |
natural | text |
office | text |
oneway | text |
operator | text |
place | text |
poi | text |
population | text |
power | text |
power_source | text |
public_transport | text |
railway | text |
ref | text |
religion | text |
route | text |
service | text |
shop | text |
sport | text |
surface | text |
toll | text |
tourism | text |
tower:type | text |
tunnel | text |
water | text |
waterway | text |
wetland | text |
width | text |
wood | text |
z_order | integer |
tags | hstore |
way | geometry(Point,4326) |
way_geo | geography |
gid | integer | not null default nextval('...
Indexes:
"planet_osm_point_pkey1" PRIMARY KEY, btree (gid)
"idx_planet_osm_point_amenity" btree (amenity)
"idx_planet_osm_point_waygeo" gist (way_geo)
"planet_osm_point_index" gist (way)
"planet_osm_point_pkey" btree (osm_id)
There are 8811, 6291 and 779 rows with amenity school, restaurant and bar, respectively.
The 3 sub-selects that you use are very inefficient. Write them as LEFT JOIN clauses and the query should be much more efficient:
But this will give too many results if you have multiple restaurants and bars per school. You can simplify the query like this:
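The queries that belong to this answer are not preserved in this copy. As a rough sketch of what the simplified form might look like, assuming it is a single self-join that matches both bars and restaurants against each school (the aliases `s` and `a` and the `nearby_*` output names are illustrative, not taken from the answer):

```sql
-- Sketch only: one self-join instead of three sub-selects.
-- s = school rows, a = nearby amenity rows (restaurant or bar).
SELECT s.osm_id  AS school_osm_id,
       s.name    AS school_name,
       s.way     AS school_way,
       a.amenity AS nearby_amenity,   -- 'restaurant' or 'bar'
       a.osm_id  AS nearby_osm_id,
       a.name    AS nearby_name,
       a.way     AS nearby_way
FROM   planet_osm_point s
JOIN   planet_osm_point a
       ON ST_DWithin(s.way_geo, a.way_geo, 500, false)
WHERE  s.amenity = 'school'
AND    a.amenity IN ('restaurant', 'bar');
```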
This will give every bar and restaurant for each school. Schools without either restaurant or bar within 500m are not listed.
Does it make any difference if you use explicit joins?
This query should go a long way (be much faster):
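The query itself is missing from this copy. Going by the explanation in the rest of this answer, it might look roughly like the sketch below. The assumptions: a CTE named `school` keeps only schools that have at least one bar and one restaurant within 500 m, `UNION ALL` branches then attach the nearby bars and restaurants, and the type labels and selected columns are illustrative.

```sql
-- Sketch under the assumptions stated above, not the answerer's exact query.
WITH school AS (
   SELECT s.osm_id AS school_id, text 'school' AS type,
          s.osm_id, s.name, s.way_geo
   FROM   planet_osm_point s
   CROSS  JOIN LATERAL (          -- drops schools with no bar nearby
      SELECT 1 AS found
      FROM   planet_osm_point
      WHERE  amenity = 'bar'
      AND    ST_DWithin(way_geo, s.way_geo, 500, false)
      LIMIT  1
      ) b
   CROSS  JOIN LATERAL (          -- drops schools with no restaurant nearby
      SELECT 1 AS found
      FROM   planet_osm_point
      WHERE  amenity = 'restaurant'
      AND    ST_DWithin(way_geo, s.way_geo, 500, false)
      LIMIT  1
      ) r
   WHERE  s.amenity = 'school'
   )
SELECT *
FROM  (
   TABLE school                   -- the qualifying schools themselves

   UNION ALL                      -- bars near each qualifying school
   SELECT s.school_id, text 'bar', x.osm_id, x.name, x.way_geo
   FROM   school s
   CROSS  JOIN LATERAL (
      SELECT osm_id, name, way_geo
      FROM   planet_osm_point
      WHERE  amenity = 'bar'
      AND    ST_DWithin(way_geo, s.way_geo, 500, false)
      ) x

   UNION ALL                      -- restaurants near each qualifying school
   SELECT s.school_id, text 'restaurant', x.osm_id, x.name, x.way_geo
   FROM   school s
   CROSS  JOIN LATERAL (
      SELECT osm_id, name, way_geo
      FROM   planet_osm_point
      WHERE  amenity = 'restaurant'
      AND    ST_DWithin(way_geo, s.way_geo, 500, false)
      ) x
   ) sub
ORDER  BY school_id, (type <> 'school'), type, osm_id;
```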
This is not the same as your original query, but rather what you actually want, as per discussion in comments:
So this query returns a list of those schools, followed by bars and restaurants nearby. Each set of rows is held together by the osm_id of the school in the column school_id.
Now using LATERAL joins, to make use of the spatial GiST index.
TABLE school is just shorthand for SELECT * FROM school:
The expression (type <> 'school') orders the school in each set first, because boolean false sorts before true.
The subquery sub in the final SELECT is only needed to order by this expression. A UNION query limits an attached ORDER BY list to only columns, no expressions.
I focus on the query you presented for the purpose of this answer, ignoring the extended requirement to filter on any of the other 70 text columns. That's really a design flaw. The search criteria should be concentrated in a few columns. Otherwise you'll have to index all 70 columns, and multicolumn indexes like the one I am going to propose are hardly an option. Still possible, though ...
Index
In addition to the existing:
If always filtering on the same column, you could create a multicolumn index covering the few columns you are interested in, so index-only scans become possible:
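As an illustration only, such an index could be declared as below. The index name is made up, and the column list is an assumption about which columns the query actually reads.

```sql
-- Hypothetical index name; adjust the column list to what the query reads.
CREATE INDEX planet_osm_point_amenity_name_id_idx
    ON planet_osm_point (amenity, name, osm_id);
```

An index-only scan is only possible for queries that read nothing but the indexed columns, and it also depends on the visibility map being reasonably current, so the table should be vacuumed regularly.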
Postgres 9.5
The upcoming Postgres 9.5 introduces major improvements that happen to address your case exactly:
That's of particular interest for you. Now you can have a single multicolumn (covering) GiST index:
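A sketch of what such an index might look like follows. The index name and the column order are assumptions, and the `btree_gist` extension is assumed to be available so that the `text` and `bigint` columns can take part in a GiST index.

```sql
-- Assumption: btree_gist provides GiST operator classes for text and bigint.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name; column order is a guess.
CREATE INDEX planet_osm_point_amenity_waygeo_gist_idx
    ON planet_osm_point USING gist (amenity, way_geo, name, osm_id);
```

Whether the planner can actually use an index-only scan here depends on the operator classes involved supporting it, so this is something to verify with EXPLAIN rather than to rely on.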
And:
And:
Why? Because ROLLUP would simplify the query I suggested. Related answer:
The first alpha version has been released on July 2, 2015. The expected timeline for the release:
Basics
Of course, be sure not to overlook the basics:
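The answer's own list of basics is not preserved in this copy. One basic that the plan in the question does hint at is statistics: the planner estimates 798 rows for each of the three amenity values, while the actual counts are 8811, 6291 and 779. A minimal example of collecting finer statistics for that column, where the target of 1000 is an illustrative value and not a recommendation from the answer:

```sql
-- Keep more detailed statistics for amenity (1000 is an illustrative value),
-- then refresh table statistics and the visibility map.
ALTER TABLE planet_osm_point ALTER COLUMN amenity SET STATISTICS 1000;
VACUUM ANALYZE planet_osm_point;
```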
Try this with inner join syntax and compare the results; there should be no duplicates. My guess is that it should take a third of the time of the original query, or less:
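The query for this answer is missing from this copy. A sketch of the original query rewritten with explicit inner joins, keeping the question's aliases and output columns, might look like this:

```sql
-- Sketch: same result columns as the original, with explicit JOIN syntax.
SELECT school.osm_id     AS school_osm_id,
       school.name       AS school_name,
       school.way        AS school_way,
       restaurant.osm_id AS restaurant_osm_id,
       restaurant.name   AS restaurant_name,
       restaurant.way    AS restaurant_way,
       bar.osm_id        AS bar_osm_id,
       bar.name          AS bar_name,
       bar.way           AS bar_way
FROM   planet_osm_point AS school
JOIN   planet_osm_point AS restaurant
       ON  restaurant.amenity = 'restaurant'
       AND ST_DWithin(school.way_geo, restaurant.way_geo, 500, false)
JOIN   planet_osm_point AS bar
       ON  bar.amenity = 'bar'
       AND ST_DWithin(school.way_geo, bar.way_geo, 500, false)
WHERE  school.amenity = 'school';
```

Comparing the output of EXPLAIN (ANALYZE, BUFFERS) for both forms shows whether the plan and the runtime actually change.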