I have the following, which gives me the number of customers within 10,000 meters of any store location:
SELECT COUNT(*) as customer_count FROM customer_table c
WHERE EXISTS(
SELECT 1 FROM locations_table s
WHERE ST_Distance_Sphere(s.the_geom, c.the_geom) < 10000
)
What I need is for this query to return not only the number of customers within 10,000 meters, but also the following. The number of customers within...
- 10,000 meters
- more than 10,000, but less than 50,000
- more than 50,000, but less than 10,0000
- more than 100,000
...of any location.
I'm open to this working a couple of ways. For a given customer, only count them one time (the shortest distance to any store), which would count everyone exactly once. I realize this is probably pretty complex. I'm also open to having people be counted multiple times, which is really the accurate values anyway and think should be much simpler.
Thanks for any direction.
You can do both types of queries relatively easily. But an issue here is that you do not know which customers are associated with which store locations, which seems like an interesting thing to know. If you want that, use the PK and store_name
of the locations_table
in the query. See both options with location id and store_name below. To emphasize the difference between the two options:
- The first option indicates how many customers are in every distance class for every store location, for all customers for every store location.
- The second option indicates how many customers are in every distance class for every store location, for the nearest store location for each customer only.
This is a query of O(n x m)
running order (implemented with the CROSS JOIN
between customer_table
and locations_table
) and likely to become rather slow with increasing numbers of rows in either table.
Count customers in all distance classes
You should make a CROSS JOIN
between the distances of customers from store locations and then group them by the store location id, name and classes of maximum distance that you define. You can create a "table" from your distance classes with the VALUES
command which you can then simply use in any query:
SELECT loc_dist.id, loc_dist.store_name, grps.grp, count(*)
FROM (
SELECT s.id, s.store_name, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
FROM customer_table c, locations_table s) AS loc_dist
JOIN (
VALUES(1, 10000.), (2, 50000.), (3, 100000.), (4, 1000000.)
) AS grps(grp, dist) ON loc_dist.dist < grps.dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;
Count customers in the nearest distance class
If you want customers listed in the nearest distance class only, then you should make the same CROSS JOIN
on customer_table
and locations_table
as in the previous case, but then simply select the lowest group (i.e. the closest store) using a CASE
clause in the query and GROUP BY
store location id, name and distance class as before:
SELECT
id, store_name,
CASE
WHEN dist < 10000. THEN 1
WHEN dist < 50000. THEN 2
WHEN dist < 100000. THEN 3
ELSE 4
END AS grp,
count(*)
FROM (
SELECT s.id, s.store_name, ST_Distance_Sphere(s.the_geom, c.the_geom) AS dist
FROM customer_table c, locations_table s) AS loc_dist
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;