How to Catch ST_MAKEPOLYGON Error in BigQuery

2020-06-28 00:07发布

问题:

I am using ST_MAKEPOLYGON function in BigQuery as follows:

  with data AS (
  SELECT
    61680 AS id, 139.74862575531006 AS lon,
    35.674973127377314 AS lat union all
  SELECT
    61680,
    139.75087881088257,
    35.673909836018375 union all
  SELECT
    61680,
    139.747037887573,
    35.6765767531247 union all
  SELECT
    61680,
    139.75308895111,
    35.6813525780394 union all
  SELECT
    61680,
    139.747509956359,
    35.6798884869144 union all
  SELECT
    61680,
    139.754590988159,
    35.6799930657428 union all
  SELECT
    61680,
    139.754977226257,
    35.6762281415729 union all
  SELECT
    61680,
    139.750170707702,
    35.6815268728124 union all
  SELECT
    61680,
    139.755363464355,
    35.6782500673754
    )
SELECT
  ST_makepolygon(ST_MAKELINE(ARRAY_AGG(st_geogpoint(lon,
          lat)))) AS valid
FROM
  `w_nagakawa.geo_test`
GROUP BY
  id

and I get error like:

Error: ST_MakePolygon failed: Invalid polygon loop: Edge 3 has duplicate vertex with edge 10

Geography argument inside ST_MAKEPOLYGON is okay, and all lat-lon seems to be different.

I'd like to know why it happens and would like to know some ideas to solve this.

Thanks.

回答1:

Frist question first …

I'd like to know why it happens …

Using ST_MAKEPOLYGON with linestring input (via ST_MAKELINE) requires line to be properly assembled such that there is no intersections which happens with your data (as line is built using points in the [random] order of appearance)

Instead, you would need line like below in blue - where all geo-points ordered such that they form non-self-crossed line

Note: The linestring must be closed: that is, the first and last vertex have to be the same. If the first and last vertex differ, the function constructs a final edge from the first vertex to the last.

Building Polygon using the “proper_line” will perfectly work and produce below result

Second question now …

… and would like to know some ideas to solve this

So, obviously, we need somehow properly order geo-points
This can be done manually (have fun with this option) or can be done programmatically
Below is idea of how to do so within BigQuery (Standard SQL) along with details of implementation

So, we want to assign proper sequence number to each point by following below steps:

Step 1 – let’s identify centroid (green pin in below image) for all the points (red pins)

We can use below statement for this:

SELECT ST_CENTROID(ST_UNION_AGG(ST_GEOGPOINT(lon, lat))) centroid FROM `data`

Step 2 - Then, for each point we should calculate angle between centroid-to-point line and horizontal line crossing centroid
We are using anchors (blue circles on the image)

WITH stats AS (
  SELECT ST_CENTROID(ST_UNION_AGG(ST_GEOGPOINT(lon, lat))) centroid FROM `data`
)
SELECT point, centroid, anchor,
  ACOS(ST_DISTANCE(centroid, anchor) / ST_DISTANCE(centroid, point)) angle
FROM (
  SELECT centroid, 
    ST_GEOGPOINT(lon, lat) point, 
    ST_GEOGPOINT(lon, ST_Y(centroid)) anchor
  FROM `data`, stats
)

Step 3 - Now we want to convert those angles into proper sequence reflecting respective points’ quadrants

SELECT point, centroid, anchor,
  CASE 
    WHEN ST_X(point) > ST_X(centroid) AND ST_Y(point) > ST_Y(centroid) THEN 3.14 - angle
    WHEN ST_X(point) > ST_X(centroid) AND ST_Y(point) < ST_Y(centroid) THEN 3.14 + angle
    WHEN ST_X(point) < ST_X(centroid) AND ST_Y(point) < ST_Y(centroid) THEN 6.28 - angle
    ELSE angle
  END sequence
FROM (.. previous subquery here …)

Step 4 - So now, finally we can use sequence column to properly order points Final query below:

WITH `data` AS (
    SELECT 61680 AS id, 139.74862575531006 AS lon, 35.674973127377314 AS lat UNION ALL SELECT 61680, 139.75087881088257, 35.673909836018375 UNION ALL SELECT 61680, 139.747037887573, 35.6765767531247 UNION ALL SELECT 61680, 139.75308895111, 35.6813525780394 UNION ALL SELECT 61680, 139.747509956359, 35.6798884869144 UNION ALL SELECT 61680, 139.754590988159, 35.6799930657428 UNION ALL SELECT 61680, 139.754977226257, 35.6762281415729 UNION ALL SELECT 61680, 139.750170707702, 35.6815268728124 UNION ALL SELECT 61680, 139.755363464355, 35.6782500673754
), stats AS (
  SELECT ST_CENTROID(ST_UNION_AGG(ST_GEOGPOINT(lon, lat))) centroid FROM `data`
) 
SELECT ST_MAKEPOLYGON(ST_MAKELINE(ARRAY_AGG(point ORDER BY sequence))) AS polygon
FROM (
  SELECT point, 
    CASE 
      WHEN ST_X(point) > ST_X(centroid) AND ST_Y(point) > ST_Y(centroid) THEN 3.14 - angle
      WHEN ST_X(point) > ST_X(centroid) AND ST_Y(point) < ST_Y(centroid) THEN 3.14 + angle
      WHEN ST_X(point) < ST_X(centroid) AND ST_Y(point) < ST_Y(centroid) THEN 6.28 - angle
      ELSE angle
    END sequence
  FROM (
    SELECT point, centroid, 
      ACOS(ST_DISTANCE(centroid, anchor) / ST_DISTANCE(centroid, point)) angle
    FROM (
      SELECT centroid, 
        ST_GEOGPOINT(lon, lat) point, 
        ST_GEOGPOINT(lon, ST_Y(centroid)) anchor
      FROM `data`, stats
    )
  )
)

Final result is:

Note: this idea/solution - still can be limited to just some obvious cases like yours - i did not have chance to explore and/or test it for generic cases



回答2:

Your polygon is wrong formatted (probably the points are wrong ordered). I tried to visualize the polygon with the points and the order that you provided, it shows a mesh of edges. Therefore, I ordered the points in the way that make sense to be a polygon. After that, the query worked. The order that I tested is below (they are the same points just ordered in a different way):

 with data AS (
  SELECT
    61680 AS id, 139.75087881088257 AS lon,
    35.673909836018375 AS lat union all
  SELECT
    61680,
    139.74862575531006,
    35.674973127377314 union all
  SELECT
    61680,
    139.747037887573,
    35.6765767531247 union all
  SELECT
    61680,
    139.747509956359,
    35.6798884869144 union all
  SELECT
    61680,
    139.750170707702,
    35.6815268728124 union all
  SELECT
    61680,
    139.75308895111,
    35.6813525780394 union all
  SELECT
    61680,
    139.754590988159,
    35.6799930657428 union all
  SELECT
    61680,
    139.755363464355,
    35.6782500673754 union all
  SELECT
    61680,
    139.754977226257,
    35.6762281415729
    )
SELECT
  ST_makepolygon(ST_MAKELINE(ARRAY_AGG(st_geogpoint(lon,
          lat)))) AS valid
FROM
  data
GROUP BY
  id

This is because BigQuery validates the polygons before load or generate it.