Why can shapely/geos parse this 'invalid'

2019-07-20 17:36发布

问题:

I am trying to parse Well Known Binary a binary encoding of geometry objects used in Geographic Information Systems (GIS). I am using this spec from ESRI (same results here from esri). I have input data from Osmosis a tool to parse OpenStreetMap data, specifically the pgsimp-dump format which gives the hex represenation of the binary.

The ESRI docs say that there should only be 21 bytes for a Point, 1 byte for byte order, 4 for uint32 for typeid, and 8 for double x and 8 for double y.

An example from osmosis is this (hex) example: 0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40, which is 25 bytes long.

Shapely a python programme to parse WKB (etc), which is based on the popular C library GEOS is able to parse this string:

>>> import shapely.wkb
>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True)
<shapely.geometry.point.Point object at 0x7f221f2581d0>

When I ask Shapely to parse from then convert to WKB I get a 21 bytes.

>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True).wkb.encode("hex").upper()
'0101000000DB81DF2B5F7822C0DFBB7262B4744A40'

The difference is the 4 bytes in the middle, which appear 3 bytes into the uint32 for the typeif=d

01010000**20E61000**00DB81DF2B5F7822C0DFBB7262B4744A40

Why can shapely/geos parse this WKB when it's invalid WKB? What do these bytes mean?

回答1:

GEOS / Shapely use an Extended variant of WKT/WKB called EWKT / EWKB, which is documented by PostGIS. If you have access to PostGIS, you can see what's going on here:

SELECT ST_AsEWKT('0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40'::geometry);

Returns the EWKT SRID=4326;POINT(-9.2351011 52.9117549). So the extra data was the spatial reference identifier, or SRID. Specifically EPSG:4326 for WGS 84.

Shapely does not support SRIDs, however there are a few hacks, e.g.:

from shapely import geos
geos.WKBWriter.defaults['include_srid'] = True

should now make wkb or wkb_hex output the EWKB, which includes the SRID. The default is False, which would output ISO WKB for 2D geometries (but not for 3D).

So it seems your objective is to convert EWKB to ISO WKB, which you can do with GEOS / Shapely for 2D geometries only. If you have 3D (Z or M) or 4D (ZM) geometries, then only PostGIS is able to do this conversion.