I am trying to parse Well Known Binary (WKB), a binary encoding of geometry objects used in Geographic Information Systems (GIS). I am using this spec from ESRI (same results here from ESRI). My input data comes from Osmosis, a tool for parsing OpenStreetMap data, specifically its pgsimp-dump format, which gives the hex representation of the binary.
The ESRI docs say that a Point should be only 21 bytes: 1 byte for the byte order, 4 for the uint32 type id, 8 for the double x, and 8 for the double y.
An example from Osmosis is this hex string: 0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40, which is 25 bytes long.
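To make the size mismatch concrete, here is a minimal sketch that uses only the Python standard library to compare the string against the 21-byte Point layout described above. The variable names are mine, and it assumes the first byte is the byte-order flag (1 meaning little-endian) and that the coordinates are the last 16 bytes, which is consistent with the Shapely round trip in this post but is otherwise my assumption.

```python
import binascii
import struct

hex_wkb = "0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40"
data = binascii.unhexlify(hex_wkb)

# Size of a Point per the layout above: 1 (byte order) + 4 (uint32) + 8 + 8 (doubles).
expected_len = struct.calcsize("<BIdd")

# First byte is the byte-order flag; assume 1 = little-endian, otherwise big-endian.
byte_order, = struct.unpack_from("B", data, 0)
endian = "<" if byte_order == 1 else ">"

# The next four bytes, read as the uint32 type id in that byte order.
type_id, = struct.unpack_from(endian + "I", data, 1)

# Assumption: the two doubles sit at the very end of the buffer.
x, y = struct.unpack_from(endian + "dd", data, len(data) - 16)

print("actual length:  ", len(data))       # 25
print("expected length:", expected_len)    # 21
print("byte order flag:", byte_order)      # 1
print("type id field:  ", hex(type_id))    # 0x20000001
print("x, y:           ", x, y)
```

Read this way, the buffer is 4 bytes longer than the layout allows, and the uint32 after the byte-order flag does not decode to a plain Point type id of 1.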
Shapely, a Python library for parsing WKB (etc.) that is based on the popular C++ library GEOS, is able to parse this string:
>>> import shapely.wkb
>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True)
<shapely.geometry.point.Point object at 0x7f221f2581d0>
When I ask Shapely to parse it and then convert it back to WKB, I get 21 bytes:
>>> shapely.wkb.loads("0101000020E6100000DB81DF2B5F7822C0DFBB7262B4744A40", hex=True).wkb.encode("hex").upper()
'0101000000DB81DF2B5F7822C0DFBB7262B4744A40'
The difference is the 4 bytes in the middle, which appear 3 bytes into the uint32 for the type id:
01010000**20E61000**00DB81DF2B5F7822C0DFBB7262B4744A40
Why can Shapely/GEOS parse this string when it's invalid WKB? What do these bytes mean?