Tuesday, June 26, 2007

The adventures of WKB across the third dimension

The adventures of WKB in the third dimension end abruptly, not far from the second. While PostGIS "extends" the WKB spec to allow 3D and linear-referencing (M) values, it appears ESRI has hewn more closely to the spec and restricts WKB to the original 2D maximum. (To be fair, PostGIS defaults to trafficking only in standard WKB.) Blame lies squarely at the feet of OGC. Why write a WKB standard that doesn't understand Z and M values when the WKT and SHP standards have such a clear understanding of them?

Who cares? Who actually communicates via WKB anyway? Not many people, as far as I can tell. But in building a system that communicates geometries directly with the RDBMS (without preposterous middleware like SDE or ArcObjects), a binary format like WKB seems far more efficient than the more expressive WKT. Parsing a bunch of doubles out of text is, I suppose, still a lot faster than connecting to a database in the first place, but WKT does dramatically expand the amount of traffic on the wire, which costs more and more these days.
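To put a rough number on the wire-size difference, here is a minimal Python sketch that encodes the same 2D linestring both ways and compares byte counts. The helper names and the sample coordinates are made up for illustration, and the WKT side uses repr() so the doubles round-trip exactly; printing fewer digits would shrink the text at the cost of precision.

```python
import struct

def linestring_wkb(points):
    """Encode 2D points as a little-endian WKB LineString (type code 2)."""
    # 1 byte for byte order, 4 bytes for geometry type, 4 bytes for point count
    header = struct.pack("<BII", 1, 2, len(points))
    body = b"".join(struct.pack("<dd", x, y) for x, y in points)
    return header + body

def linestring_wkt(points):
    """Encode the same points as WKT, using repr() so every double round-trips."""
    coords = ", ".join(f"{x!r} {y!r}" for x, y in points)
    return f"LINESTRING ({coords})"

# Hypothetical sample data: 1000 vertices with arbitrary coordinates.
points = [(-95.3698 + i * 0.000123, 29.7604 + i * 0.000456) for i in range(1000)]

wkb = linestring_wkb(points)
wkt = linestring_wkt(points).encode("ascii")

print("WKB bytes:", len(wkb))
print("WKT bytes:", len(wkt))
```

The WKB size is fixed at 9 bytes of header plus 16 bytes per vertex, while the WKT size depends entirely on how many digits each coordinate needs.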

Oh well. I guess I'll be better friends with ST_POLYGON than with ST_PolyFromWKB... at least where our data is 3-dimensional.

6 Comments:

At June 26, 2007 8:53 PM, Anonymous said...

Preposterous SDE or not, there's actually a good reason to use SDE (and the format is well documented) or something similar.
There is a big overhead with transferring binary streams between client and database server. SDE uses a compression at about 40% compared to WKB.
I didn't think this would really matter when the database was at localhost, and it had the added overhead of uncompressing the geometry, but I was amazed at the speed difference. It really does matter (but of course WKT is even worse).

 
At June 26, 2007 10:00 PM, Sebastian Good said...

Indeed. Perhaps the right answer for high performance is to write one's own Oracle stored procedure which takes the compressed stream and constructs the correct objects server-side. With options for coding in C, Java and .NET languages, it should be somewhat straightforward.

 
At June 26, 2007 10:24 PM, Anonymous said...

It won't help. The whole point is to minimize the amount of data that goes from the server to the client. As far as I know, any user type will have the overhead of transferring blobs.

 
At June 26, 2007 10:37 PM, Sebastian Good said...

You could write the compression yourself, both on the way in and on the way out, if you wanted to deal with it yourself. Granted, this is inserting your own (preposterous?) middleware, but it's cheap. The discussions in the sessions at the 2007 UC suggested that ST_GEOMETRY wasn't significantly larger in total storage; the stats I saw were on the order of 15%.

 
At June 26, 2007 10:51 PM, Paul Ramsey said...

The PostGIS extension to WKB to accommodate 3D was actually pre-dated by an addendum to the SF-COM specification, brought forward by Cadcorp, that defined the solution PostGIS used. PostGIS, Cadcorp and OGR all ended up using the same solution, which sets the high bit of the geometry type to indicate 3D geometries.

The punchline is that SFSQL2 and SQL/MM completely ignored the agreed addendum and simply added 1000, 2000 and 3000 to the geometry type code to indicate Z, M and ZM, respectively.
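A reader that wants to cope with both conventions can normalize the 32-bit type word before dispatching on it. The following Python sketch is only an illustration: the function names are hypothetical, and the M and SRID flag values are the ones PostGIS uses in its extended WKB, which goes beyond the single 3D bit described above.

```python
import struct

# High-bit flags on the 32-bit geometry type word.
FLAG_Z = 0x80000000     # the "high bit" 3D convention
FLAG_M = 0x40000000     # assumption: PostGIS EWKB flag for M values
FLAG_SRID = 0x20000000  # assumption: PostGIS EWKB flag for an embedded SRID

def describe_type(type_word):
    """Return (base_type, has_z, has_m) for a WKB geometry type word.

    Accepts both the high-bit flags and the +1000/+2000/+3000 offsets.
    """
    has_z = bool(type_word & FLAG_Z)
    has_m = bool(type_word & FLAG_M)
    code = type_word & ~(FLAG_Z | FLAG_M | FLAG_SRID)  # strip the flag bits
    offset, base = divmod(code, 1000)
    if offset == 1:
        has_z = True
    elif offset == 2:
        has_m = True
    elif offset == 3:
        has_z = has_m = True
    return base, has_z, has_m

def read_type_word(wkb):
    """Read the type word that follows the one-byte byte-order marker."""
    fmt = "<I" if wkb[0] == 1 else ">I"
    return struct.unpack_from(fmt, wkb, 1)[0]

# A 3D point is type 1 either way; only the decoration differs.
print(describe_type(0x80000001))  # high-bit style
print(describe_type(1001))        # offset style
```

Both calls describe the same thing, a point (base type 1) with a Z ordinate, which is the whole annoyance: two encodings for one idea.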

Frankly, I think size-on-the-wire is a bit of a red herring. What you really want is to push a low-intensity format across, like WKB, that is easy to parse into the object of your choice, rather than something that requires a lot of smarts to parse, like well-known text.

Before you go off and give up on WKB, let me share this story. Back when PostGIS support was first added to Mapserver, we made the decision to wrap the call to the geometry column in AsBinary(), so the transit would be a standard format. As a result, when we later changed the canonical form of the PostGIS geometries, the Mapserver connectivity DID NOT BREAK, even though much of the internal workings of PostGIS had been completely changed, including the canonical form.

 
At June 26, 2007 11:05 PM, Sebastian Good said...

Paul, I agree. I'm sure that if I were programming against PostGIS or a custom data store, I'd be happy with a 3D WKB. But soldiering away here at "BigCo" we have to innovate within the requirement that everything still be "in SDE". (The notion that SDE is a middleware layer on top of a perfectly respectable database is somehow lost on most. Which is a shame, because SDE is the best of ESRI's software stack, but it seems to imply using the worst of the stack! But I digress.) I think within our applications we are indeed planning on building "TO_GEOMETRY" and "FROM_GEOMETRY" into our raw data access layer. This way the transport can be WKB or WKT as appropriate, but within our code we'll be talking pure IGeometry. Since serializers for both are awfully easy to write (the WKB weighing in at about 50 lines of C#), I suspect it won't be too bad. If lots of 3D data becomes wire-heavy, we can reconsider, but as you say, it may be a red herring. Only measurement will tell us. I'll document our adventures.
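For a sense of how small such a serializer can be, here is a rough sketch of the round trip for a linestring. It is in Python rather than C#, it is not our actual code, and the function names are invented for illustration. It assumes the +1000 type-code convention for Z and handles only 2D and 3D linestrings.

```python
import struct

WKB_LINESTRING = 2
Z_OFFSET = 1000  # assumption: the +1000 convention marks a Z ordinate

def linestring_to_wkb(points):
    """Serialize a list of (x, y) or (x, y, z) tuples as little-endian WKB."""
    if not points:
        raise ValueError("this sketch does not handle empty linestrings")
    dims = len(points[0])
    if dims not in (2, 3) or any(len(p) != dims for p in points):
        raise ValueError("all points must be uniformly 2D or 3D")
    type_code = WKB_LINESTRING + (Z_OFFSET if dims == 3 else 0)
    parts = [struct.pack("<BII", 1, type_code, len(points))]
    fmt = "<" + "d" * dims
    parts.extend(struct.pack(fmt, *p) for p in points)
    return b"".join(parts)

def linestring_from_wkb(data):
    """Parse WKB produced by linestring_to_wkb (either byte order)."""
    order = "<" if data[0] == 1 else ">"
    type_code, count = struct.unpack_from(order + "II", data, 1)
    if type_code == WKB_LINESTRING:
        dims = 2
    elif type_code == WKB_LINESTRING + Z_OFFSET:
        dims = 3
    else:
        raise ValueError(f"unsupported geometry type code {type_code}")
    fmt = order + "d" * dims
    size = struct.calcsize(fmt)
    # The 9-byte header is followed by `count` fixed-size coordinate records.
    return [struct.unpack_from(fmt, data, 9 + i * size) for i in range(count)]

path = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
blob = linestring_to_wkb(path)
print(len(blob), linestring_from_wkb(blob) == path)
```

Swapping the transport for WKT later would only mean replacing these two functions, which is the appeal of keeping the conversion at the edge of the data access layer.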

 
