Monday, May 05, 2008

SDE 9.2's ST_GEOMETRY: Part Two, The Empire Strikes Back

I've been investigating ESRI SDE's ST_GEOMETRY support and performance on some simple sample data sets. I'm embarassed to say I don't recall where they came from, but they are all US counties (about 3,000) and all US ZIP codes (about 30,000). My initial reactions were excitement at the speed of spatial joins on small datasets, disappointment at slow performance on large datasets, followed by theorizing about the cost of out-of-process calls.

Well, in these things it's not the journey, it's the destination. I thought I better compare to Oracle Spatial's SDO_GEOMETRY to make sure I was comparing apples to apples, as it were. The results are interesting.

I'm trying two simple tests of throughput, one a spatial join which selects all the ZIP codes which overlap a given (semi-random) set of counties, the other a raw select which forces all geometries to be converted to WKB format for consumption by a putative 3rd party tool.

For Oracle

select count(*) from ozip zi join ocounty co on (sdo_relate(zi.shape, co.shape, 'mask=ANYINTERACT') = 'TRUE') where co.objectid between 1200 and 1500; select sum(dbms_lob.getlength(sdo_util.to_wkbgeometry(shape))) from ozip;

For SDE

select count(*) from zip zi join county co on (st_intersects(zi.shape, co.shape)=1) where co.objectid between 1200 and 1500; select sum(dbms_lob.getlength(sde.st_asbinary(shape))) from zip;

What should the baseline be? I figure it's the existing geoprocessing and rendering tools from ESRI. So for the spatial join I just used the Intersect toolbox on the feature classes with the 'objectid between 1200 and 1500' as the definition query for counties, as above. For raw rendering, I simply averaged 3 successive refreshes of a map full of all 30,000 zip codes. This makes the baseline look even slower than it really is, as it also times rendering and context switches. SDE's fast, people; we knew that.

The somewhat surprising results are as follows (all times in seconds).

OperationSDE/SDOSDE/STST_GEOMETRYSDO_GEOMETRY
Join25256.0?
Scan15535

I got tired of waiting after 15 minutes for Oracle to convert its 30,000 zip codes to WKB, so I gave it a score of ∞. All queries appeared to be CPU bound, which makes sense as the entire data set fits into the memory of even this old laptop.

Update 2008-May-5: In amusing and ironic twist, Paul Ramsey has notified me that Oracle doesn't let you release benchmark results without making sure they've been dolled up and faked by their sales engineers, er, excuse me, cleared by their legal department. I've removed some of the actual figures from the charts above. Infinity's hard to hide in a closet, even with slick marketing.

The lessons are certainly mixed.

Joins. For the kinds of in-SQL-on-the-fly joins that make geometry data types tempting, SDO_GEOMETRY might be a clear winner (better ask Oracle legal) but not by a massive amount (?? seconds versus 6.0 seconds, making ST_GEOMETRY ??% slower). Perhaps this is because of the process switches endured by implementing ST_GEOMETRY in st_shapelib, perhaps not. Perhaps it is the difference in indexing schemes. More complex and varied tests would need to be done. For back-of-the-envelope estimates, they're roughly equivalent. For most applications, if 10 seconds is acceptable, so is ??. ST_GEOMETRY and SDO_GEOMETRY are both certainly far smarter than the Intersect toolbox, which spends most of its time querying more data than it needs.

Scans. If you want to grab data from these systems and process them using open standards, you lose either way. With ST_GEOMETRY, you can do the work using ST_ASBINARY, but the performance is unimpressive: roughly 1000 shapes per second. (Compare at 6000 per second for SDE querying ST_GEOMETRY itself and ArcMap rendering them.) SDO_GEOMETRY goes completely to lunch; they're not taking WKB very seriously. My 3:1 showing for SDE rendering of ST_GEOMETRY versus SDO_GEOMETRY layers throws a lot of these numbers into question. Would ESRI really let their Oracle Spatial implementation be that much slower?

My provisional conclusion is that ST_GEOMETRY holds promise for spatial SQL, as much as SDO_GEOMETRY, though it probably needs more tuning from ESRI's side. No one is going to be writing useful GIS tools which use the WKB/WKT forms of these geometries anytime soon. If you want fast scanning of data, you've got to get under the covers and read the data natively. That is, I suppose, the next experiment. I'd like to know how easy it will be to read ST_GEOMETRY data natively to .NET, where my particular bread is buttered. More to come.

5 Comments:

At May 05, 2008 7:15 PM, Blogger Paul Ramsey said...

Great series of posts! I'm afraid one of the reasons they are so interesting is you are contravening the section of the Oracle EULA that enjoins you to not "disclose results of any program benchmark tests without our prior consent".

That's the OTN license. I think if you have paid for a license you are still forbidden, but I'm not sure.

 
At May 05, 2008 9:06 PM, Blogger Zac Spitzer / Fastmovingtarget said...

Interesting stats, I have noticed the WKB WKT support in oracle is rather slow as well.

I would be interested to know which engine is more accurate in complex spatial operations

 
At May 07, 2008 9:47 PM, Anonymous Cellulose said...

You've got an additional complication that Oracle Spatial can handle geodesic spatial references where as ST_GEOMETRY is limited to planar... What SRID were you using for this test?

Personally, I've found the performance difference between planar and geodesic in Oracle to be very small, but some I've talked said its significant. I guess YMMV.

 
At June 06, 2008 10:52 AM, Anonymous Anonymous said...

Is there a good book or reference on line that talks about ST_GEOMETRY?

 
At June 06, 2008 10:58 AM, Blogger Sebastian Good said...

The documentation ESRI provides isn't half bad:

http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An_overview_of_working_with_a_geodatabase_using_SQL

 

Post a Comment

Links to this post:

Create a Link

<< Home