Wherefore art thou Topology? (A Plague On Both Your Houses!)
Not a lot of ESRI posts here recently for a mostly happy reason: there's been little ESRI programming in my universe the past few months. I've been working on a fun little wrapper generator for managed code called GIWS (no, not about the Chosen Tribe, it's SWIG backwards. Get it?) Anyway, that's a post for another time.
Today I come not to praise ESRI Topologies but to bury them. One of our products has a feature where you download a little personal geodatabase from our web application to your desktop to do detailed editing in ArcMap, then you send it back to us and we unpack it. We're thinking of scrapping the whole approach, as it turns out to make sense to programmers but not to the end users who actually have to use it. But that's yet another post for another time.
Anyway, we take care to create the feature classes in this personal geodatabase in a topology, complete with rules about overlaps, to help users create clean datasets. Several of our feature classes are essentially coverages: the polygons need to be non-overlapping. We thought that by creating a topology automatically we'd be doing our users a favor. Topologies are tedious to put together, and somewhat intricate.
Well, two years later we're getting rid of the topologies. I thought it might be worth sharing our reasoning, as I'm curious whether anyone else has found similar problems. (FWIW, we're using 9.0. Doing work for a big company means you have the awesome upside of knowing your work matters on a large scale. The downside is usually being 12 months behind the technology curve. That's okay.)
- Topology algorithms find uncorrectable problems. We had many instances of topology scans finding, for example, polygon overlaps, which turned out to be degenerate lines, points, or even invisible artifacts. These seemed to be associated with datasets where original work had been done in a projected coordinate system, then projected into the WGS1984 we use internally. I understand that projection might cause points to snap differently, creating errors. We all get it. But the problems detected would turn out to be invisible or uncorrectable. That aggravated people.
- Topology algorithms are different from geoprocessor algorithms. The feature classes users edit were inputs to a series of geoprocessing algorithms. Nothing exciting, mostly intersections followed by some algebra on the attributes. But quite often a topology check would claim no overlaps, while an intersect would show overlaps. (It forced us to add a sanity check before geoprocessing: self-intersect each layer and assert that the number of input polygons equals the number of output polygons.) We did not spend time to figure out who was right -- and given well-known robustness issues in spatial algorithms, it may well be that both are correct. But since our results are created by the geoprocessor, we decided to trust the geoprocessor.
- Topological Editing is a very advanced skill—people don't like learning it. We had trouble convincing our users to learn the topological editing tools. Heck, normal editing in ArcMap is hard enough. I couldn't really blame them.
- Topologies add awesome bloat to geodatabases. We were seeing geodatabases with a dozen simple feature classes bloating to 700MB after editing. Compacting them would take them back to 2MB. Ouch. We know databases need to be compacted now and again, but this was a little much for us.
- Topologies would cause obscure COM errors in our geoprocessor. This one may be sort of our fault. We are using the 9.0 geoprocessor in-process on the main STA thread of our desktop application. It doesn't seem to like that, and we're contemplating running it as an out-of-process Python script on demand. Nonetheless, the stability of our tool has increased since we stopped including topologies. Given the above reasons not to use topologies, it wasn't worth debugging this one.
We really wanted topologies to work. They make sense, they reflect how people really think, they ought to be the bee's knees. But we were ultimately disappointed. Perhaps they're better in 9.2 or 9.3, but I doubt we'll try them again. We coded our own overlap detection and repair tool for people who can't use the ones already out there (e.g. ET GeoWizards).
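
A minimal sketch of the pre-geoprocessing overlap check idea from the list above, in plain Python rather than the ESRI geoprocessor. It is not the tool described in this post. Everything here is an assumption made for illustration: the names classify_overlaps and clip_convex are hypothetical, polygons are assumed to be convex with counter-clockwise vertex order, and the clipping is textbook Sutherland-Hodgman. The point is only to show how a scan can separate overlaps with real area from the degenerate line-or-point "overlaps" that nobody can see or fix.

```python
from itertools import combinations


def shoelace_area(ring):
    """Unsigned area of a polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _side(a, b, p):
    """Positive when p lies left of the directed edge a->b, zero when on it."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject, clipper):
    """Sutherland-Hodgman: clip subject against a convex, CCW clipper."""
    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        points, output = output, []
        for p, q in zip(points, points[1:] + points[:1]):
            sp, sq = _side(a, b, p), _side(a, b, q)
            if sp >= 0:                      # p is inside or on the edge
                output.append(p)
            if (sp > 0 > sq) or (sp < 0 < sq):   # edge p->q strictly crosses
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
    return output


def classify_overlaps(features, min_area=1e-9):
    """Split pairwise intersections into real overlaps and degenerate ones.

    features: dict of name -> convex CCW ring (an assumption of this sketch).
    Returns (real, degenerate): real is a list of (name_a, name_b, area),
    degenerate is a list of (name_a, name_b) that only touch along a line
    or point, or whose overlap is smaller than min_area.
    """
    real, degenerate = [], []
    for (name_a, ring_a), (name_b, ring_b) in combinations(features.items(), 2):
        clipped = clip_convex(ring_a, ring_b)
        if not clipped:
            continue                         # no contact at all
        area = shoelace_area(clipped)
        if area > min_area:
            real.append((name_a, name_b, area))
        else:
            degenerate.append((name_a, name_b))
    return real, degenerate


if __name__ == "__main__":
    layer = {
        "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "B": [(1, 1), (3, 1), (3, 3), (1, 3)],   # overlaps A
        "C": [(2, 0), (4, 0), (4, 1), (2, 1)],   # only shares edges
    }
    real, degenerate = classify_overlaps(layer)
    print("real overlaps:", real)
    print("degenerate contacts:", degenerate)
```

A check like this would fail the layer only when the first list is non-empty, and would simply ignore the second. Real parcels are rarely convex, so an actual tool would need a general polygon clipper in place of clip_convex.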


3 Comments:
Excellent post-- covers two important aspects that caught my attention.
1. The redefining of commonplace terminology by companies like ESRI and Microsoft. The word topology has a well-defined meaning that has been used for decades in geographic information science. It is commonly used in the context of "topological correctness", meaning that a given feature does not violate a set of "topological" rules (i.e. geometric relationships). Why does ESRI insist on overloading this word to mean "enforcement of geometric / non-geometric rules"? I like the idea, just not the wording conventions.
2. Although it was probably a big selling point for the ArcGIS 9.x products, the concept of "topology" is nothing new. The movement away from the coverage (format) to the simple feature model (shapefiles) resulted in a gap in which the preservation of topological integrity took a back seat. Bad linework and generally mystified responses from people confronted with the notion of topology seem to be the standard these days.
Those condensed rants aside, it seems like there are some serious limitations (bugs?) in the way in which topological correctness is implemented in said product. Bloated geodatabase files, differences in algorithms and COM errors suggest that there are either operator errors involved-- or a wacky disconnect between topology constructs and implementation.
I like the approach used in GRASS-- vectors are always stored in a topologically-aware context. Granted this can introduce some limitations (overlapping polygons...) however, data quality is my primary concern. Things get a little tougher when you are working with external groups only interested in COTS GIS products.
I sympathize with the problems observed here but I don't think they are insurmountable. And more importantly, I don't see a viable alternative for maintaining high quality spatial data. ESRI's geodatabase topology exists for a reason, and it generally does do what it is supposed to do. There definitely are some frustrating nuances with these tools, but what is the alternative?
Two quick points on specific problems: (1) Regarding bloat, this sounds like a function of your personal geodatabases; there is a well-known problem with repetitive transactions in an MS Access file which would not be a problem with topology per se. (2) Regarding Dylan's first comment, creating a new Topology does not require rules. It can be thought of as simply an enabling mechanism on which rules can then be included.
I am curious about GRASS; I haven't heard any discussion about this format in almost 10 years. Are many organizations using it? Given the ever-expanding array of data formats and platforms it is interesting how few of them offer any real solutions for data management. Perhaps this is related to Sebastian's observation that topology is in fact an advanced skill. Everyone wants to believe that "spatial" is easy, and if all one needs is points on a map then it certainly can be easy. But there's an entire science here that gets overlooked and I wonder if Google and VE will ever provide tools that solve the more complicated problems related to data management.
For intricate datasets with detailed relationships to maintain, such as parcels, boundaries, etc. I'm sure topologies are useful in maintaining correctness. (Though, again, with the proviso that "correctness" as determined by the topology scanner may not agree with "correctness" in the geoprocessing and other mundane editing tools.) At the end of the day, we were mostly dealing with coverages: polygons shouldn't overlap and that's about it.
So to Bill's question ("what is the alternative?") I say simply: overlap-detection and correction is adequate and usually much easier to grok for the average GIS tech at our company.
I think part of the issue is that the human (well, animal) brain is so good at visual analysis and at tolerating ambiguity that some of the problems topology scans/overlap detection find are merely aggravating. "It's a sliver! Can't you tell? Why be so anal!" Our new generation of tools simply detects overlaps, corrects them by slicing and clipping polygons in the order that they are already rendered on the screen (hopefully matching what users believe they are seeing anyway), and saves the original in a shape file elsewhere for comparison. That's the 90% case and it seems to be easier for people to understand.
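
A toy sketch of the draw-order idea, under heavy assumptions. Features are reduced to axis-aligned rectangles (xmin, ymin, xmax, ymax) so the slicing fits in a few lines, the names subtract_rect and repair_by_draw_order are hypothetical, and "whatever is drawn on top wins the contested area" is my reading of "matching what users believe they are seeing". The originals are kept in a plain dict here, standing in for the separate shape file.

```python
def subtract_rect(rect, cutter):
    """Return the pieces of rect not covered by cutter.

    Both arguments are (xmin, ymin, xmax, ymax). The result is a list of
    zero to four non-overlapping rectangles.
    """
    x0, y0, x1, y1 = rect
    cx0, cy0, cx1, cy1 = cutter
    # No interior overlap (touching edges do not count): keep rect whole.
    if cx0 >= x1 or cx1 <= x0 or cy0 >= y1 or cy1 <= y0:
        return [rect]
    pieces = []
    if cx0 > x0:                         # full-height strip left of cutter
        pieces.append((x0, y0, cx0, y1))
    if cx1 < x1:                         # full-height strip right of cutter
        pieces.append((cx1, y0, x1, y1))
    mx0, mx1 = max(x0, cx0), min(x1, cx1)
    if cy0 > y0:                         # middle column, below cutter
        pieces.append((mx0, y0, mx1, cy0))
    if cy1 < y1:                         # middle column, above cutter
        pieces.append((mx0, cy1, mx1, y1))
    return pieces


def repair_by_draw_order(layers):
    """Clip each feature by everything drawn above it.

    layers: list of (name, rect) in draw order, last item drawn on top.
    Returns (repaired, originals), where repaired maps each name to a list
    of rectangles and originals keeps the untouched input for comparison.
    """
    originals = dict(layers)
    repaired = {}
    above = []                           # rectangles already drawn on top
    for name, rect in reversed(layers):
        parts = [rect]
        for top in above:
            parts = [piece for part in parts
                     for piece in subtract_rect(part, top)]
        repaired[name] = parts
        above.append(rect)
    return repaired, originals


if __name__ == "__main__":
    draw_order = [("bottom", (0, 0, 4, 4)), ("top", (2, 2, 6, 6))]
    repaired, originals = repair_by_draw_order(draw_order)
    for name, parts in repaired.items():
        print(name, "was", originals[name], "now", parts)
```

With real polygons the subtraction step would be a general polygon difference, but the loop stays the same: walk the features from the top of the draw order down and let each one keep only what nothing above it has already claimed.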