Saturday, April 28, 2007

F#, Mono, and Mac OS X: Warmups

I've been meaning to try F# in earnest for some time just to keep up my programming chops. It's been so long since I programmed in a functional language, and that was Scheme 10 years ago. F# (like its uncle language Caml) is statically typed with crazy type inference, so it presents some new challenges. While I'm waiting for my copy of The Little MLer to arrive, I thought I'd get F# up and running.

Since the objective is to build unused brain muscles, I figured it's no good just running F# in Visual Studio. While .NET interoperability is the reason to use F# over ML, I spend all day in Visual Studio. Well, I have a Mac and keep muttering about Mono... so it was settled. I would run F# in Mono on a Mac.

fsi.exe is the interactive interpreter I'm using for that most basic of functional "Hello World" examples:

let rec fac = function | 0 -> 1 | n -> n * fac(n-1);;

Install Mono and F#

The installation script and instructions for the 1.9 version of F# are clear and easy, and I had F# running on Mono/Mac quite quickly. (Ignore the thousands of compiler warnings. Apparently the ahead-of-time Mono compiler on the Mac is grouchy.) I installed F# to my /Applications folder. I'm new to the Mac, so perhaps this is heresy, but it seemed obvious. Mono seems to require an X Server to be running for its Windows.Forms implementation, and for some reason fsi.exe wants to use it. I was greeted immediately with

LittleMac:~ sebastian$ mono /Applications/FSharp/bin/fsi.exe
MSR F# Interactive, (c) Microsoft Corporation, All Rights Reserved
F# Version 1.9.1.8, compiling for .NET Framework Version v1.1.4322
> Unhandled Exception: System.TypeInitializationException: An exception was thrown by the type initializer for System.Windows.Forms.XplatUI ---> System.ArgumentNullException: Could not open display (X-Server required. Check you DISPLAY environment variable)
Parameter name: Display
  at System.Windows.Forms.XplatUIX11.SetDisplay (IntPtr display_handle) [0x00000]
  at System.Windows.Forms.XplatUIX11..ctor () [0x00000]
  at System.Windows.Forms.XplatUIX11.GetInstance () [0x00000]
  at System.Windows.Forms.XplatUI..cctor () [0x00000]
  --- End of inner exception stack trace ---
  at <0x00000>
  at System.Windows.Forms.Form.get_CreateParams () [0x00000]
  at System.Windows.Forms.Control.SizeFromClientSize (Size clientSize) [0x00000]
  at System.Windows.Forms.Control..ctor () [0x00000]
  at System.Windows.Forms.ScrollableControl..ctor () [0x00000]
  at System.Windows.Forms.ContainerControl..ctor () [0x00000]
  at System.Windows.Forms.Form..ctor () [0x00000]
  at Microsoft.FSharp.Compiler.Interactive.Shell+DummyForm..ctor () [0x00000]
  at (wrapper remoting-invoke-with-check) DummyForm:.ctor ()
  at Microsoft.FSharp.Compiler.Interactive.Shell.main$cont@1318@1318 () [0x00000]
  at .Microsoft.FSharp.Compiler.Interactive.Shell._main () [0x00000]
** ERROR **: file mini.c: line 8704 (mono_get_lmf_addr): should not be reached
aborting...
Stacktrace:
** (process:1474): ERROR (recursed) **: file mini.c: line 8688 (mono_get_lmf): should not be reached
aborting...
Abort trap

Easy enough, just run fsi from an XTerm or write a quick script to set your $DISPLAY variable. I whipped up the following (talk about stretching unused muscles...) and put a symlink to it in my /usr/local/bin so the rest of the world would think "fsi" was really a well-behaved F# interpreter.

#!/bin/sh
export DISPLAY=:0.0
mono /Applications/FSharp/bin/fsi.exe "$@"

(And the same for fsc.exe.) Now fsi fires up relatively well, issuing what appears to be a harmless error on startup, but executing nicely thereafter.

LittleMac:~ sebastian$ fsi
MSR F# Interactive, (c) Microsoft Corporation, All Rights Reserved
F# Version 1.9.1.8, compiling for .NET Framework Version v1.1.4322
> Gtk not found (missing LD_LIBRARY_PATH to libgtk-x11-2.0.so.0?), using built-in colorscheme

Write some F#

Now I was ready to play. But while I was willing to leave Visual Studio, I wasn't willing to leave syntax-coloring, interactive buffers, and a text editor. So then began a quest to find a text editor on the Mac with a nice interactive buffer capability. I wanted to edit and type statements in one window, and see the results of their execution below, like I can with TOAD, DrScheme, and Emacs. I was originally drawn to TextMate, a favorite among the Ruby-slinging cool kids. While it is a snappy little editor with widespread language support, it doesn't yet appear to have the facility to execute interactive sessions like this. Yes, you can select expressions to send to scripts, but not to a co-executing interactive session. So off I went to that old standby, Emacs. The programming community in France has written a "tuareg-mode" elisp package for Emacs that does all of this and more for Caml, a language very similar to F#. Work has been done to customize tuareg-mode for F#, and I followed in its footsteps. It worked!

Those unused muscles sure are sore now. Now can I get around to programming?

Friday, April 27, 2007

The Missing AGX WMS GetFeatureInfo Support: If I Had to Guess

ArcGIS Explorer doesn't support the identify tool on WMS layers. ArcMap does. Why?

If I had to guess, it's because the WMS GetFeatureInfo command doesn't take arguments in the map's original coordinate system. It is geared for pixel hit tests. The arguments of interest in a GetFeatureInfo command are BBOX (a bounding box), HEIGHT & WIDTH, and X & Y. X & Y are pixel locations in a map of size WIDTHxHEIGHT bounded by BBOX. That's well and good if you throw that map up on a screen, as originally returned, and click on a pixel with your mouse. Trivial to implement.
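To make the pixel hit test concrete, here is a minimal Python sketch of the arithmetic involved. The function names are mine, not part of any WMS library, and it assumes an unrotated, unprojected image where pixel (0, 0) is the upper-left corner.

```python
def map_to_pixel(bbox, width, height, map_x, map_y):
    """Convert a map coordinate to the X/Y pixel a GetFeatureInfo request expects.

    bbox is (minx, miny, maxx, maxy). Pixel (0, 0) is the upper-left corner,
    so the Y axis is flipped relative to map coordinates.
    """
    minx, miny, maxx, maxy = bbox
    px = (map_x - minx) / (maxx - minx) * width
    py = (maxy - map_y) / (maxy - miny) * height
    # Clamp so a point on the far edge still lands inside the image.
    x = min(max(int(px), 0), width - 1)
    y = min(max(int(py), 0), height - 1)
    return x, y


def pixel_to_map(bbox, width, height, x, y):
    """Return the map coordinate at the centre of pixel (x, y)."""
    minx, miny, maxx, maxy = bbox
    map_x = minx + (x + 0.5) * (maxx - minx) / width
    map_y = maxy - (y + 0.5) * (maxy - miny) / height
    return map_x, map_y
```

With a flat image on screen the mouse position is already the X & Y the server wants, which is why the simple case is trivial.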

But even in ArcMap, according to ESRI, if you do anything tricky with that WMS image, like rotate the view frame, the WMS query will fail.

Identify results for WMS layers may be incorrect if the data frame is rotated... This is a known limitation

Well if rotation is tricky enough, imagine draping an image over a globe and doing a simple pixel hit test. I'm guessing they just haven't got around to the translation from mouse X/Y coordinates to map coordinates back to fake mouse coordinates in the original WMS context.

If I had to guess. For now I guess I can "fake" the request. How do people handle this in general?
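For what it's worth, "faking" the request could look roughly like the Python sketch below: invent a small square map centred on the clicked longitude/latitude and ask about its middle pixel. The parameter names are the standard WMS 1.1.x ones; the function name, the default window size, and the example.com URL are my own assumptions, and a real server may want a different INFO_FORMAT.

```python
from urllib.parse import urlencode


def fake_feature_info_url(base_url, layers, lon, lat,
                          half_size_deg=0.5, size_px=101):
    """Build a WMS 1.1.x GetFeatureInfo URL for a made-up map centred on (lon, lat)."""
    bbox = (lon - half_size_deg, lat - half_size_deg,
            lon + half_size_deg, lat + half_size_deg)
    centre = size_px // 2  # the clicked point sits on the middle pixel
    layer_list = ",".join(layers)
    params = [
        ("VERSION", "1.1.0"),
        ("REQUEST", "GetFeatureInfo"),
        ("SRS", "EPSG:4326"),
        ("BBOX", ",".join(f"{v:.6f}" for v in bbox)),
        ("WIDTH", size_px),
        ("HEIGHT", size_px),
        ("LAYERS", layer_list),
        ("STYLES", ""),
        ("FORMAT", "image/png"),
        ("QUERY_LAYERS", layer_list),
        ("INFO_FORMAT", "text/html"),
        ("X", centre),
        ("Y", centre),
    ]
    # Keep commas, colons and slashes readable instead of percent-encoding them.
    return f"{base_url}?{urlencode(params, safe=',:/')}"


print(fake_feature_info_url("http://example.com/wms",
                            ["Countries", "Cities"], -100.0, 40.0))
```

The half-degree window is arbitrary; in practice you would size it to match the resolution the user is looking at on the globe.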

Thursday, April 26, 2007

As the World Turns ... out Bugs

The Venerable Info Tool

While ArcGIS Explorer's WMS support is poorly implemented, it does succeed in tiling the earth cleanly with images from WMS servers.

WMS servers can optionally support a GetFeatureInfo request which is a very light-weight feature querying tool. It's no WFS — typically servers return simple HTML or at best, some GML. But it's a nice hook for displaying something to users who ask "Hey, what's right here?". Alas, ArcGIS Explorer appears to allow its "information task" to query WMS layers, but it doesn't work. It claims to find no features, but some HTTP spying reveals it never even asks. It's easy to fake with your own tool, but since ArcMap supports info queries on WMS layers, the lack of AGX support is puzzling. I wish they'd support the GetFeatureInfo in AGX — it's a quick and easy hook into user-defined content supplied as web pages!

It's as if ArcGIS Engine (which AGX is built with) wasn't a carved-out subset of ArcMap made for building with, but instead a strangely buggy alternate implementation of all that's in ArcMap. Say it ain't so!

A Warped Picture

While I've been pretty tough on AGX's efficiency, it does at least seem to avoid some basic correctness issues that still plague Google Earth. Check out what happens when a pretty generic NOAA WMS service is shown on Google Earth!

Google Earth is asking for a 512x512 image, and IMS is returning one, but clearly there is disagreement about just what the correct extent should be. This problem has existed for a while and there are various tricks out there to fix the problem, though it generally seems to work better if you're a "KML reflector" rather than a WMS impersonator.

Now it's interesting to note that particular WMS service is ESRI-powered (using their IMS/WMS connector), and another Canadian Atlas WMS service overlays just fine... and it's powered by MapServer. But one must never attribute to malice aforethought that which can be adequately explained by incompetence or inattentiveness!

And just why Google Earth decided to do a single image overlay as opposed to WMS tiles, I still don't know. I suppose lots of WMS services are more like IMS in their flavor -- lots of labelling that would look bizarre if repeated in tiles.

Tuesday, April 24, 2007

The Tyranny of the Round Trip: Work in Sets

So everyone is wondering, as I've been, about RESTful GIS services and deciding that in the end, it's all about letting those precious polygons have their own URLs. Very cool, and I like what I'm seeing in the FeatureServer prototype.

The real trick is touched on here, but I'd love to see some elaboration on the issue. The classic REST discussions talk about lots and lots of people (i.e. the Internet population) dealing in parallel with a handful of atomic items (e.g. blog entries, eBay auctions.) But while Internet GIS services might still serve lots and lots of people, they'll also serve thousands or millions of objects. As Paul Ramsey[*] points out, it's not fair to hand a WFS client a list of URLs to ten thousand polygons, each of which needs to be retrieved one at a time (open connections and parallel retrieval via HTTP 1.1 notwithstanding).

I think we need a way to define entire sets of resources to work on. Paul suggests tiles, which is eminently sensible. (What are tiles but pieces of a grid-based spatial index, something at the heart of GIS implementations from day one?) FeatureServer and WMS both suggest a query string-based query language (e.g. BBOX=x,y,x,y etc.). But don't we have a few query languages which are already well understood? I'm not sure what it means, but shouldn't we be talking about SQL or XPath in this context?
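To illustrate the "tiles as a spatial index" idea, here is a small Python sketch that maps points and bounding boxes onto a tile grid. The grid itself is an assumption for illustration (level 0 tiles are 90 degrees on a side, each level halves that, origin at -180,-90); it is not the scheme of any particular server.

```python
import math


def tile_size(level, base_deg=90.0):
    """Tile edge length in degrees; each level halves the previous one."""
    return base_deg / (2 ** level)


def tile_for_point(lon, lat, level):
    """Return the (col, row) of the tile containing a lon/lat point."""
    size = tile_size(level)
    col = int(math.floor((lon + 180.0) / size))
    row = int(math.floor((lat + 90.0) / size))
    # Points exactly on the far edge of the world belong to the last tile.
    col = min(col, int(360.0 / size) - 1)
    row = min(row, int(180.0 / size) - 1)
    return col, row


def tile_bbox(col, row, level):
    """Return (minx, miny, maxx, maxy) for a tile."""
    size = tile_size(level)
    minx = -180.0 + col * size
    miny = -90.0 + row * size
    return (minx, miny, minx + size, miny + size)


def tiles_for_bbox(bbox, level):
    """List every tile touched by a bounding box, row by row.

    A box whose upper edge sits exactly on a tile boundary also picks up
    the neighbouring tile, which is harmless for a set-oriented query.
    """
    minx, miny, maxx, maxy = bbox
    c0, r0 = tile_for_point(minx, miny, level)
    c1, r1 = tile_for_point(maxx, maxy, level)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

A client could then ask for "everything in tile (1, 2) at level 1" as a single resource rather than ten thousand polygon URLs.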

[*] A cut-paste error in a previous version of this article said Chris Holmes suggested tiles. As a commenter noted, this was actually Paul Ramsey's suggestion. Chris suggests paged chunks, essentially forward-read (parallelizable) cursors.

Monday, April 23, 2007

Help Help Needs Needs Explorer Explorer ArcGIS ArcGIS: HTTP Adventures

It was with great enthusiasm that I fired up ArcGIS Explorer a few weeks ago to kick the tires. The spinning-globe phenom shows no signs of abating, and since ESRI is the tool of choice at big companies, people like me have to know how to put data on ESRI's spinning globe. In this regard I am hardly unique. However, I've never seen an analysis of ArcGIS Explorer's network behavior. It's interesting.

For various reasons, we want to expose some information to ArcGIS Explorer as WMS services. (Perhaps I'll expand on these in future.) So in order to see what was going on (and if we could get in the way to do anything clever), we immediately fired up the tool that should be at the top of every web developer's list: Fiddler. Fiddler acts as an HTTP proxy, allowing you to spy on every action an HTTP client takes and on every response its servers send back. The results are fascinating.

The first trick is always to get Fiddler to step in the middle of this HTTP traffic. Strangely, ArcGIS Explorer doesn't use WinINET proxy settings (the ones you set from IE's options panel and that most Windows apps share.) Since ArcGIS Explorer runs on Windows, I'm not clear why the developers skipped this. It's a vague annoyance, but at least the proxy settings can be easily set.

Next we watch the traffic. I pointed at one of the pre-canned WMS services ArcGIS Explorer ships with (map.ngdc.noaa.gov) and watched the data come in.

The Tiles Roll In

As expected, ArcGIS Explorer asks for maps a tile at a time, and the tile sizes are nicely predictable, generally chopping the world in half as it goes. Here is a typical (formatted) GET tile request.

GET /servlet/com.esri.wms.Esrimap
    ?VERSION=1.1.0
    &REQUEST=GetMap
    &SRS=EPSG:4326
    &BBOX=-90,-45,.000000000000000,45
    &WIDTH=512&HEIGHT=512
    &LAYERS=WorldGrid,Countries,Cities
    &STYLES=
    &EXCEPTIONS=application/vnd.ogc.se_xml
    &FORMAT=image/png
    &BGCOLOR=0xFFFFFF
    &TRANSPARENT=TRUE HTTP/1.1

As said, the good news is that the tiles are regularly sized (512 pixels) and on predictable boundaries (even fractions of 90 degrees). They're being requested as PNGs with transparency, and perhaps most importantly they're being retrieved as GET requests, so they are cacheable. And now for the not-so-good news, which somewhat overwhelms the good news.
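If you want to check the "regular size, predictable boundary" claim against your own captured traffic, a Python sketch along these lines would do it. The function names and the idea of testing against a grid of 90 degrees halved up to ten times are my own assumptions, not anything ArcGIS Explorer documents.

```python
from urllib.parse import parse_qs, urlsplit


def _on_grid(value, max_halvings=10):
    """True if value is a whole multiple of 90 / 2**max_halvings degrees."""
    steps = value * (2 ** max_halvings) / 90.0
    return abs(steps - round(steps)) < 1e-9


def describe_tile(request_target):
    """Summarise a captured WMS GetMap request line (path plus query string)."""
    query = parse_qs(urlsplit(request_target).query, keep_blank_values=True)
    # WMS parameter names are case-insensitive, so normalise them.
    params = {key.upper(): values[0] for key, values in query.items()}
    bbox = tuple(float(v) for v in params["BBOX"].split(","))
    width, height = int(params["WIDTH"]), int(params["HEIGHT"])
    minx, miny, maxx, maxy = bbox
    return {
        "bbox": bbox,
        "size_px": (width, height),
        "degrees_per_pixel": ((maxx - minx) / width, (maxy - miny) / height),
        "aligned": all(_on_grid(edge) for edge in bbox),
    }


info = describe_tile(
    "/servlet/com.esri.wms.Esrimap?VERSION=1.1.0&REQUEST=GetMap&SRS=EPSG:4326"
    "&BBOX=-90,-45,.000000000000000,45&WIDTH=512&HEIGHT=512"
    "&LAYERS=WorldGrid,Countries,Cities&STYLES=&FORMAT=image/png"
)
print(info)
```

Running it over a whole Fiddler session would show quickly whether any tile strays off the grid or comes back at an odd pixel size.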

?Backwards Requested Tiles The Are Why

Well, if you actually look at the tiles in the order they are requested, you invariably notice that Explorer asks for them in the wrong order. Here is what my final globe looked like:

And here is the order I got the tiles in.


This is close to what I want.


Fair enough, it's most of my field of vision, too.


Wow. Where did this come from? It's not even visible on this globe and the side I'm looking at isn't finished!


Dude, Alaska is barely visible on this globe and I haven't seen Canada yet!


Okay, there's Canada. Where's Mexico and the Western US?


Greenland - good job.


More Greenland.


Uh oh. The northern coastline of Russia. Not visible on the globe, and we're getting down to smaller detailed tiles, when we still haven't received the full tile set at the larger level. I still have blank spots on my globe.


Europe, good. At least we're not in Russia anymore.


More Russia. Can't see it.


And more detailed Russia. Where the heck are Mexico and California? We're still waiting.


Awesome. Antarctica — not visible.


More Antarctica. Perhaps ESRI developers liked Happy Feet?

... and we skip another 6 tiles with beautiful detail of the coastlines of Chile and Argentina (both invisible on this view) ... until we finally get to the very last tile obtained:


Mexico and California!
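For contrast, here is a hedged Python sketch of the ordering I would have expected: coarse tiles before detailed ones, and within a level, tiles nearest the centre of the view first. This is only my guess at a sensible policy, not anything Explorer does, and it approximates each tile by its centre point.

```python
import math


def angular_distance(lon1, lat1, lon2, lat2):
    """Great-circle angle in degrees between two lon/lat points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(dlon))
    # Guard against rounding pushing the value just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def request_order(tiles, view_lon, view_lat):
    """Sort tile bounding boxes: largest tiles first, then nearest the view centre.

    Each tile is (minx, miny, maxx, maxy) in degrees.
    """
    def priority(tile):
        minx, miny, maxx, maxy = tile
        centre_lon = (minx + maxx) / 2.0
        centre_lat = (miny + maxy) / 2.0
        return (-(maxx - minx),
                angular_distance(view_lon, view_lat, centre_lon, centre_lat))

    return sorted(tiles, key=priority)
```

With the view centred over North America, a tile covering Mexico and California would come first, and the detailed tile of northern Russia would come last.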

Oh, Actually the Tiles Are Different Sizes

If you look at the tiles above carefully, you'll notice they're not all actually the same size. Some are 512x512. Others are 255x256. Another one is 256x257. Why? Rounding errors? Don't know. This isn't really a particularly big issue unless you're trying to pre-generate WMS tiles for clients like ArcGIS Explorer... and that turns out to be a bit of a problem since ArcGIS Explorer has funny caching behavior.

Could You Say That Again?

It is a real shame that the tiles are retrieved in what appears to be a silly order. But at least they are retrieved as GET requests and they're eminently cacheable by the scalable Internet infrastructure. Surely NOAA isn't actually having to generate real pixels for these tiles most of the time. Wrong. The GET requests indicate no willingness to cache results: ArcGIS Explorer never lets HTTP proxies cache data. This means that instead of letting the HTTP proxies of the world unite to lighten your server load, you need to do all your own caching or pregeneration.

GET HTTP/1.1
User-Agent: ArcMap Service Layers
Host: map.ngdc.noaa.gov
Proxy-Connection: Keep-Alive

We know that ArcGIS Explorer caches copies of this data on disk somewhere. Perhaps their strategy was to not clutter people's disks with two copies of the data — one in their Internet cache (built into Windows Vista, for instance), and another in their ArcGIS Explorer cache. This would be the positive reading. But another disturbing fact suggests an alternate reading.

Let's Not Cache It and Ask For It Twice

Yes, ArcGIS Explorer asks for every tile twice. Here is a section of Fiddler's recording of the above session. The left column is the request number, i.e. the order in which ArcGIS Explorer requested tiles. The right column is the size of the response. You can see each request was made twice, getting the same result each time (of course).
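If you end up doing your own caching on the server side, a minimal Python sketch might look like the following. The class and its render callback are hypothetical names of mine; the only real idea is to normalise the query string so that the same tile, asked for twice or with parameters in a different order, is rendered once.

```python
import hashlib
from urllib.parse import parse_qsl


class TileCache:
    """In-memory cache of rendered tiles keyed on a normalised WMS query."""

    def __init__(self, render):
        self._render = render  # callable: query string -> image bytes
        self._tiles = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(query_string):
        """Hash the query with upper-cased, sorted parameter names."""
        pairs = sorted((name.upper(), value) for name, value
                       in parse_qsl(query_string, keep_blank_values=True))
        canonical = "&".join(f"{name}={value}" for name, value in pairs)
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    def get(self, query_string):
        """Return the tile for this query, rendering it only on a miss."""
        cache_key = self.key(query_string)
        if cache_key in self._tiles:
            self.hits += 1
        else:
            self.misses += 1
            self._tiles[cache_key] = self._render(query_string)
        return self._tiles[cache_key]
```

A real deployment would want eviction and on-disk storage, but even this much would absorb the doubled requests.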

Multithreading? Almost.

This one is a little harder to demonstrate with screenshots, but observations prove a couple of things conclusively.

  • One HTTP connection per server. Despite the HTTP spec allowing each client to make two simultaneous connections to each HTTP server, Explorer doesn't bother. It's a shame, since it'd usually get the tiles to my spinny globe twice as fast. Everyone else does this. Why not? Because...
  • One HTTP retrieval thread, period! Even if you connect to two services at once (WMS, IMS, Globe Service, whatever), Explorer will only hold one connection open at a time. So your fastest services are held hostage to your slowest ones. I connected to a WMS which took about 30 seconds to generate an image. This meant that upon zooming to a reasonable level of detail, instead of getting a map within a minute or two, it took upwards of 10 minutes because it got tiles in a bad order, got them more than once, and all the while I couldn't see any other context from the other layers because they were held up behind the slowpoke. This is a basic starvation issue I'm surprised the developers haven't considered.
  • UI and retrieval thread confusion. If Explorer is waiting on a long-running HTTP session, you can't shut it down. The UI thread apparently blocks on the retrieval thread waiting to shut down. Is this nasty COM STA mojo? Threading can be difficult, especially if you're working in C++. But writing multithreaded HTTP clients is pretty close to being a solved problem...
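As a rough illustration of what I mean by avoiding starvation, here is a Python sketch that gives every server its own small pool of worker threads, so a slow service can't hold up a fast one. The fetch argument is a stand-in for whatever actually downloads a URL, and the two-per-host default simply mirrors the limit mentioned above; none of this reflects how Explorer is built.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlsplit


def fetch_all(urls, fetch, per_host=2):
    """Yield (url, result) pairs as they complete.

    Each host gets its own pool of per_host threads, so requests to a slow
    server queue up behind each other and nobody else.
    """
    by_host = {}
    for url in urls:
        by_host.setdefault(urlsplit(url).netloc, []).append(url)

    pools = {host: ThreadPoolExecutor(max_workers=per_host) for host in by_host}
    try:
        futures = {
            pools[host].submit(fetch, url): url
            for host, group in by_host.items()
            for url in group
        }
        # Results arrive in completion order, fastest servers first.
        for future in as_completed(futures):
            yield futures[future], future.result()
    finally:
        for pool in pools.values():
            pool.shutdown(wait=True)
```

Because results are yielded as they finish, the UI can paint the fast layers while the 30-second WMS is still thinking.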

Forget Wet Paint... We're Still Framing the House

ArcGIS Explorer is not even fairly called a beta product. At best it is a developer's demo or a throwaway prototype. Put together some lessons learned, and let programmers who have built a real Internet application have at the thing. Until then, it'll be an awkward subset of ArcGlobe wandering down the information superhighway with a blindfold and a marked hobble.

Metadata: The Eternally Red Herring

KML's shiny <Metadata> tag is the toast of the town, but I think the GIS crowd drank a little too much ESRI Kool-Aid this decade to realize the implications. Paul Ramsey writes:

Where will this all end? I think it will end with the Google Team picking one or a few <Metadata> encodings to expose in their user interfaces (Earth and Maps). At that point all content will converge rapidly on that encoding, and the flexibility of <Metadata> will be rapidly ignored.

The very word "metadata" reeks of sadness. It speaks of librarians locked in dark rooms trying to figure out what data means, long after those who created it are on to the next burning crisis. It smells like scores of sad committee meetings hashing out arcane XML details that no one will ever write a consistent reader or writer for.

It is amazing to me that anyone would want to use the word for anything anymore. But Google did, and people took them at their word. But what Google really meant — in traditional GIS speak — is attribute storage. As they point out in their FAQ article:

Without the <Metadata> tag, KML authors had to either send two files, one with metadata and one with KML, or place their data in the <description> tag

Sound familiar? It's a SHP and DBF file. But now you can marry attribute data to your geometries in one file. And because it can be arbitrary XML, it can be a lot more interesting than boring rows of attributes.
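Here is a hedged Python sketch of what marrying attributes to geometry in one file could look like. It assumes the KML 2.1 namespace and that <Metadata> sits directly inside the Placemark; the example.com namespace and the record element are made up for illustration, since the whole point is that the payload can be any XML you like.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://earth.google.com/kml/2.1"   # assumed KML 2.1 namespace
ATTR_NS = "http://example.com/attrs"         # hypothetical attribute schema


def placemark_with_attributes(name, lon, lat, attributes):
    """Return a KML document holding one point and its attribute row."""
    ET.register_namespace("", KML_NS)
    ET.register_namespace("attr", ATTR_NS)

    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name

    # The "DBF row": arbitrary XML carried alongside the geometry.
    metadata = ET.SubElement(placemark, f"{{{KML_NS}}}Metadata")
    record = ET.SubElement(metadata, f"{{{ATTR_NS}}}record")
    for key, value in attributes.items():
        ET.SubElement(record, f"{{{ATTR_NS}}}{key}").text = str(value)

    # The "SHP geometry".
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


print(placemark_with_attributes("Sample town", -122.4, 37.8,
                                {"population": 1234, "county": "Example"}))
```

Swap the flat record for a purchase order or a sensor reading and the geometry comes along for the ride.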

Perhaps only a few big encodings (like a boring row of attributes) will be supported out of the box by Google Earth, but you'll find every XML spec under the sun suddenly glued inside KML documents, and the great sleeping beast of a thousand enterprise applications will suddenly awake from its slumber and devour GIS applications everywhere. Whether you put KML inside a more traditional XML document, or XML inside your KML, it comes to the same thing. It's a smarter, more modern shape file. It's not the usual GIS "metadata" malarkey — it's the real data.