Friday, August 17, 2007

Mono's VMWare Image: Sometimes a ZIP is not a ZIP

So it finally came down to it: I wanted to run Linux in order to test some Mono interoperability code. We are writing libraries that will be used in .NET/Windows environments as well as C++/Linux environments, and getting the pointer buggery correct is important. We've got a C++ program (with a 25 year pedigree) that needs to start using components we'd just as soon write using .NET.

This is a case where I am not comfortable just testing some things in Mono on OS X because the issue is not managed code, but specifically unmanaged code. OS X has enough of its own skeletons; I don't want to sweat the wrong details. So I've gone off to the Mono site and downloaded their most recent VMWare image. It downloads in 20 minutes or so and unzips in another five. In a strange twist, I found that if I unzipped it using OS X's command line unzip command, the command complained about corrupted zip entries. VMWare started the unzipped mess properly, but nothing worked quite right in the resultant image. I used the GUI unzip (whatever happens when you double-click the ZIP file in Finder) and it worked like a charm. Humph.

So I guess my Linux world for the next few weeks or months will be openSUSE 10.2, Novell's pet distribution. We'll see how it goes. I love that every few years as I play with Linux I have to learn a new package management library.

Wednesday, August 15, 2007

What Can a RESTaurant Teach You About REST? (Whataburger)

We're talking about RESTaurants. Today: one of Texas's great institutions, Whataburger. ("What a burger!")

Mmm... burgers!

Stay focused. We're talking about State Transfer (the ST in reST). So when I'm driving between offices, I often stop by a Whataburger to get some lunch. When I arrive, I need to know how to order my food.

GET /order/ HTTP/1.1
Host: whataburger42.example.com

And Whataburger #42 kindly responds with two links I can follow to make that choice: Drive Through or Dining Room?

HTTP/1.1 200 OK
Date: Mon, 23 May 2005 22:38:34 GMT
Server: Rosalinda
Content-Type: text/html; charset=UTF-8

<ul>
<li><a href="/order/drivethru/">Drive Thru. Cars in line: 9</a></li>
<li><a href="/order/diningroom/">Dining Room. Cars in parking lot: 14</a></li>
</ul>

It's very likely the cache headers on that response would only be for 30 seconds or so—it's a busy Whataburger at lunch time. So this is classic REST: we're using HTTP to retrieve hyperlinks that navigate us through application state. The hungry client (me!) can choose between two options, and the exact method for specifying which one is simple: I follow a link.
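For the programmers playing along, here is a minimal Python sketch of what "I follow a link" might look like in a client. The names LinkCollector and extract_links are made up for this example, and the response body is pasted in as a string rather than fetched, since whataburger42.example.com is not a real server.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return every (href, text) pair found in the HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# The body of the response from the transcript above.
BODY = """<ul>
<li><a href="/order/drivethru/">Drive Thru. Cars in line: 9</a></li>
<li><a href="/order/diningroom/">Dining Room. Cars in parking lot: 14</a></li>
</ul>"""

links = extract_links(BODY)

# The hungry client decides which link to follow; the server only offers choices.
choice = next(href for href, text in links if text.startswith("Dining Room"))
```

The client never builds a URL by hand. It only chooses among the links the server handed over, which is the whole point.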

You're a Houstonian. Surely you always take the drive through?

Surely. Well, actually in the above case I'd almost certainly go into the dining room. And therein lies the interesting lesson of the Whataburger concurrency dilemma. The two order pipelines handle state transfer entirely differently, because of concurrency and correctness issues. I won't keep posting silly HTTP transcripts, but you can play along as if I did. So whether I'm in the dining room or the drive through, I need to order my food. We might imagine that a GET of the resources presented above (e.g. /order/drivethru/) returns a <form> which I can POST to in order to create an order. This works in both places, right?

Actually, sort of. And here we get back to the concept of getting it right. If I'm ordering food during the incredibly busy lunch hour, my order goes through a canonicalization. If it's 3:30 in the afternoon, the B-team is on staff and just takes my order and gives me a number. Why? Let's imagine I'm in the dining room and order (POST) something like "Um, I'll take a #1 meal, with onion rings instead of fries, and a drink. Oh, no pickles." At lunch time they can't afford to screw up their pipeline of burgers, so they'll ask for some clarification. "Do you want cheese with that?"

Of course, everyone loves cheese.

What does that look like in a REST universe? The restaurant doesn't even need to issue me an order number before they come back with their upsell/correction. So I imagine they'd simply return yet another <form> which I'd need to fill out. Perhaps a <form> to fill out with restricted options: the order you gave me, or the order you gave me with cheese. If I'm in the drive through, they always canonicalize the order in a certain form to absolutely minimize confusion. My order becomes a "#1 w/cheese, no pickles, onion rings, diet coke". That's another representation of the same resource (my order), but the server is insisting on canonicalization because at lunch time getting it wrong is too expensive.
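A small Python sketch of what that canonical form could look like. The function name canonicalize and its parameters are my own invention for illustration; the only thing taken from the story is the output format.

```python
def canonicalize(meal, cheese, pickles, side, drink):
    """Render an order in one fixed, unambiguous form (hypothetical format)."""
    parts = [f"{meal} w/cheese" if cheese else meal]
    if not pickles:
        parts.append("no pickles")
    # Side and drink are lowercased so "Diet Coke" and "diet coke" read the same.
    parts.append(side.lower())
    parts.append(drink.lower())
    return ", ".join(parts)


canonical = canonicalize("#1", cheese=True, pickles=False,
                         side="Onion Rings", drink="Diet Coke")
```

However rambling the "Um, I'll take..." was, anything that means the same order should come out as the same string.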

Got it. I'll be ready the very moment cars come standard with an HTTP client.

Don't be a smart ass. This simple transaction (and we're not even done yet!) already elucidates one example of choices for managing state. We POSTed to a URL which gave us another form to interact with. In the real world, the state of that particular application involves Rosalinda, our ever cheerful cashier, and me remembering what we're talking about. If we came back in an hour, we'd have to start at the "Um". But in our REST example, the state of the conversation is entirely contained in the form I got back asking me if I wanted cheese with that. Rosalinda doesn't need to give me an order number or alter her databases.
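Here is a rough Python sketch of that idea, with a plain function standing in for the server. The handler name handle_order_post, the dictionary shapes, and the status codes chosen are all assumptions for the example. What matters is that the function keeps nothing between calls: each option in the "form" carries the complete order.

```python
def handle_order_post(order):
    """Hypothetical stateless handler: returns (status, body), stores nothing."""
    if "cheese" not in order:
        # Ask for clarification. Every option repeats the full order,
        # so the server has nothing to remember about this conversation.
        options = [dict(order, cheese=False), dict(order, cheese=True)]
        return 200, {"form": "Do you want cheese with that?", "options": options}
    # The order is complete, so now (and only now) it can become a resource.
    return 201, {"order": order}


first = {"meal": "#1", "side": "onion rings", "pickles": False, "drink": "diet coke"}
status, body = handle_order_post(first)

# An hour later, the client can still POST one of the options as-is.
status2, body2 = handle_order_post(body["options"][1])
```

If I wander off and come back, I don't start at the "Um". I just submit the form I was given.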

When does this become relevant to my day job?

Lots of transactions need canonicalization. Geocoding is a great example. We've all typed "800 8th St" into a mapping program and had it answer not with the map we expected, but a form or list of hyperlinks asking whether we meant to ask for that address in Port Arthur, Hempstead or Port Neches. (It could even be an "HTTP 300: Multiple Choices" response, but something tells me that fine a reading of the HTTP spec is some years away.) Those links contain the entire correct canonicalized address, and the server doesn't need to remember it was talking to you. My request for 800 8th St Port Arthur is indistinguishable from the request of the less knuckleheaded person who asked for it correctly in the first place. I arrived at the same application state. Yes, I wanted cheese.
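A toy Python version of that geocoder, to make the point concrete. The /geocode path, the KNOWN_CITIES table, and the geocode function are invented for this sketch, and I'm using the 300 status purely for illustration; no real mapping service is being described.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Made-up gazetteer: which cities have this street address.
KNOWN_CITIES = {"800 8th St": ["Port Arthur", "Hempstead", "Port Neches"]}


def geocode(street, city=None):
    """Hypothetical handler: returns (status, body) and keeps no session state."""
    if city is not None:
        return 200, {"address": f"{street}, {city}"}
    candidates = KNOWN_CITIES.get(street, [])
    if not candidates:
        return 404, []
    # Each link spells out the full canonical address on its own.
    links = ["/geocode?" + urlencode({"street": street, "city": c})
             for c in candidates]
    return 300, links


status, links = geocode("800 8th St")

# Following the first link is just another ordinary request.
params = dict(parse_qsl(urlsplit(links[0]).query))
followed = geocode(**params)
direct = geocode("800 8th St", "Port Arthur")
```

The followed link and the direct request hit the server with exactly the same parameters, so it cannot tell the knucklehead from the careful typist.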

That's a good point, you said the drive through and the dining room were different, but all we've talked about is identical canonicalization processes.

Well, both order channels have guided me through their state identically so far. In the dining room, my final POST of a canonical order results in the creation of an order resource: I get a little orange plastic order number: 23. In the REST world, I am told my order now exists at /order/diningroom/23. I can GET the status of that resource as often as I like. Is my burger ready yet? Is my burger ready yet?

Rosalinda's co-worker Randall is happy to tell me as often as I ask that my order is or isn't ready yet. But he tells me immediately. And he's also answering my fellow hungry diners' queries as well: #21, #22, and #24 are all asking. In the dining room, food is served asynchronously. When it's ready, it's ready. It would not be unusual to get my order before #21 if he also ordered a milkshake and biscuits. One of these times, my GET will result in a beautiful (digitally signed!) cheeseburger. (And this is why you choose the dining room over the drive through when the drive through is long. You can get your food in the average time it takes to prepare it under load, not the sum of times it takes the people ahead of you to fill their orders.) After my digitally signed burger is eaten, if I GET at the same resource again, I will probably be told "HTTP 410: Gone" or "HTTP 404: Not found". (Yes, I'm totally ignoring security and the possibility someone will steal my burger by guessing my order number. There are many orthogonal ways to handle that.)
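The dining room's lifecycle can be sketched in a few lines of Python. This is an in-memory stand-in, not a web server: the DiningRoom class, its method names, and the message strings are all assumptions. It also follows the analogy in letting a GET hand over the burger, which a real API would more likely do with a separate request.

```python
class DiningRoom:
    """Hypothetical in-memory model of Randall's counter."""

    def __init__(self, first_number=1):
        self._orders = {}  # order number -> {"order": ..., "state": ...}
        self._next_number = first_number

    def post_order(self, canonical_order):
        """Create the order resource and return (status, its URL) right away."""
        number = self._next_number
        self._next_number += 1
        self._orders[number] = {"order": canonical_order, "state": "cooking"}
        return 201, f"/order/diningroom/{number}"

    def kitchen_finishes(self, number):
        self._orders[number]["state"] = "ready"

    def get(self, url):
        """Answer immediately, whatever the state of the order."""
        number = int(url.rstrip("/").rsplit("/", 1)[1])
        entry = self._orders.get(number)
        if entry is None:
            return 404, "Not Found"
        if entry["state"] == "cooking":
            return 200, "not ready yet"
        if entry["state"] == "ready":
            entry["state"] = "eaten"  # the burger is handed over exactly once
            return 200, entry["order"]
        return 410, "Gone"


room = DiningRoom(first_number=23)
created, url = room.post_order("#1 w/cheese, no pickles, onion rings, diet coke")
polls = [room.get(url)]          # Is my burger ready yet?
room.kitchen_finishes(23)
polls.append(room.get(url))      # Here it is.
polls.append(room.get(url))      # Already eaten.
```

Notice that every call returns at once. The waiting happens on my side of the counter, between polls.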

Back in the drive through, the state transfer is totally different. A simple order number and asynchronous handling is not enough. Cars must be served in order. I have to wait in line. My POST to create a new resource won't return a nice URL I can poll on. It will probably block until it returns the digitally signed cheeseburger. I got onto a busy web server and had to wait behind other requests. Where was the state? It was all in the server: shuffling connections, building & servicing queues, etc. As far as I was concerned, the application was stateless.

But at what cost?! The server (the drive through) had to maintain an open connection with me that whole time. And remember what my order was. Heavy duty, man. And it's not very scalable. In the dining room, Randall could easily handle dozens of diners asking him where their order was. The diners held onto their own state, so he hardly had to remember anything! But in the drive through, cars are waiting in line and waiting in line and my dreams of a fast lunch are shattered when I see the car in front of me ordering 12 burgers for her office lunch.
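Some back-of-the-napkin arithmetic in Python shows why that 12-burger order hurts. The prep times are made up, and the model is deliberately crude: I assume the drive through hands out orders strictly one after another, while the dining room kitchen can work on every order at once.

```python
from itertools import accumulate

# Made-up prep times in minutes; the office order is first in line.
prep_minutes = [12, 3, 4]

# Drive through: each car waits for everyone ahead of it, plus its own order.
drive_through_waits = list(accumulate(prep_minutes))

# Dining room (idealized): each diner waits only for their own order.
dining_room_waits = list(prep_minutes)
```

Under those assumptions the last car in the drive through waits 19 minutes for a 4 minute order, and the last diner inside waits 4.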

In return for the simplicity of simply POSTing a blocking call (easier to program; you can leave the air conditioning on), the server takes on a heavy burden. The Whataburger near my office chooses an alternative to asynchronicity in an attempt to scale: they have two drive-through lanes. When I POST my order, I am probably getting an "HTTP 302: Found" or "HTTP 303: See Other" telling me which drive through URL to make my blocking POST to (e.g. /order/drivethrough/1).
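A tiny Python sketch of the two-lane trick. The function name and the "send them to the shorter lane" rule are my assumptions; I have no idea how Whataburger actually balances its lanes.

```python
def post_drive_through_order(lane_queue_lengths):
    """Hypothetical dispatcher: redirect the car to the lane with the shortest line.

    lane_queue_lengths maps lane number -> cars currently waiting.
    """
    lane = min(lane_queue_lengths, key=lane_queue_lengths.get)
    return 303, {"Location": f"/order/drivethrough/{lane}"}


status, headers = post_drive_through_order({1: 5, 2: 4})
```

One wrinkle for the spec lawyers: clients generally follow a 303 with a GET, so a server that really wanted the POST repeated at the new URL would more likely answer "HTTP 307: Temporary Redirect".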

Now I'm hungry.

Not me. I got today's burger at the neighborhood beer joint a few hours ago. They handle their scalability and state transfer issues like Whataburger's dining room: my number today was 8.

Tuesday, August 14, 2007

What Can a RESTaurant Teach You About REST? (part 1)

I'm buying a house soon. That's the sort of transaction you have to get right. No matter what the cool kids are saying about transactions being dead, I'll fight Brewer's conjecture all the way to the bank. I want my house, and the seller wants his money. It's definitely not okay for one of us to end up with both. Why? Because it's too hard to correct the error[*]. Therefore we pay the not inconsiderable overhead of title companies, escrow agents, loan officers, wire transfer fees, etc. to get it right. It's way cheaper than going to court.

But in the rarefied world of blogosphere REST pundits, we evangelize the webby way for lots of things. Fire and forget. Assume your communication channel is going to fail a lot. Assume statelessness. This would totally suck for buying a house. Imagine having to bring all 300 pages of documentation required to every meeting you attended: tax returns, site surveys, credit reports. And how could anyone be sure you were being consistent? Boy I would love to have brought a different set of papers to the loan officer than I did to the IRS. (Hello, sub-prime mortgage crisis! But I digress.)

But aren't we talking about RESTaurants? I thought it was a clever pun.

Sorry, yes we were. But the point was to introduce the cost of getting it right. And besides, it's hard to concentrate on anything else when you're buying a house, so indulge me.

What, honestly, is the cost to you if a restaurant fails to get your order right? Mistakes are made all the time by waiters, customers, busboys, managers, and cooks. Yet, unlike closing on a house, you do not sign contracts or involve lawyers when you order food at a restaurant. In fact, you don't even do a credit check; centuries of social convention have blessed us with a system where it is assumed you can pay for dinner and only have to prove it at the end.

Or have they? I once booked an anniversary party for my company at a tony restaurant in Las Vegas. I had to put down a credit card to hold the table. Why there and not at, say, my neighborhood pizza joint? The answer is fairly obvious: if I flake on my big party at said tony restaurant, they're out a private room and four figures. If I flake on dinner with my wife at said pizza joint, they'll fill the table anyway most nights, and if they don't, they're out 20 bucks, tops.

Okay, now you're bugging me. REST pundits are supposed to wax poetic about URL design and resource representations. You know, eBay transactions are really resources, what is the URL of a pixel[**]? That sort of thing. This is a rambling diatribe about transactions, not REST. You can't fool me! Though you are at last talking about restaurants.

Well, REpresentations are only half of REST. (As measured by the letters they get in the acronym.) State Transfer is pretty damn important in a stateless protocol. Transactions are only one kind of state transfer. And what I'm warming up to talk about (warming up, get it? restaurants? anyone? is this thing on?) is state transfer. How does that latte get to you at Starbucks? My jalapeño sausage at the local barbecue joint? My #1 Meal, cheese, no pickles, onion rings and a Diet Coke at Whataburger? My Kansas City Strip at Delmonico? Getting there involves many state transfers, and each of these restaurants has chosen a different system.

But now you've gone and spent all your time on silly jokes.

So I have. See you tomorrow.

[*] Oh yes, funny story. When I bought my first house, the combination of my naiveté and an under-trained clerk at the title office led to me bringing a personal check for the down payment on my house. It was an average house, but a 20% down payment still made the check five figures. The title company protested that they couldn't be expected to float that kind of money waiting for my check to clear. And besides, we had already signed so much paperwork and gotten everyone in the same room, that it seemed unlikely we would want to do this again in 5 days. I said, "The check won't bounce, and if it does, what's the problem? You know where I live, right?" Cursory examination of the 300 pages of paper signed that afternoon indicated more than a few copies of my old and new addresses. Some uneasy laughter ensued, and that was the end of that. I got lucky. That actually would have been a very expensive error on the title company's part.

[**] Turns out, whole pictures can have URLs. That's an approach I'm pretty sure wouldn't work for satellite imagery. But again, I digress.

The Mac is Back

Tiny post, for anyone else wondering about high CPU utilization and slow or crashing network services on their Mac while running Parallels Desktop for the Mac 3.0. I've been a happy Parallels-er, running a few critical apps on a virtual PC while doing most everything else on my Mini Mac. (Not a speed-demon, but quiet, cool, and dual core.) After the upgrade to 3.0, I found Parallels was routinely sitting on 40% of my CPU, and causing network connectivity, especially in Safari, to totally suck. I had to Force Quit Safari daily. That didn't make sense. A few other people seemed to suggest that file sharing might be the problem. I've disabled file sharing of the Windows folders into the Mac. This is a nice feature, but in practice I don't use it (or the reverse) terribly often. All my permanent files are on a central networked drive which serves both Macs and PCs in my house. Disabled the sharing, and all was better. Perhaps they'll fix this in their next patch and let me use more than 1 CPU in my virtual machine, or I'll have to join the hordes defecting to the newly rich VMWare.

Added 19 Aug 2007: It appears USB support is no picnic for Parallels either. By disabling USB devices when I'm not using them in the VM, CPU usage has dropped quite a bit also. It still sits at 13% when idle, which I find disturbing. Experiments with VMWare are ongoing.