Ruby's All A-Twitter: Time to Reinvent the Database?
Looks like the incessant claims that it's boring to scale with Ruby are being put to the test with the latest thing for the Web 2.0 set, Twitter. It seems that in a somewhat candid interview with one of the Twitter programmers, it was revealed that being the flavor of the month was brutal on performance and that scaling wasn't trivial.
Before the ink was dry on the blog article, David Heinemeier Hansson rushed his not-inconsiderable ego to his own blog article pooh-poohing Twitter, essentially telling them that if they don't like Ruby's scalability then they should improve it their own damn selves, since it's an open source product.
Oh fun times. I have to say I like what Ruby (well, Rails) has done for the web development world. It has done more than any language since LISP to encourage highly abstract and productive programming. And I don't know nearly enough about this particular problem to know whether the developer-with-the-wonderful-popularity-problem or the developer-who-feels-he-is-the-web-messiah is right. But Ruby is slow! And pushing all your state to the database server is not an automatic panacea if you have serious traffic. So the big questions remain for us.
How important are constant factors? Having your developer tell you that his solution is O(log N) is fine if you're comparing it to an O(N) solution. But one O(log N) solution can be a thousand times faster than another. Wouldn't it be nice to not have to buy so many servers, simply by having a faster implementation even of a highly abstract language? This is one reason I keep hoping F# takes off, since it is extremely well compiled and runs on a solid (and fast) infrastructure.
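To make the constant-factor point concrete, here's a minimal Ruby sketch: two binary searches, both O(log N), where the iterative one simply pays less overhead per step than the recursive one. (The function names are my own; this is an illustration, not a benchmark.)

```ruby
# Both searches are O(log N) over a sorted array; they do the same
# number of comparisons, but the recursive version pays method-call
# overhead on every halving step -- a pure constant-factor difference.

def bsearch_recursive(arr, target, lo = 0, hi = arr.length - 1)
  return nil if lo > hi
  mid = (lo + hi) / 2
  case arr[mid] <=> target
  when 0  then mid                                       # found it
  when -1 then bsearch_recursive(arr, target, mid + 1, hi)
  else         bsearch_recursive(arr, target, lo, mid - 1)
  end
end

def bsearch_iterative(arr, target)
  lo, hi = 0, arr.length - 1
  while lo <= hi
    mid = (lo + hi) / 2
    case arr[mid] <=> target
    when 0  then return mid                              # found it
    when -1 then lo = mid + 1
    else         hi = mid - 1
    end
  end
  nil
end
```

Big-O analysis calls these identical; your server bill does not, and the same gap separates a fast language runtime from a slow one.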
Is partitioning a Solomonic solution? Everyone says the really big guys like Google and eBay don't use foreign keys or transactions in their databases. The consensus on this particular Ruby food fight is that Twitter should partition their database. But what happens when you cleave the database in two? How much of what the database was doing for you will you end up re-implementing yourself?
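A minimal Ruby sketch of what that cleaving looks like, using in-memory hashes as stand-ins for two hypothetical database servers. The names (`SHARDS`, `shard_for`, and friends) are mine, not anyone's real schema; the point is that once data is split, routing and referential integrity move out of the database and into application code.

```ruby
# Two hashes standing in for two separate database servers.
SHARDS = [{}, {}]

# Route each user to a shard by integer ID (hash partitioning).
def shard_for(user_id)
  SHARDS[user_id % SHARDS.length]
end

def save_user(user_id, attrs)
  shard_for(user_id)[user_id] = attrs
end

def find_user(user_id)
  shard_for(user_id)[user_id]
end

# A "foreign key" from a tweet to its author may now cross shards,
# so the check the database used to do for free becomes app code:
def save_tweet(tweets, user_id, text)
  raise "no such user #{user_id}" unless find_user(user_id)
  tweets << { user_id: user_id, text: text }
end
```

Every cross-shard join, constraint, and transaction gets the same treatment: re-implemented, by you, one ugly special case at a time.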
I have a feeling we'll be hearing a lot about these two issues in the coming year, because they're related. Mainstream developers have finally realized that highly abstract languages help them be productive. (Did it just take someone stumbling on the right syntax? Perhaps that'd be another interesting article.) But this new crop of languages is aggressively stateless, forcing people to push state to a database. And databases only scale so far. Partitioning is but one ugly trick.
Is it time for someone to invent a serious SQL-compliant database that makes radically different assumptions about the cost of resources? Databases are currently implemented mostly using assumptions baked in since the 70s: hard drives are a lot slower than memory, but they're the only reliable store. Today is totally different! Hard drives are so much slower than memory as to hardly be worth messing with, fast reliable networking may make disk storage a secondary concern, and even memory is slow compared to CPUs.
Faster implementations of dynamic languages and brand new database engines. Who would have thought we'd be here in 2007, when as recently as 1997 being 10% slower than hand-coded C++ was the kiss of death for your language/project/framework? Fun times.