Tuesday, August 21, 2012

I Further Predict the Death of Your Web Framework

In my last post, I philosophized about how new technology is going to change the bottlenecks of web (and other) systems (which, despite everything else, have remained surprisingly stable for a while).

This spelled the demise of many systems that relied on a given system bottleneck, specifically, slow runtime systems.

But there is another technological shift conspiring against many web frameworks that isn't focused on performance, but instead focused on "ease of use" - which in many cases may hit far closer to home.

That shift is the reorganization of MVC.

MVC stands for "model-view-controller" which loosely means you have a datastore/database (the model) which is retrieved and manipulated (by the controller) such that it can finally be shown to a user (the view).

That is pretty much always the flow. Well - kinda. The first thing you might notice is that "MVC" is an out-of-order acronym per the dataflow. Following the dataflow, it would be "MCV". And happily, given that dataflow is paramount to my story - I'll use that in the rest of this article (that might irk you if you're CDO (which is like "OCD", except in alphabetical order, like it SHOULD BE)).

A predecessor to MCV was the simpler idea of "client/server", where the client was the view and the server was the controller and model (or, some or all of the controller could be in the client too). However, the client in that case was a real client - that is, a program that received the data and showed it.

In the web, the browser is the client, but interestingly in things like Rails, Jails, Nails, Grails, Struts, Play!, PHP, ASP.Net, and many others the "view" is on the server which then renders HTML and sends that to the browser. As far as the programmer is concerned, the whole MCV is on the server. The browser is often just a dumb terminal.

In the last year or two however, the popularity of a new type of framework is changing all that.

That change is coming from libraries such as backbone.js and ember.js (and many, many others).

Those libraries allow you to render views (not just show, actually render) in the browser itself. In addition, they let you leverage a lot of javascript magic in the browser. This is pretty awesome for several reasons.

The computing cost of rendering is moved to the client's machine. Rendering probably isn't your biggest computing expense, but take that cost off your server (times every web request you get) and it's measurable.

And as you can imagine, if the "V" of MCV actually migrates to the client, all that's left on the server is "MC" (to be fair, sometimes even part of the "C" goes to the client).

What thousands and thousands of Rails developers discovered upon moving to backbone is that they no longer needed their fancy template views. Their backend became a system that pushed JSON over HTTP.

Very clean and very simple. At my new company Refresh (we're hiring!), our backend pushes the exact same JSON to our webpage as it does to our iOS app. And that same system will someday seamlessly become our API too.
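As a rough illustration of what "MC on the server" can look like, here is a minimal sketch in Python using only the standard library's WSGI calling convention. The CONTACTS data, the list_contacts controller, and the /contacts path are all made up for the example; this is not how Refresh or any particular framework does it.

```python
import json

# Hypothetical in-memory "model"; a real app would query a datastore here.
CONTACTS = [
    {"id": 1, "name": "Linus"},
    {"id": 2, "name": "Ada"},
]


def list_contacts():
    """Controller: fetch and shape the data, but render nothing."""
    return {"contacts": sorted(CONTACTS, key=lambda c: c["name"])}


def app(environ, start_response):
    """WSGI entry point: every client (browser, phone app) gets the same JSON."""
    if environ.get("PATH_INFO") == "/contacts":
        status, payload = "200 OK", list_contacts()
    else:
        status, payload = "404 Not Found", {"error": "not found"}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To poke at it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it. Note there is no template anywhere: the view lives entirely in whatever client consumes the JSON.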

For me, using Rails for webapps over Java (where I spent plenty of time years ago) was a simple decision. ActiveRecord was beautiful and elegant (especially compared to things like Java's Hibernate). Also, the view layer was simple, well-laid-out, and standardized. If anything, Java had too many choices.

But these days, I tend to use NoSQL on the backend. And ember on the front-end. All I need in the middle is something to manipulate and push JSON. Why was I paying the Rails tax? (again insert any language that is a multiple slower than Java in that sentence).

I'm not particularly picking on Rails - it is just a full MCV solution that I no longer need. There are plenty of those.

And if you're thinking this is a win for Node.js - you're probably right. With much more javascript coding entering your web framework as a whole, using Node on the backend is probably the winner of all this on the usability front. Javascript on the server isn't the fastest, but it's pretty darn good at manipulating JSON (and thank you to whoever it was that shot XML dead).

So my not-so-amazing prediction is that in a few short years' time, full web frameworks in any language will disappear. Node picks up some of that slack, but so do less feature-ful frameworks (and maybe more performant ones). Even going without a framework altogether gets more use.

There's surely no mourning required here. Web frameworks change every few years no matter how you slice it. But between this post and my last, I see two converging fronts out to kill some of our most popular ones right now.

Personally, I'm hoping to never server-side render HTML again. I'll let your browser do my rendering while I sit back, chill, and push some JSON.

And yes, Mailinator is in rewrite now to use ember, much to the chagrin of web scraping programs everywhere! (but much to the happy of JSON receivers)

Monday, August 13, 2012

Your Bottleneck is Dead. Long Live Your Bottleneck.

There's an old joke that, if you think about it, you can apply directly to system bottlenecks.

Two hikers are walking through the woods when they come face-to-face with a pack of wolves. One of the hikers immediately drops to the ground and hastily changes from his hiking boots to the running shoes he had in his backpack.

The 2nd hiker says, "What are you doing! You can't outrun those wolves!"

The 1st replies, "I don't have to outrun those wolves. I just have to outrun you."


Web developers tend to know their system's biggest bottleneck, but how often do you know the 2nd biggest one? Right, in one sense it doesn't matter - because the 2nd bottleneck doesn't get to become a bottleneck unless it's the biggest one.

There is an economic model hidden within every complex system. This includes something as mundane as web system performance. Knuth famously said (or re-said) that "premature optimization is the root of all evil" which could be restated as - if you optimize before you know what needs it, you're optimizing (and probably breaking) the wrong thing.

Hence we don't (or aren't supposed to) optimize what doesn't need it. Seems obvious - but it has interesting ramifications.

When something doesn't need optimizing, we can afford to be (and often tend to be) lazy with it when it comes to performance. Concretely, if your code's database access is going to take 15 milliseconds, worrying that processing that data will take 20 microseconds because of your sloppy n^2 algorithm probably isn't worth much thought.

If that statement raises your ire, feel free to sit in your chair and pout - because there are thousands of websites that were happily coded using notepad with interpreted and dynamic scripting languages that flagrantly use gotos and lists as if they were hashmaps. I've seen it. It's enough to turn your stomach. It's not pretty.

For the average, everyday web hardware ecosystem - we have CPU power to spare. And in the bigger business sense, if I can save time developing a website cutting performance-concerned corners with no ramifications, all the better.

Web development largely started in scripting languages (i.e. perl-cgi, php). Again for the same reason - CPU to spare as compared to other bottlenecks.

In fact, I'll go so far as to say that the popularity of scripting language web frameworks required the condition that disks be some order of magnitude slower than CPUs. That's right - I'm looking at you Rails, Grails, Nails, and Jails.. (ok, not Jails, it's a Java web framework but it rhymed).

Java web frameworks added a lot of structure, verbosity, and performance that simply wasn't needed (and eventually, amazing bloat). If your bottleneck was the database/disk - your web processing simply had to not add significantly to that - and regardless of the language, that wasn't hard.

A simple definition of latency is the time it takes to start getting data back after requesting it. Similarly, to a program anyway, bandwidth can be viewed as how long it takes to get all the data requested (once you start getting any).

Think of how that relates to code performance. If your latency is 3ms (a reasonable server harddisk seek time) - it doesn't matter if your code is hand-loved machine language or interpreted COBOL - it does nothing for that 3ms. In CPU time, 3ms is an eternity.

As a general tendency however, the more data you receive, the more processing that likely goes around it. Consider a few megabyte JSON message - at a minimum it will likely be parsed. Possibly shoved into a map or an object.

Said another way - lowering latency and increasing bandwidth will tend to put more pressure on processing data (i.e. requiring more CPU/code performance)
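To get a feel for that processing cost on your own machine, a sketch like the following times the parse of a JSON message of a few megabytes. The record shape and count are arbitrary assumptions, and whatever number you see depends on your hardware and runtime.

```python
import json
import time


def make_payload(n_records=50_000):
    """Build a JSON string of a few megabytes (the record shape is made up)."""
    records = [
        {"id": i, "name": f"user-{i}", "tags": ["a", "b", "c"], "score": i * 0.5}
        for i in range(n_records)
    ]
    return json.dumps(records)


def time_parse(text):
    """Return (parsed object, seconds spent in json.loads)."""
    start = time.perf_counter()
    parsed = json.loads(text)
    return parsed, time.perf_counter() - start


if __name__ == "__main__":
    text = make_payload()
    parsed, seconds = time_parse(text)
    print(f"{len(text) / 1e6:.1f} MB parsed in {seconds * 1e3:.1f} ms")
```

Compare the printed time against a disk seek: the faster the data arrives, the more this parse (and whatever you do with the result) dominates.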

So all this time we're happily and harshly slowed down by slow things like spindly harddisk drives and networks. Then, in walk Solid State Drives. Prices and capacities are both heading in the normal directions for new technology (down and up, respectively).

Latency goes from standard spindle-drive 3ms seeks to (by varying reports) 100-microsecond seeks.

Argue the specifics if you will, but for some number of existing systems, installing an SSD will remove the database as the primary bottleneck. In fact, this is probably the cheapest way to improve your system's performance today.

What happens to the bottleneck in those systems? It will shift somewhere else (i.e. the SSD put on its running shoes). In many cases, it will shift to the CPU (CPU in this case is a polite way of saying "your code").
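Here is a back-of-the-envelope sketch of that shift, using the seek times quoted above. The 0.5 ms of code time and 0.2 ms of network time per request are invented for illustration, as is the simplification that each request does exactly one seek and the stages simply add up.

```python
def rank_stages(stages_ms):
    """Sort stage names from slowest to fastest."""
    return sorted(stages_ms, key=stages_ms.get, reverse=True)


def share(stages_ms, name):
    """Fraction of total request time spent in one stage."""
    return stages_ms[name] / sum(stages_ms.values())


# Per-request time in milliseconds. Only the disk figures come from the text;
# "code" and "network" are assumed values.
spindle = {"disk": 3.0, "code": 0.5, "network": 0.2}
ssd = dict(spindle, disk=0.1)  # same system, 100-microsecond seeks

for label, stages in (("spindle", spindle), ("ssd", ssd)):
    order = rank_stages(stages)
    print(label, order, f"code share: {share(stages, 'code'):.0%}")
```

With these made-up numbers the slowest stage changes from "disk" to "code" without a single line of code changing. The ranking also answers the earlier question of what your 2nd biggest bottleneck is.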

Every day across the world, there are meetings at companies complaining about the performance of their website. Today, many of those say "get the DBA in here".

In some of those meetings soon, the shift will be away from blaming the database. Some will push for code optimization (postmature), some for bigger hardware, and some for faster languages.

Keep in mind, this is a subtle, slow moving effect. Having your CPUs pegged all the time might not make you change anything today but may make you reconsider your architecture next time you build something.

Of course the network is a bottleneck too - at least for now. In places like Korea and Kansas City that's not so true. If you haven't heard, if you live in Kansas City you can get Google Fiber to your home. In other words, your internet speed will be 100 times faster than the average internet in the US. (In fact, if your machine has the common SATA2 disk drive interface, sending a file to your neighbor across town in Kansas City will only take about 3 times as long as storing it on your own disk just a few inches away).
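The "3 times as long" figure falls out of simple arithmetic if you take nominal link rates at face value. The sketch below assumes a 3 Gbit/s SATA2 interface rate and a 1 Gbit/s fiber connection; neither number is stated above, the 700 MB file is hypothetical, and real disks and networks deliver less than their nominal rates.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Idealized transfer time: ignores protocol overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


SATA2_BPS = 3e9  # nominal SATA2 interface rate (assumption, not from the text)
FIBER_BPS = 1e9  # advertised gigabit fiber rate (assumption, not from the text)

file_size = 700 * 10**6  # a hypothetical 700 MB file

to_disk = transfer_seconds(file_size, SATA2_BPS)
to_neighbor = transfer_seconds(file_size, FIBER_BPS)
print(f"disk: {to_disk:.2f}s, neighbor: {to_neighbor:.2f}s, "
      f"ratio: {to_neighbor / to_disk:.1f}x")
```

The ratio is independent of the file size: it is just the ratio of the two link rates.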

Here's another prediction - in 5 years the phrase "downloading a movie" won't exist. (We used to say we were "downloading an image" which was preceded by us saying we were just "downloading").

If bandwidth drastically increases, it will change how we write code. We think nothing of loading a 1MB webpage now, which 10 years ago was offensive. In the future, we may think the same thing about a 100MB webpage.

Given that data expands to fill available bandwidth (modified Parkinson's Law) our programs will tend to process much more data. Processing speed will matter more and more.

And the more often code becomes the bottleneck, the more often solutions to fix that will be considered.

Simply - your favorite bottlenecks might be changing. And for that to happen, your disk doesn't necessarily need to be able to outrun your CPU - it just has to be able to outrun your code. (And it wouldn't hurt if it could also outrun, you know, wolves too).

My startup Refresh is looking for awesome iOS and front-end engineers. Join us! Email us at jobs@refresh.io