Monday, August 13, 2012

Your Bottleneck is Dead. Long Live Your Bottleneck.

There's an old joke that, if you think about it, you can apply directly to system bottlenecks.

Two hikers are walking through the woods when they come face-to-face with a pack of wolves. One of the hikers immediately drops to the ground and hastily changes from his hiking boots to the running shoes he had in his backpack.

The 2nd hiker says, "What are you doing?! You can't outrun those wolves!"

The 1st replies, "I don't have to outrun those wolves. I just have to outrun you."


Web developers tend to know their system's biggest bottleneck, but how often do you know the 2nd biggest one? Right - in one sense it doesn't matter, because the 2nd bottleneck doesn't get to slow anything down until the biggest one is out of the way.

There is an economic model hidden within every complex system. This includes something as mundane as web system performance. Knuth famously said (or re-said) that "premature optimization is the root of all evil," which could be restated as: if you optimize before you know what needs it, you're optimizing (and probably breaking) the wrong thing.

Hence we don't (or aren't supposed to) optimize what doesn't need it. Seems obvious - but it has interesting ramifications.

When something doesn't need optimizing, we can afford to be (and often tend to be) lazy with it when it comes to performance. Concretely, if your code's database access is going to take 15 milliseconds, worrying that processing that data will take 20 microseconds because of your sloppy n^2 algorithm probably isn't worth much thought.
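To make that concrete, here's a rough sketch (the 15 ms sleep, the row count, and the duplicate check are all made-up stand-ins, not measurements) showing a simulated database call dwarfing even a sloppy quadratic pass over the data:

import time

def fetch_rows():
    """Stand-in for a database call: pretend the query costs ~15 ms of disk time."""
    time.sleep(0.015)
    return list(range(100))

def has_duplicates_sloppy(rows):
    """The 'sloppy n^2 algorithm': compare every row against every other row."""
    return any(a == b for i, a in enumerate(rows) for b in rows[i + 1:])

start = time.perf_counter()
rows = fetch_rows()
db_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
has_duplicates_sloppy(rows)
cpu_ms = (time.perf_counter() - start) * 1000

print(f"simulated database: {db_ms:.1f} ms, sloppy processing: {cpu_ms:.3f} ms")

At this toy size the quadratic pass is a small fraction of the simulated query; the interesting part comes later, when the 15 ms goes away.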

If that statement raises your ire, feel free to sit in your chair and pout - because there are thousands of websites that were happily coded in Notepad, in interpreted, dynamic scripting languages, flagrantly using gotos and treating lists as if they were hashmaps. I've seen it. It's enough to turn your stomach. It's not pretty.

For the average, everyday web hardware ecosystem - we have CPU power to spare. And in the bigger business sense, if I can save time developing a website by cutting performance-related corners with no ramifications, all the better.

Web development largely started in scripting languages (e.g. Perl CGI, PHP). Again for the same reason - CPU to spare as compared to other bottlenecks.

In fact, I'll go so far as to say that the popularity of scripting-language web frameworks required the condition that disks be orders of magnitude slower than CPUs. That's right - I'm looking at you Rails, Grails, Nails, and Jails.. (ok, not Jails, it's a Java web framework but it rhymed).

Java web frameworks added a lot of structure, verbosity, and performance that simply wasn't needed (and eventually, amazing bloat). If your bottleneck was the database/disk - your web processing simply had to not add significantly to that - and regardless of the language, that wasn't hard.

A simple definition of latency is the time it takes to get data back after requesting it. From a program's point of view, bandwidth could be viewed as how long it takes to get all the data requested (once any of it starts arriving).

Think of how that relates to code performance. If your latency is 3ms (a reasonable server hard-disk seek time) - it doesn't matter if your code is hand-loved machine language or interpreted COBOL - it can do nothing about that 3ms. In CPU time, 3ms is an eternity.
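For a rough sense of "eternity" (assuming a ~3 GHz core, purely for illustration):

# Back-of-the-envelope: cycles a single core burns while waiting out one 3 ms seek.
clock_hz = 3e9   # assumed ~3 GHz core
seek_s = 0.003   # ~3 ms spindle-drive seek

print(f"~{clock_hz * seek_s:,.0f} cycles per seek")  # roughly 9,000,000 cycles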

As a general tendency however, the more data you receive, the more processing likely goes along with it. Consider a few-megabyte JSON message - at a minimum it will likely be parsed. Possibly shoved into a map or an object.
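As an illustrative sketch (the payload here is synthetic, and real messages and parsers will vary), even the minimum - parsing a few megabytes of JSON and shoving it into a map - costs real CPU time:

import json
import time

# Build a synthetic JSON payload a few megabytes in size (made-up data).
records = [{"id": i, "name": f"user{i}", "score": i * 0.5} for i in range(50_000)]
payload = json.dumps(records)
print(f"payload size: {len(payload) / 1e6:.1f} MB")

start = time.perf_counter()
parsed = json.loads(payload)           # the bare minimum: parse it
by_id = {r["id"]: r for r in parsed}   # then shove it into a map
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"parse + index time: {elapsed_ms:.1f} ms")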

Said another way - lowering latency and increasing bandwidth will tend to put more pressure on processing data (i.e. requiring more CPU/code performance).

So all this time we're happily and harshly slowed down by slow things like spinning hard-disk drives and networks. Then, in walk Solid State Drives. Prices and capacities are both heading in the normal directions for new technology (down and up, respectively).

Latency goes from the standard spindle drive's 3ms seeks to (varying reports) 100-microsecond seeks.

Argue the specifics if you will, but for some number of existing systems, installing an SSD will remove the database as the primary bottleneck. In fact, this is probably the cheapest way to improve your system's performance today.
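Here's an illustrative bit of arithmetic (every number below is an assumption made up for the example, not a benchmark): suppose a request spends some time waiting on disk and some time in your code.

# Illustrative only: how "your code's" share of a request shifts when the disk speeds up.
code_ms = 2.0           # assumed time spent in application code per request

spindle_io_ms = 15.0    # assumed disk-bound time on a spinning drive
ssd_io_ms = 0.5         # assumed disk-bound time on an SSD (e.g. 5 seeks x 100 us)

for label, io_ms in [("spindle", spindle_io_ms), ("SSD", ssd_io_ms)]:
    total = io_ms + code_ms
    print(f"{label}: code is {code_ms / total:.0%} of the request ({total:.1f} ms total)")

With the spinning drive, code is a rounding error; with the SSD, code is most of the request.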

What happens to the bottleneck in those systems? It will shift somewhere else (i.e. the SSD has put on its running shoes). In many cases, it will shift to the CPU (CPU in this case is a polite way of saying "your code").

Every day across the world, there are meetings at companies complaining about the performance of their website. Today, many of those say "get the DBA in here".

In some of those meetings soon, the shift will be away from blaming the database. Some will push for code optimization (postmature), some for bigger hardware, and some for faster languages.

Keep in mind, this is a subtle, slow moving effect. Having your CPUs pegged all the time might not make you change anything today but may make you reconsider your architecture next time you build something.

Of course the network is a bottleneck too - at least for now. In places like Korea and Kansas City that's not so true. If you haven't heard: live in Kansas City and you can get Google Fiber to your home. In other words, your internet speed will be 100 times faster than the average internet connection in the US. (In fact, if your machine has the common SATA2 disk drive interface, sending a file to your neighbor across town in Kansas City will only take about 3 times as long as storing it on your own disk just a few inches away).
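Rough numbers behind that "3 times" claim (these are nominal interface speeds, not measured throughput, so treat the whole thing as an illustration): SATA2 tops out around 3 Gbit/s and Google Fiber at about 1 Gbit/s.

# Rough, illustrative comparison of nominal link speeds (not measured throughput).
file_mbit = 8 * 1000      # a 1 GB file, expressed in megabits

sata2_mbps = 3000         # SATA2 nominal rate: ~3 Gbit/s
fiber_mbps = 1000         # Google Fiber: ~1 Gbit/s

sata2_s = file_mbit / sata2_mbps
fiber_s = file_mbit / fiber_mbps

print(f"to the local SATA2 disk: {sata2_s:.1f} s")
print(f"across town over fiber:  {fiber_s:.1f} s")
print(f"ratio: about {fiber_s / sata2_s:.0f}x")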

Here's another prediction - in 5 years the phrase "downloading a movie" won't exist. (We used to say we were "downloading an image" which was preceded by us saying we were just "downloading").

If bandwidth drastically increases, it will change how we write code. We think nothing of loading a 1MB webpage now, which 10 years ago would have been offensive. In the future, we may think the same thing about a 100MB webpage.

Given that data expands to fill available bandwidth (a modified Parkinson's Law), our programs will tend to process much more data. Processing speed will matter more and more.

And the more often code becomes the bottleneck, the more often solutions to fix that will be considered.

Simply - your favorite bottlenecks might be changing. And for that to happen, your disk doesn't necessarily need to be able to outrun your CPU - it just has to be able to outrun your code. (And it wouldn't hurt if it could also outrun, you know, wolves too).

My startup Refresh is looking for awesome iOS and front-end engineers. Join us! Email us at jobs@refresh.io

7 comments:

eyrieowl said...

I think that statement from Knuth is the root of so much evil in the software industry, largely as a result of what you allude to. It makes people lazy, and then that laziness begets a million other woes. I would not suggest that people should break their abstractions to gain good performance, but I think a good coder should always try to write clean, performant code within the confines of the abstractions they are coding with. Instead, people do all sorts of lazy, expensive things saying, "Eh, but it doesn't really matter." And then, when the biggest bottleneck is removed, there's all this other utter crap that is revealed, and it's so much more difficult to go back and clean that up than to have just written clean, simple, performant code from the get-go.

The lesson, I believe, that you should take from Knuth is not to only care about performance if you're working on the choke-point, but rather to care about performance, outside the choke-point, unless it would impact the legibility of the code or the abstraction. In other terms: define the interfaces, code to them (in a disciplined, non-lazy fashion), optimize performance while keeping the abstractions, if possible, and then if the abstraction is the bottleneck, break the abstraction (coming up with an improved one if possible). If people coded like that, I think the general quality of software would be so much better.

Paddy3118 said...

You seem to be falling into the trap of premature optimization yourself in this post, in that you immediately think that dynamic languages will be the next bottleneck because you intimate that all Perl and all PHP is written badly and is slow; and that all Ruby-on-Rails code is slow.

You fail to mention that a lot of people overcame the inertia of the Java-based frameworks they were already using and switched to tools built in dynamic languages, such as Django and Rails, because development and maintenance time was key. When they actually *measured* where their time and money was spent, they had a better understanding of what needed optimizing first and junked their web development framework.

Yes faster secondary storage and smaller network latencies might well change what needs optimizing, but again; you need to mix-n-match, consider your options, and *measure* before blaming any particular part of your stack.

For example, there are companies working on tech that could give you an order of magnitude more CPU cores in a "box". They may individually be less powerful than a modern Xeon core but they could give more bang-per-watt of waste heat, which is becoming more important. Will that favour your existing DB? Who knows? Best to do some research and *measure*!

Max5684 said...

I just wanted to comment to say that this is an excellent post which has now been posted to Reddit. All of the upvotes that it is receiving there should be passed on to you, good sir.

http://www.reddit.com/r/programming/comments/yc099/your_bottleneck_is_dead_long_live_your_bottleneck/

Alex Karaman said...

Very interesting read and some great predictions about the future. Made me think about how I write my code, and the big O, man I haven't thought about it since college.

Anonymous said...

Hello, thanks for writing this. There may be opportunities to write faster code in faster languages opening up sooner or later, due to the need to make programs work faster. Interesting that Java is too bloated for its own good. I like things to be nimble and quick. But the final paragraph of the article connected well enough with the parable at the top for me to see the argument.

Anonymous said...

I dunno.. I still find 1Meg webpages fairly offensive especially when there's no real justification.

Jason said...

Client-side databases and javascript multi-threading will be the next step that will make websites appear to load instantaneously! Even with SSD caching (or an SSD database), the latency of the request and sending the information combined with the database access time is still a bottleneck. Why do we send a request to a site, then send the same request for the next page, then change the sort order of the same request and send it again?!
By loading information once, sorting on the client side, and loading subsequent requests in the background with a javascript worker thread, websites can appear to be nearly instantaneous!

This approach can delay re-engineering an existing database & will put network latency to the background. Unlike bandwidth, latency will almost always exist, but by putting requests in the background, it can be hidden.