The Law of Leaky Abstractions

There’s a key piece of magic in the engineering of the Internet which you rely on every single day. It happens in the TCP protocol, one of the fundamental building blocks of the Internet.

TCP is a way to transmit data that is reliable. By this I mean: if you send a message over a network using TCP, it will arrive, and it won’t be garbled or corrupted.

We use TCP for many things like fetching web pages and sending email. The reliability of TCP is why every email arrives in letter-perfect condition. Even if it’s just some dumb spam.

By comparison, there is another method of transmitting data called IP which is unreliable. Nobody promises that your data will arrive, and it might get messed up before it arrives. If you send a bunch of messages with IP, don’t be surprised if only half of them arrive, and some of those are in a different order than the order in which they were sent, and some of them have been replaced by alternate messages, perhaps containing pictures of adorable baby orangutans, or more likely just a lot of unreadable garbage that looks like that spam you get in a foreign language.

Here’s the magic part: TCP is built on top of IP. In other words, TCP is obliged to somehow send data reliably using only an unreliable tool.

To illustrate why this is magic, consider the following morally equivalent, though somewhat ludicrous, scenario from the real world.

Imagine that we had a way of sending actors from Broadway to Hollywood that involved putting them in cars and driving them across the country. Some of these cars crashed, killing the poor actors. Sometimes the actors got drunk on the way and shaved their heads or got nasal tattoos, thus becoming too ugly to work in Hollywood, and frequently the actors arrived in a different order than they had set out, because they all took different routes. Now imagine a new service called Hollywood Express, which delivered actors to Hollywood, guaranteeing that they would (a) arrive (b) in order (c) in perfect condition. The magic part is that Hollywood Express doesn’t have any method of delivering the actors, other than the unreliable method of putting them in cars and driving them across the country. Hollywood Express works by checking that each actor arrives in perfect condition, and, if he doesn’t, calling up the home office and requesting that the actor’s identical twin be sent instead. If the actors arrive in the wrong order Hollywood Express rearranges them. If a large UFO on its way to Area 51 crashes on the highway in Nevada, rendering it impassable, all the actors that went that way are rerouted via Arizona and Hollywood Express doesn’t even tell the movie directors in California what happened. To them, it just looks like the actors are arriving a little bit more slowly than usual, and they never even hear about the UFO crash.
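
In code, the Hollywood Express trick is just a retransmission loop. Here is a minimal C++ sketch with a deliberately lossy simulated channel standing in for IP; real TCP is vastly more elaborate (checksums, sequence numbers over a whole byte stream, sliding windows, congestion control), and the function names below are my own invention, but the core idea of building a reliable send out of an unreliable one looks roughly like this:

    #include <iostream>
    #include <optional>
    #include <random>
    #include <string>

    // Simulated unreliable channel: it silently drops about a third of all packets.
    static std::mt19937 rng{42};
    static std::optional<unsigned> last_delivered;   // what the "receiver" last saw

    void send_packet(unsigned seq, const std::string& payload) {
        std::bernoulli_distribution lost(0.33);
        if (lost(rng)) return;            // the packet vanished somewhere in Nevada
        last_delivered = seq;             // it arrived; the receiver will acknowledge it
        (void)payload;                    // a real channel would, of course, carry the data
    }

    std::optional<unsigned> wait_for_ack() {
        auto ack = last_delivered;        // in real life the acknowledgment can get lost, too
        last_delivered.reset();
        return ack;
    }

    // Reliable delivery built on the unreliable channel: retransmit until acknowledged.
    bool reliable_send(unsigned seq, const std::string& payload, int max_retries = 8) {
        for (int attempt = 0; attempt < max_retries; ++attempt) {
            send_packet(seq, payload);
            auto ack = wait_for_ack();
            if (ack && *ack == seq) return true;   // arrived, intact
        }
        return false;                              // the network really is down: the abstraction leaks
    }

    int main() {
        for (unsigned seq = 0; seq < 5; ++seq)
            std::cout << "message " << seq
                      << (reliable_send(seq, "hello") ? " delivered\n" : " lost\n");
    }

The caller of reliable_send never hears about the retries, which is exactly the point of the abstraction.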

That is, approximately, the magic of TCP. It is what computer scientists like to call an abstraction: a simplification of something much more complicated that is going on under the covers. As it turns out, a lot of computer programming consists of building abstractions. What is a string library? It’s a way to pretend that computers can manipulate strings just as easily as they can manipulate numbers. What is a file system? It’s a way to pretend that a hard drive isn’t really a bunch of spinning magnetic platters that can store bits at certain locations, but rather a hierarchical system of folders-within-folders containing individual files that in turn consist of one or more strings of bytes.

Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn’t, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can’t do anything about it and your message doesn’t arrive. If you were curt with the system administrators in your company and they punished you by plugging you into an overloaded hub, only some of your IP packets will get through, and TCP will work, but everything will be really slow.

This is what I call a leaky abstraction. TCP attempts to provide a complete abstraction of an underlying unreliable network, but sometimes, the network leaks through the abstraction and you feel the things that the abstraction can’t quite protect you from. This is but one example of what I’ve dubbed the Law of Leaky Abstractions:

All non-trivial abstractions, to some degree, are leaky.

Abstractions fail. Sometimes a little, sometimes a lot. There’s leakage. Things go wrong. It happens all over the place when you have abstractions. Here are some examples.

  • Something as simple as iterating over a large two-dimensional array can have radically different performance if you do it horizontally rather than vertically, depending on the “grain of the wood” — one direction may result in vastly more page faults than the other direction, and page faults are slow (see the sketch just after this list). Even assembly programmers are supposed to be allowed to pretend that they have a big flat address space, but virtual memory means it’s really just an abstraction, which leaks when there’s a page fault and certain memory fetches take way more nanoseconds than other memory fetches.
  • The SQL language is meant to abstract away the procedural steps that are needed to query a database, instead allowing you to define merely what you want and let the database figure out the procedural steps to query it. But in some cases, certain SQL queries are thousands of times slower than other logically equivalent queries. A famous example of this is that some SQL servers are dramatically faster if you specify “where a=b and b=c and a=c” than if you only specify “where a=b and b=c” even though the result set is the same. You’re not supposed to have to care about the procedure, only the specification. But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster.
  • Even though network libraries like NFS and SMB let you treat files on remote machines “as if” they were local, sometimes the connection becomes very slow or goes down, and the file stops acting like it was local, and as a programmer you have to write code to deal with this. The abstraction of “remote file is the same as local file” leaks. Here’s a concrete example for Unix sysadmins. If you put users’ home directories on NFS-mounted drives (one abstraction), and your users create .forward files to forward all their email somewhere else (another abstraction), and the NFS server goes down while new email is arriving, the messages will not be forwarded because the .forward file will not be found. The leak in the abstraction actually caused a few messages to be dropped on the floor.
  • C++ string classes are supposed to let you pretend that strings are first-class data. They try to abstract away the fact that strings are hard and let you act as if they were as easy as integers. Almost all C++ string classes overload the + operator so you can write s + “bar” to concatenate. But you know what? No matter how hard they try, there is no C++ string class on Earth that will let you type “foo” + “bar”, because string literals in C++ are always char*’s, never strings. The abstraction has sprung a leak that the language doesn’t let you plug. (Amusingly, the history of the evolution of C++ over time can be described as a history of trying to plug the leaks in the string abstraction. Why they couldn’t just add a native string class to the language itself eludes me at the moment.)
  • And you can’t drive as fast when it’s raining, even though your car has windshield wipers and headlights and a roof and a heater, all of which protect you from caring about the fact that it’s raining (they abstract away the weather), but lo, you have to worry about hydroplaning (or aquaplaning in England) and sometimes the rain is so strong you can’t see very far ahead so you go slower in the rain, because the weather can never be completely abstracted away, because of the law of leaky abstractions.
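
Here is what the first leak in that list looks like in a minimal C++ sketch. Both loops visit exactly the same elements of the same flat allocation; one walks memory sequentially, the other strides across it, and with a large enough array the striding version hits far more cache misses and page faults even though the "flat address space" abstraction says the two are identical:

    #include <cstddef>
    #include <vector>

    constexpr std::size_t N = 4'096;   // a 4096 x 4096 matrix stored row by row in one flat block

    long long sum_row_major(const std::vector<int>& a) {
        long long sum = 0;
        for (std::size_t row = 0; row < N; ++row)
            for (std::size_t col = 0; col < N; ++col)
                sum += a[row * N + col];   // consecutive addresses: with the grain of the wood
        return sum;
    }

    long long sum_column_major(const std::vector<int>& a) {
        long long sum = 0;
        for (std::size_t col = 0; col < N; ++col)
            for (std::size_t row = 0; row < N; ++row)
                sum += a[row * N + col];   // jumps N * sizeof(int) bytes every step: against the grain
        return sum;
    }

    int main() {
        std::vector<int> a(N * N, 1);
        // Same answer both ways; radically different running time on real hardware.
        return sum_row_major(a) == sum_column_major(a) ? 0 : 1;
    }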

One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. When I’m training someone to be a C++ programmer, it would be nice if I never had to teach them about char*’s and pointer arithmetic. It would be nice if I could go straight to STL strings. But one day they’ll write the code “foo” + “bar”, and truly bizarre things will happen, and then I’ll have to stop and teach them all about char*’s anyway. Or one day they’ll be trying to call a Windows API function that is documented as having an OUT LPTSTR argument and they won’t be able to understand how to call it until they learn about char*’s, and pointers, and Unicode, and wchar_t’s, and the TCHAR header files, and all that stuff that leaks up.
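
If you want to see that particular leak for yourself, here is a tiny sketch; the exact compiler error varies, but the reason it fails is always the same:

    #include <string>

    int main() {
        std::string s = "foo";
        std::string ok = s + "bar";          // fine: std::string overloads + for char pointers

        // std::string bad = "foo" + "bar";  // does NOT compile: both literals are plain
        //                                   // char arrays that decay to char*, and C++ has
        //                                   // no way to add two pointers -- the leak

        std::string fix = std::string("foo") + "bar";   // the workaround: promote one side
        (void)ok; (void)fix;
    }

The only fix is to make at least one operand a real string object, which is exactly the kind of thing that forces you to teach about char*’s anyway.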

In teaching someone about COM programming, it would be nice if I could just teach them how to use the Visual Studio wizards and all the code generation features, but if anything goes wrong, they will not have the vaguest idea what happened or how to debug it and recover from it. I’m going to have to teach them all about IUnknown and CLSIDs and ProgIDs and … oh, the humanity!

In teaching someone about ASP.NET programming, it would be nice if I could just teach them that they can double-click on things and then write code that runs on the server when the user clicks on those things. Indeed ASP.NET abstracts away the difference between writing the HTML code to handle clicking on a hyperlink (<a>) and the code to handle clicking on a button. Problem: the ASP.NET designers needed to hide the fact that in HTML, there’s no way to submit a form from a hyperlink. They do this by generating a few lines of JavaScript and attaching an onclick handler to the hyperlink. The abstraction leaks, though. If the end-user has JavaScript disabled, the ASP.NET application doesn’t work correctly, and if the programmer doesn’t understand what ASP.NET was abstracting away, they simply won’t have any clue what is wrong.

The law of leaky abstractions means that whenever somebody comes up with a wizzy new code-generation tool that is supposed to make us all ever-so-efficient, you hear a lot of people saying “learn how to do it manually first, then use the wizzy tool to save time.” Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.

And all this means that paradoxically, even as we have higher and higher level programming tools with better and better abstractions, becoming a proficient programmer is getting harder and harder.

During my first Microsoft internship, I wrote string libraries to run on the Macintosh. A typical assignment: write a version of strcat that returns a pointer to the end of the new string. A few lines of C code. Everything I did was right from K&R — one thin book about the C programming language.
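
The flavor of the assignment, reconstructed here as a sketch rather than the actual code, was something like this:

    #include <cstdio>

    // Like strcat, but returns a pointer to the terminating NUL of the result,
    // so the next concatenation can pick up where this one left off instead of
    // re-scanning the whole string from the beginning.
    char* mystrcat(char* dest, const char* src) {
        while (*dest) ++dest;            // find the end of dest
        while ((*dest++ = *src++))       // copy src, including the terminating NUL
            ;
        return dest - 1;                 // point at the NUL we just wrote
    }

    int main() {
        char buf[32] = "Hello";
        char* end = mystrcat(buf, ", ");
        end = mystrcat(end, "world");    // append where the last call left off
        std::puts(buf);                  // prints: Hello, world
    }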

Today, to work on CityDesk, I need to know Visual Basic, COM, ATL, C++, InnoSetup, Internet Explorer internals, regular expressions, DOM, HTML, CSS, and XML. All high level tools compared to the old K&R stuff, but I still have to know the K&R stuff or I’m toast.

Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we’ve created over the years do allow us to deal with new orders of complexity in software development that we didn’t have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it’s not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.

The Law of Leaky Abstractions is dragging us down.

Platforms

Most software developers, including Fog Creek Software, are perfectly happy just to write software applications. You know, programs that do something or solve some particular problem. But the brave among us want to change the world more significantly, and they choose to work on platforms: big giant slabs of software that don’t quite do anything out of the box, but which enable a whole world of new and interesting applications. So they write operating systems, or DBMSs, or language runtimes like Java, and they hope to attract independent software developers to create the great new applications that their platforms enable.

Almost by definition, an operating system is a platform. Many platforms live on top of operating systems, such as the Java Runtime. And don’t forget that Windows didn’t start out as an OS, it started out as a program you ran on DOS which (out of the box) didn’t do much of anything, but enabled software developers to create GUI applications for inexpensive Intel boxes.

It’s really, really important to figure out if your product is a platform or not, because platforms need to be marketed in a very different way to be successful. That’s because a platform needs to appeal to developers first and foremost, not end users.

I just had the good fortune to read an advance copy of Rick Chapman’s excellent new book on stupidity in the software industry. (Prediction: best seller). Being an analytical kind of guy, I look for common themes. One of the biggest themes in software industry failures is a platform vendor that didn’t understand that they were a platform vendor, so they alienated their key constituency: the developers.

Example: NetWare took so long to release reasonable tools for creating NLMs that when Unix and Windows NT came along with superior and cheaper development tools, they swept away the mindshare of server software developers.

Example: Apple has spent decades making life miserable for their developers. Every new OS for almost 20 years required tweaks and changes to application code. If you got too successful, Apple competed against you (although sometimes they had a hand puppet called Claris compete against you so they could pretend it wasn’t them.)

Example: Developing software for OS/2 1.0 required an investment of $3000 for the SDK, and you had to write all your own printer drivers if printing was important for you. Printing was important, so OS/2 had no applications.

But the counter-examples are just as interesting:

Example: The first versions of Windows included a freely redistributable runtime so that if you wrote a Windows application, you could sell it to anyone with DOS, you weren’t limited to the few weirdo dorks (me!) who bought Windows 1.0.

Example: Despite the mistakes that Sun made with Java, the runtime was always free and good Java tools were cheap or free, too. No other development platform became so predominant so quickly (even Visual Basic, the top selling computer language of all time, took years to ramp up.)

If you want a platform to be successful, you need massive adoption, and that means you need developers to develop for it. The best way to kill a platform is to make it hard for developers to build on it. Most of the time, this happens because platform companies either don’t know that they have a platform (they think it’s an application) or they get greedy (they want all the revenue for themselves.)

Greedy platform companies can’t stand the idea of all kinds of unwashed riffraff making money off of their platform, so they make it darn near impossible for anyone to develop for it. Probably the most spectacular failure illustrating this was IBM’s PS/2, with its huge portfolio of proprietary technologies, such as the new Microchannel architecture designed to ensure that only IBM could make expansion cards. This is, of course, monumentally shortsighted. Nobody wanted PS/2s because, uh, add-in cards weren’t available and they were too expensive when they were. As a platform vendor, you’re only as successful as the people who build on you.

A more subtle problem is when platform vendors don’t think they have a platform, they think they have an application. To illustrate this I’m once again going to have to pick on Groove.

“Why do you keep picking on Groove, Joel?” Three reasons:

  • They have an interesting architecture that provides important platform functionality which I could really use in my own products,
  • They make it impossible (or at least unrealistic) for ISVs to build on their platform, thus dooming themselves to oblivion, either out of greed or because they think Groove is an application, not a platform,
  • and Groove inventor Ray Ozzie has a weblog, so he can answer me if he thinks I’m off the mark. (He did.)

Here’s how I noticed the Groove problem. I’ve got a product, CityDesk, for simple desktop web content management. Weblogs, company sites, small organizations, etc. — people who need content management but can’t afford the big systems, don’t control a server anywhere, or just can’t be bothered to mess with installing perl scripts on a server.

As a 1.0 product we’ve got our share of weaknesses. One of the big ones is that people want to collaborate on CityDesk sites over the Internet. A reasonable request. For the next major release, we have to do something about this limitation. Basically, we’ve got two choices. The traditional choice would be to do something client-server: make a CityDesk server you can install somewhere so that everyone can collaborate.

But another choice, which would maintain CityDesk’s benefit of not requiring anything on the server, would be to use a secure peer-to-peer architecture. Something exactly like what Groove provides.

So I thought of porting CityDesk to Groove. Then I noticed that

  1. there is no free Groove runtime. Every single one of my customers would have to buy Groove.
  2. nobody has Groove yet.

That doomed the Groove idea from the start. I talked to some of the Groove “partners” who allegedly are developing software for Groove. “Was the Groove relationship worth anything?” I asked. “HA!” they said. “We paid $1500 and in exchange we get less than 10 clickthroughs a month from their web page. A waste of money. We couldn’t even get Groove to share their customer lists with us.”

This is not the kind of platform I want to develop for. Yet, technologically, it is exactly the kind of platform I want to develop for, but it’s controlled by a greedy (or clueless) company that is going to choke off their own oxygen — the compelling applications on top of Groove. Ray Ozzie is going nuts about how cool weblogs are — where’s the weblog application for Groove? Who’s going to write one? Evan Williams, creator of Blogger? Even Blogger Pro is only $35 a year, and that doesn’t go very far toward paying for a $99 Groove single-user license.

What would happen if the Groove runtime were free? Follow the arc of Windows. It started out with a free runtime that let you run one GUI application at a time. Eventually, many people bought the full version to get the benefits of Windows file management, cut and paste between Windows applications, etc. Then Windows 3.0 came out and was so popular and had so many applications that it was bundled with every PC. Today Windows is like the television tax in Britain. Everybody pays it except the developer — when you’re writing software for Windows, it doesn’t cost one extra cent. In fact at no time in the history of Windows did developers ever have to worry about the cost of Windows itself.

Anybody who ever tried to sell software components (ActiveX controls, beans, etc.) knows that you have to have a royalty-free runtime or no developer will touch you with a 10 foot pole. Microsoft even lets you redistribute Jet, a complete relational database engine that is 9/10ths of Microsoft Access, for free. Heck, it’s preinstalled on Windows 2000.

If Groove wants to be a successful platform, they need to do the same thing. A free Groove redistributable would mean that hundreds of applications would spring up which would get the runtime spread far and wide. Many of those users would see the value in buying a full version of Groove with built in collaboration features. Groove services like the hosted reflectors would have a much larger potential audience.

Of course, they can continue down the path of Notes, assuming that the only way to sell software is to wow CEOs with cool PowerPoints and sell $1,000,000 corporate shelfware licenses. Eventually, this made some real money for Lotus, because Notes had one compelling application — email — built in. But imagine if the Notes runtime had been free. If Notes had had a software industry sitting on top of it way back in the 80s, some dorm-room startup might have made a compelling hypertext application for it instead and preempted the Web. The dreams of huge public Notes networks might have come true. Notes would be as common on PCs as Solitaire. Today it’s just Another Email System, one without much of a future.

I keep picking on Groove, but only because there’s something interesting in there, one of the few technologies that’s interesting enough to care about. Yes, the Groove engineers are architecture astronauts. That’s OK. They’re building architecture. But they’re positioning it like an application and I don’t think Groove will be successful if they do that. Someone else will come along with a P2P architecture and sell it like a component, or make it an open source library (yes, I’m aware of JXTA), and that’s what software developers will use.

Strategy Letter V

When I was in college I took two intro economics courses: macroeconomics and microeconomics. Macro was full of theories like “low unemployment causes inflation” that never quite stood up to reality. But the micro stuff was both cool and useful. It was full of interesting concepts about the relationships between supply and demand that really did work. For example, if you have a competitor who lowers their prices, the demand for your product will go down unless you match them.

In today’s episode, I’ll show how one of those concepts explains a lot about some familiar computer companies. Along the way, I noticed something interesting about open source software, which is this: most of the companies spending big money to develop open source software are doing it because it’s a good business strategy for them, not because they suddenly stopped believing in capitalism and fell in love with freedom-as-in-speech.

Every product in the marketplace has substitutes and complements. A substitute is another product you might buy if the first product is too expensive. Chicken is a substitute for beef. If you’re a chicken farmer and the price of beef goes up, people will want more chicken, and you will sell more.

A complement is a product that you usually buy together with another product. Gas and cars are complements. Computer hardware is a classic complement of computer operating systems. And babysitters are a complement of dinner at fine restaurants. In a small town, when the local five star restaurant has a two-for-one Valentine’s day special, the local babysitters double their rates. (Actually, the nine-year-olds get roped into early service.)

All else being equal, demand for a product increases when the prices of its complements decrease.

Let me repeat that because you might have dozed off, and it’s important. Demand for a product increases when the prices of its complements decrease. For example, if flights to Miami become cheaper, demand for hotel rooms in Miami goes up — because more people are flying to Miami and need a room. When computers become cheaper, more people buy them, and they all need operating systems, so demand for operating systems goes up, which means the price of operating systems can go up.

At this point, it’s pretty common for people to try to confuse things by saying, “aha! But Linux is FREE!” OK. First of all, when an economist considers price, they consider the total price, including some intangible things like the time it takes to set up, reeducate everyone, and convert existing processes. All the things that we like to call “total cost of ownership.”

Secondly, by using the free-as-in-beer argument, these advocates try to believe that they are not subject to the rules of economics because they’ve got a nice zero they can multiply everything by. Here’s an example. When Slashdot asked Linux developer Moshe Bar if future Linux kernels would be compatible with existing device drivers, he said that they didn’t need to. “Proprietary software goes at the tariff of US$ 50-200 per line of debugged code. No such price applies to OpenSource software.” Moshe goes on to claim that it’s OK for every Linux kernel revision to make all existing drivers obsolete, because the cost of rewriting all those existing drivers is zero. This is completely wrong. He’s basically claiming that spending a small amount of programming time making the kernel backwards compatible is equivalent to spending a huge amount of programming time rewriting every driver, because both numbers are multiplied by their “cost,” which he believes to be zero. This is a prima facie fallacy. The thousands or millions of developer hours it takes to revise every existing device driver are going to have to come at the expense of something. And until that’s done, Linux will be once again handicapped in the marketplace because it doesn’t support existing hardware. Wouldn’t it be better to use all that “zero cost” effort making Gnome better? Or supporting new hardware?

Debugged code is NOT free, whether proprietary or open source. Even if you don’t pay cash dollars for it, it has opportunity cost, and it has time cost. There is a finite amount of volunteer programming talent available for open source work, and every open source project competes with every other open source project for the same limited programming resource, and only the sexiest projects really have more volunteer developers than they can use. To summarize, I’m not very impressed by people who try to prove wild economic things about free-as-in-beer software, because they’re just getting divide-by-zero errors as far as I’m concerned.

Open source is not exempt from the laws of gravity or economics. We saw this with Eazel, ArsDigita, The Company Formerly Known as VA Linux and a lot of other attempts. But something is still going on which very few people in the open source world really understand: a lot of very large public companies, with responsibilities to maximize shareholder value, are investing a lot of money in supporting open source software, usually by paying large teams of programmers to work on it. And that’s what the principle of complements explains.

Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods. So:

Smart companies try to commoditize their products’ complements.

If you can do this, demand for your product will increase and you will be able to charge more and make more.

When IBM designed the PC architecture, they used off-the-shelf parts instead of custom parts, and they carefully documented the interfaces between the parts in the (revolutionary) IBM-PC Technical Reference Manual. Why? So that other manufacturers could join the party. As long as you match the interface, you can be used in PCs. IBM’s goal was to commoditize the add-in market, which is a complement of the PC market, and they did this quite successfully. Within a short time scrillions of companies sprang up offering memory cards, hard drives, graphics cards, printers, etc. Cheap add-ins meant more demand for PCs.

When IBM licensed the operating system PC-DOS from Microsoft, Microsoft was very careful not to sell an exclusive license. This made it possible for Microsoft to license the same thing to Compaq and the other hundreds of OEMs who had legally cloned the IBM PC using IBM’s own documentation. Microsoft’s goal was to commoditize the PC market. Very soon the PC itself was basically a commodity, with ever decreasing prices, consistently increasing power, and fierce margins that make it extremely hard to make a profit. The low prices, of course, increase demand. Increased demand for PCs meant increased demand for their complement, MS-DOS. All else being equal, the greater the demand for a product, the more money it makes for you. And that’s why Bill Gates can buy Sweden and you can’t.

This year Microsoft’s trying to do it again: their new game console, the XBox, uses commodity PC hardware instead of custom parts. The theory (explained in this book) was that commodity hardware gets cheaper every year, so the XBox could ride down the prices. Unfortunately it seems to have backfired: apparently commodity PC hardware has already been squeezed down to commodity prices, and so the price of making an XBox isn’t declining as fast as Microsoft would like. The other part of Microsoft’s XBox strategy was to use DirectX, a graphics library that can be used to write code that runs on all kinds of video chips. The goal here is to make the video chip a commodity, to lower its price, so that more games are sold, where the real profits occur. And why don’t the video chip vendors of the world try to commoditize the games, somehow? That’s a lot harder. If the game Halo is selling like crazy, it doesn’t really have any substitutes. You’re not going to go to the movie theatre to see Star Wars: Attack of the Clones and decide instead that you would be satisfied with a Woody Allen movie. They may both be great movies, but they’re not perfect substitutes. Now: who would you rather be, a game publisher or a video chip vendor?

Commoditize your complements.

Understanding this strategy actually goes a long, long way in explaining why many commercial companies are making big contributions to open source. Let’s go over these.

Headline: IBM Spends Millions to Develop Open Source Software.

Myth: They’re doing this because Lou Gerstner read the GNU Manifesto and decided he doesn’t actually like capitalism.

Reality: They’re doing this because IBM is becoming an IT consulting company. IT consulting is a complement of enterprise software. Thus IBM needs to commoditize enterprise software, and the best way to do this is by supporting open source. Lo and behold, their consulting division is winning big with this strategy.

Headline: Netscape Open Sources Their Web Browser.

Myth: They’re doing this to get free source code contributions from people in cybercafes in New Zealand.

Reality: They’re doing this to commoditize the web browser.

This has been Netscape’s strategy from day one. Have a look at the very first Netscape press release: the browser is “freeware.” Netscape gave away the browser so they could make money on servers. Browsers and servers are classic complements. The cheaper the browsers, the more servers you sell. This was never as true as it was in October 1994. (Netscape was actually surprised when MCI came in the door and dumped so much money in their laps that they realized they could make money off of the browser, too. This wasn’t required by the business plan.)

When Netscape released Mozilla as Open Source, it was because they saw an opportunity to lower the cost of developing the browser. So they could get the commodity benefits at a lower cost.

Later AOL/Time Warner acquired Netscape. The server software, which was supposed to be the beneficiary of commodity browsers, wasn’t doing all that well, and was jettisoned. Now: why would AOL/Time Warner continue to invest anything in open source?

AOL/Time Warner is an entertainment company. Entertainment companies are the complement of entertainment delivery platforms of all types, including web browsers. This giant conglomerate’s strategic interest is to make entertainment delivery – web browsers – a commodity for which nobody can charge money.

My argument is a little bit tortured by the fact that Internet Explorer is free-as-in-beer. Microsoft wanted to make web browsers a commodity, too, so they could sell desktop and server operating systems. They went a step further and delivered a collection of components which anyone could use to throw together a web browser. Neoplanet, AOL, and Juno used these components to build their own web browsers. Given that IE is free, what is the incentive for Netscape to make the browser “even cheaper”? It’s a preemptive move. They need to prevent Microsoft from getting a complete monopoly in web browsers, even free web browsers, because that would theoretically give Microsoft an opportunity to increase the cost of web browsing in other ways — say, by increasing the price of Windows.

(My argument is even more shaky because it’s pretty clear that Netscape in the days of Barksdale didn’t exactly know what it was doing. A more likely explanation for what Netscape did is that upper management was technologically inept, and they had no choice but to go along with whatever scheme the developers came up with. The developers were hackers, not economists, and only coincidentally came up with a scheme which serves their strategy. But let’s give them the benefit of the doubt.)

Headline: Transmeta Hires Linus, Pays Him To Hack on Linux.

Myth: They just did it to get publicity. Would you have heard of Transmeta otherwise?

Reality: Transmeta is a CPU company. The natural complement of a CPU is an operating system. Transmeta wants OSs to be a commodity.

Headline: Sun and HP Pay Ximian To Hack on Gnome.

Myth: Sun and HP are supporting free software because they like Bazaars, not Cathedrals.

Reality: Sun and HP are hardware companies. They make boxen. In order to make money on the desktop, they need for windowing systems, which are a complement of desktop computers, to be a commodity. Why don’t they take the money they’re paying Ximian and use it to develop a proprietary windowing system? They tried this (Sun had NeWS and HP had New Wave), but these are really hardware companies at heart with pretty crude software skills, and they need windowing systems to be a cheap commodity, not a proprietary advantage which they have to pay for. So they hired the nice guys at Ximian to do this for the same reason that Sun bought Star Office and open sourced it: to commoditize software and make more money on hardware.

Headline: Sun Develops Java; New “Bytecode” System Means Write Once, Run Anywhere.

The bytecode idea is not new — programmers have always tried to make their code run on as many machines as possible. (That’s how you commoditize your complement). For years Microsoft had its own p-code compiler and portable windowing layer which let Excel run on Mac, Windows, and OS/2, and on Motorola, Intel, Alpha, MIPS and PowerPC chips. Quark has a layer which runs Macintosh code on Windows. The C programming language is best described as a hardware-independent assembler language. It’s not a new idea to software developers.

If you can run your software anywhere, that makes hardware more of a commodity. As hardware prices go down, the market expands, driving more demand for software (and leaving customers with extra money to spend on software which can now be more expensive.)

Sun’s enthusiasm for WORA is, um, strange, because Sun is a hardware company. Making hardware a commodity is the last thing they want to do.

Oooooooooooooooooooooops!

Sun is the loose cannon of the computer industry. Unable to see past their raging fear and loathing of Microsoft, they adopt strategies based on anger rather than self-interest. Sun’s two strategies are (a) make software a commodity by promoting and developing free software (Star Office, Linux, Apache, Gnome, etc), and (b) make hardware a commodity by promoting Java, with its bytecode architecture and WORA. OK, Sun, pop quiz: when the music stops, where are you going to sit down? Without proprietary advantages in hardware or software, you’re going to have to take the commodity price, which barely covers the cost of cheap factories in Guadalajara, not your cushy offices in Silicon Valley.

“But Joel!” Jared says. “Sun is trying to commoditize the operating system, like Transmeta, not the hardware.” Maybe, but the fact that Java bytecode also commoditizes the hardware is some pretty significant collateral damage to sustain.

An important thing you notice from all these examples is that it’s easy for software to commoditize hardware (you just write a little hardware abstraction layer, like Windows NT’s HAL, which is a tiny piece of code), but it’s incredibly hard for hardware to commoditize software. Software is not interchangeable, as the StarOffice marketing team is learning. Even when the price is zero, the cost of switching from Microsoft Office is non-zero. Until the switching cost becomes zero, desktop office software is not truly a commodity. And even the smallest differences can make two software packages a pain to switch between. Despite the fact that Mozilla has all the features I want and I’d love to use it if only to avoid the whack-a-mole pop-up-ad game, I’m too used to hitting Alt+D to go to the address bar. So sue me. One tiny difference and you lose your commodity status. But I’ve pulled hard drives out of IBM computers and slammed them into Dell computers and, boom, the system comes up perfectly and runs as if it were still in the old computer.
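
If “hardware abstraction layer” sounds exotic, it isn’t; conceptually it is just an interface the rest of the system codes against, with one small implementation per kind of machine. The C++ sketch below is mine and looks nothing like NT’s actual HAL (which lives much lower down, in C and assembly), but it has the same shape:

    #include <cstdint>
    #include <memory>

    // What everything above the HAL codes against.
    struct Hal {
        virtual ~Hal() = default;
        virtual void write_port(std::uint16_t port, std::uint8_t value) = 0;
        virtual std::uint64_t ticks() const = 0;    // monotonic clock, however this board provides one
    };

    // One small implementation per board; everything above stays byte-for-byte the same,
    // which is exactly what makes the hardware interchangeable -- a commodity.
    struct AcmeBoardHal : Hal {
        void write_port(std::uint16_t, std::uint8_t) override { /* poke the Acme registers */ }
        std::uint64_t ticks() const override { return 0; /* read the Acme timer */ }
    };

    std::unique_ptr<Hal> make_hal() { return std::make_unique<AcmeBoardHal>(); }

    int main() {
        auto hal = make_hal();
        hal->write_port(0x3F8, 'A');   // the caller neither knows nor cares whose silicon this is
    }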

Amos Michelson, the CEO of Creo, told me that every employee in his firm is required to take a course in what he calls “economic thinking.” Great idea. Even simple concepts in basic microeconomics go a long way to understanding some of the fundamental shifts going on today.

 

Five Worlds

Something important is almost never mentioned in all the literature about programming and software development, and as a result we sometimes misunderstand each other.

You’re a software developer. Me too. But we may not have the same goals and requirements. In fact there are several different worlds of software development, and different rules apply to different worlds.

You read a book about UML modeling, and nowhere does it say that it doesn’t make sense for programming device drivers. Or you read an article saying that “the 20MB runtime [required for .NET] is a NON issue” and it doesn’t mention the obvious: if you’re trying to write code for a 32KB ROM on a pager it very much is an issue!

I think there are five worlds here, sometimes intersecting, often not. The five are:

  1. Shrinkwrap
  2. Internal
  3. Embedded
  4. Games
  5. Throwaway

When you read the latest book about Extreme Programming, or one of Steve McConnell’s excellent books, or Joel on Software, or Software Development magazine, you see a lot of claims about how to do software development, but you hardly ever see any mention of what kind of development they’re talking about, which is unfortunate, because sometimes you need to do things differently in different worlds.

Let’s go over the categories briefly.

Shrinkwrap is software that needs to be used “in the wild” by a large number of people. It may be actually wrapped in cellophane and sold at CompUSA, or it may be downloaded over the Internet. It may be commercial or shareware or open source or GNU or whatever — the main point here is software that will be installed and used by thousands or millions of people.

Shrinkwrap has special problems which derive from two special properties:

  • Since it has so many users who often have alternatives, the user interface needs to be easier than average in order to achieve success.
  • Since it runs on so many computers, the code must be unusually resilient to variations between computers. Last week someone emailed me about a bug in CityDesk which only appears in Polish Windows, because of the way that operating system uses Right-Alt to enter special characters. We tested Windows 95, 95OSR2, 98, 98SE, Me, NT 4.0, Win 2000, and Win XP. We tested with IE 5.01, 5.5, or 6.0 installed. We tested US, Spanish, French, Hebrew, and Chinese Windows. But we hadn’t quite gotten around to Polish yet.

There are three major variations of shrinkwrap. Open Source software is often developed without anyone getting paid to develop it, which changes the dynamics a lot. For example, things that are not considered “fun” often don’t get done in an all-volunteer team, and, as Matthew Thomas points out eloquently, this can hurt usability. Development is much more likely to be geographically dispersed, which results in a radically different quality of team communication. It’s rare in the open source world to have a face to face conversation around a whiteboard drawing boxes and arrows, so the kind of design decisions which benefit from drawing boxes and arrows are usually decided poorly on such projects. As a result geographically dispersed teams have done far better at cloning existing software where little or no design is required.

Consultingware is a variant of shrinkwrap which requires so much customization and installation that you need an army of consultants to install it, at outrageous cost. CRM and CMS packages often fall in this category. One gets the feeling that they don’t actually do anything, they are just an excuse to get an army of consultants in the door billing at $300/hour. Although consultingware is disguised as shrinkwrap, the high cost of an implementation means this is really more like internal software.

Commercial web based software such as Salesforce.com or even the more garden variety eBay still needs to be easy to use and run on many browsers. Although the developers have the luxury of (at least) some control over the “deployment” environment — the computers in the data center — they have to deal with a wide variety of web browsers and a large number of users so I consider this basically a variation of shrinkwrap.

Internal software only has to work in one situation on one company’s computers. This makes it a lot easier to develop. You can make lots of assumptions about the environment under which it will run. You can require a particular version of Internet Explorer, or Microsoft Office, or Windows. If you need a graph, let Excel build it for you; everybody in our department has Excel. (But try that with a shrinkwrap package and you eliminate half of your potential customers.)

Here usability is a lower priority, because a limited number of people need to use the software, and they don’t have any choice in the matter, and they will just have to deal with it. Speed of development is more important. Because the value of the development effort is spread over only one company, the amount of development resources that can be justified is significantly less. Microsoft can afford to spend $500,000,000 developing an operating system that’s only worth about $80 to the average person. But when Detroit Edison develops an energy trading platform, that investment must make sense for a single company. To get a reasonable ROI you can’t spend as much as you would on shrinkwrap. So sadly lots of internal software sucks pretty badly.

Embedded Software has the unique property that it goes in a piece of hardware and in almost every case can never be updated. This is a whole different world, here. The quality requirements are much higher, because there are no second chances. You may be dealing with a processor that runs dramatically more slowly than the typical desktop processor, so you may spend a lot of time optimizing. Fast code is more important than elegant code. The input and output devices available to you may be limited. The GPS system in the car I rented last week had such pathetic I/O that the usability was dismal. Have you ever tried to input an address on one of these things? They displayed a “keyboard” on screen and you had to use the directional arrows to choose letters from five small matrices of 9 letters each. (Follow the link for more illustrations of this UI. The GPS in my own car has a touch screen which makes the UI dramatically better. But I digress).

Games are unique for two reasons. First, the economics of game development are hit-oriented. Some games are hits, many more games are failures, and if you want to make money on game software you recognize this and make sure that you have a portfolio of games so that the blockbuster hit makes up for the losses on the failures. This is more like movies than software.

The bigger issue with the development of games is that there’s only one version. Once your users have played through Duke Nukem 3D, they are not going to upgrade to Duke Nukem 3.1D just to get some bug fixes and new weapons. With some exceptions, once somebody has played the game to the end, it’s boring to play it again. So games have the same quality requirements as embedded software and an incredible financial imperative to get it right the first time. Shrinkwrap developers have the luxury of knowing that if 1.0 doesn’t meet people’s needs and doesn’t sell, maybe 2.0 will.

Finally Throwaway code is code that you create temporarily solely for the purpose of obtaining something else, which you never need to use again once you obtain that thing. For example, you might write a little shell script that massages an input file you received into the format you need for some other purpose, and it’s a one-time operation.

There are probably other kinds of software development that I’m forgetting.

Here’s an important thing to know. Whenever you read one of those books about programming methodologies written by a full time software development guru/consultant, you can rest assured that they are talking about internal, corporate software development. Not shrinkwrapped software, not embedded software, and certainly not games. Why? Because corporations are the people who hire these gurus. They’re paying the bill. (Trust me, id software is not about to hire Ed Yourdon to talk about structured analysis.)

Last week Kent Beck made a claim that you don’t really need bug tracking databases when you’re doing Extreme Programming, because the combination of pair programming (with persistent code review) and test driven development (guaranteeing 100% code coverage of the automated tests) means you hardly ever have bugs. That didn’t sound right to me. I looked in our own bug tracking database here at Fog Creek to see what kinds of bugs were keeping it busy.

Lo and behold, I discovered that very few of the bugs in there would have been discovered with pair programming or test driven development. Many of our “bugs” are really what XP calls stories — basically, just feature requests. We’re using the bug tracking system as a way of remembering, prioritizing, and managing all the little improvements and big features we want to implement.

A lot of the other bugs were only discovered after much use in the field. The Polish keyboard thing. There’s no way pair programming was going to find that. And logical mistakes that never occurred to us in the way that different features work together. The larger and more complex a program, the more interactions between the features that you don’t think about. A particularly unlikely sequence of characters ({${?, if you must know) that confuses the lexer. Some ftp servers produce an error when you delete a file that doesn’t exist (our ftp server does not complain so this never occurred to us.)

I carefully studied every bug. Out of 106 bugs we fixed for the service pack release of CityDesk, exactly 5 of them could have been prevented through pair programming or test driven design. We actually had more bugs that we knew about and thought weren’t important (only to be corrected by our customers!) than bugs that could have been caught by XP methods.

But Kent is right, for other types of development. For most corporate development applications, none of these things would be considered a bug. Program crashes on invalid input? Run it again, and this time watch your {${?’s! And we only have One Kind of FTP server and nobody in the whole company uses Polish Windows.

Most things in software development are the same no matter what kind of project you’re working on, but not everything. When somebody tells you about methodology, think about how it applies to the work you’re doing. Think about where the person is coming from. Steve McConnell, Steve Maguire, and I all come from a very narrow corner: the world of mass market shrinkwrap spreadsheet applications written in Redmond, Washington. As such we have higher bars for ease of use and lower bars for bugs. Most of the other methodology gurus make their living doing consulting for in house corporate development, and that’s what they’re talking about. In any case, we should all be able to learn something from each other.

Nothing is as Simple as it Seems

We had a little usability problem in CityDesk.

Here was the problem: you could import files from the web using a menu command (“Import Web Page”). And you could import files from a disk into CityDesk by dragging them with the mouse. But there was no menu command to import files from a disk. So either people didn’t discover that it was possible, or people tried to use the Import Web Page feature to import from disk, which didn’t work right.

I thought that it would be easy to fix with a two page wizard. Roughly speaking, page one of the wizard would ask you “Where do you want to import from?” If you chose “disk,” page two would prompt you for a file. If you chose “web,” page two would prompt you for a URL.

I almost started implementing this, but something stopped me, and instead, I started to write a mini-spec. Here’s the spec, in its entirety:

Page One
Where do you want to import from? (Disk/Web)

Page Two (Disk)
Standard File/Open dialog

Page Two (Web)
URL prompt with mini-web-browser

Suddenly something occurred to me. Can you put the Windows standard file open dialog, which is usually supplied in toto by the OS, into a wizard?

Hmm.

I investigated. Yes, you can, but it’s no fun and takes a few hours of work. How could I make this not be a wizard? I rewrote the spec:

Two Menu Items:
1) Import Web Page From Internet -> Pops Up URL Dialog
2) Import Web Page From Disk -> Pops Up File Open Dialog

Much better. Three minutes of design work saved me hours of coding.

If you’ve spent more than 20 minutes of your life writing code, you’ve probably discovered a good rule of thumb by now: nothing is as simple as it seems.

Something as simple as copying a file is full of perils. What if the first argument is a directory? What if the second argument is a file? What if a file with the same name already exists in the destination? What if you don’t have write permission?

What if the copy fails in the middle? What if the destination is on a remote machine which is available, but which requires authentication to continue? What if the files are large and the link is slow and you need to show a progress indicator? What if the transfer speed slows down to almost zero… when do you give up and return an error message?
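
Just to make the point concrete, here is a sketch of a file-copy routine that handles only the first few of those what-ifs, using std::filesystem; the function and the CopyResult enum are names I am making up for illustration, and a real implementation would still need progress reporting, retries, permissions handling, and all the rest:

    #include <filesystem>
    #include <system_error>

    namespace fs = std::filesystem;

    enum class CopyResult { Ok, SourceIsDirectory, DestinationExists, Failed };

    // "Simple" file copy, with a few of the perils handled explicitly.
    CopyResult copy_one_file(const fs::path& src, const fs::path& dst) {
        std::error_code ec;

        if (fs::is_directory(src, ec))          // what if the first argument is a directory?
            return CopyResult::SourceIsDirectory;

        fs::path target = dst;
        if (fs::is_directory(dst, ec))          // copying into a directory: keep the file name
            target = dst / src.filename();

        if (fs::exists(target, ec))             // what if a file with that name already exists?
            return CopyResult::DestinationExists;   // overwrite? prompt? a policy decision, not a detail

        fs::copy_file(src, target, ec);         // no write permission, disk full, network gone...

        // Still unhandled: partial copies, authentication, progress bars, stalled
        // transfers, symlinks, long paths... nothing is as simple as it seems.
        return ec ? CopyResult::Failed : CopyResult::Ok;
    }

    int main() {
        return copy_one_file("in.txt", "out.txt") == CopyResult::Ok ? 0 : 1;
    }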

A good way to interview candidates for testing jobs is to give them a simple operation and ask them to enumerate all the things that can possibly go wrong. A classic Microsoft test interview question: how do you test the File Open dialog box? A good tester will be able to rattle off several dozen weird things to test (“file is deleted by another user between the time it is listed in the box and the time you select it and click Open“).

OK, so we have one axiom: nothing is as simple as it seems.

There’s another axiom in software engineering, which is, always try to reduce risk. One particularly important piece of risk to avoid is schedule risk, when something takes longer than expected. Schedule risk is bad because your boss yells at you, which makes you unhappy. If that’s not enough motivation for you, the economic reason why schedule risk is bad is that you decided to do a feature based on an estimate that it would take 1 week. Now that it has turned out to take 20 weeks, that decision might well have been wrong. Perhaps if you knew it was going to take 20 weeks, you would have made a different decision. The more wrong decisions you make, the more likely all those tote bags with your company logo will end up in the liquidator’s warehouse while your ex-CEO mopes that “what sucks is, we weren’t even successful enough to get mentioned on f***edcompany.com when we shut down!”

The combination of nothing-is-as-simple-as-it-seems and reduce-risk can only lead you to one conclusion:

You have to design things before you implement them.

I’m sorry to disappoint you. Yeah, I know, you read Kent Beck and now you think it’s OK to not design things before you implement them. Sorry, it’s not OK. You cannot change things in code “just as easily” as you could change them in the design documents. People say this all the time and it’s wrong. “We use high-level tools these days, like Java and XML. We can change things in minutes in the code. Why not design it in code?” My friend, you can put wheels on your mama but that doesn’t make her a bus, and if you think you can refactor your wrongly-implemented file-copy function to make it preemptive rather than threaded as quickly as I could write that sentence, you’re in deep denial.

Anyway, I don’t think Extreme Programming really advocates zero design. They just say “don’t do any more design than needed,” which is fine. But that’s not what people hear. Most programmers are looking for any excuse they can find not to do basic design before implementing features. So they latch onto the “no-design” idea like flies in a bug zapper. Dzzzzzzt! It’s one of those weird forms of laziness where you end up doing more work than you would have done otherwise. I’m too lazy to design the feature on paper first, so I just write some code, and then it’s not right, so I have to fix it, and I spend more time than I would have otherwise. Or, more commonly, I write some code, and then it’s not right, but it’s too late, and my product is inferior and I spend until the end of time making up excuses for why it “has to be that way.” It’s just sloppy and unprofessional.

When Linus Torvalds bashes design, he’s talking about huge systems, which have to evolve, or they become Multics. He’s not talking about your File Copy code. And when you consider that he had a pretty clear road map of exactly where he was going, it’s no wonder Linus doesn’t see much value in design. Don’t fall for it. Chances are it doesn’t apply to you. And anyway, Linus is much smarter than we are, so things that work for him don’t work for us normal people.

Incremental design and implementation is good. Frequent releases are fine (although for shrink-wrapped or mass market software, it drives customers crazy, never a good idea — instead do frequent internal milestones.) Too much formality in design is a waste of time — I’ve never seen a project benefit from mindless flowcharting or UMLing or CRCing or whatever the flavor-du-jour is. And those huge 10 million lines-of-code behemoth systems Linus is talking about should evolve, because humans don’t really know how to design software on that scale.

But when you sit down to write File Copy, or when you sit down to plan the features of the next release of your software, you gotta design. Don’t let the sirens persuade you otherwise.

Come argue with me about this in person at the Cutter Summit, April 29-May 1st, in Cambridge, MA, where I’ll be on a panel with Tom DeMarco, Kent Beck, and others.

The Iceberg Secret, Revealed

“I don’t know what’s wrong with my development team,” the CEO thinks to himself. “Things were going so well when we started this project. For the first couple of weeks, the team cranked like crazy and got a great prototype working. But since then, things seem to have slowed to a crawl. They’re just not working hard any more.” He chooses a Callaway Titanium Driver and sends the caddy to fetch an ice-cold lemonade. “Maybe if I fire a couple of laggards that’ll light a fire under them!”

Meanwhile, of course, the development team has no idea that anything’s wrong. In fact, nothing is wrong. They’re right on schedule.

Don’t let this happen to you! I’m going to let you in on a little secret about those non-technical management types that will make your life a million times easier. It’s real simple. Once you know my secret, you’ll never have trouble working with non-technical managers again (unless you get into an argument over the coefficient of restitution of their golf clubs).

It’s pretty clear that programmers think in one language, and MBAs think in another. I’ve been thinking about the problem of communication in software management for a while, because it’s pretty clear to me that the power and rewards accrue to those rare individuals who know how to translate between Programmerese and MBAese.


Since I started working in the software industry, almost all the software I’ve worked on has been what might be called “speculative” software. That is, the software is not being built for a particular customer — it’s being built in hopes that zillions of people will buy it. But many software developers don’t have that luxury. They may be consultants developing a project for a single client, or they may be in-house programmers working on a complicated corporate whatsit for Accounting (or whatever it is you in-house programmers do; it’s rather mysterious to me).

Have you ever noticed that on these custom projects, the single most common cause of overruns, failures, and general miserableness always boils down to, basically, “the (insert expletive here) customer didn’t know what they wanted?”

Here are three versions of the same pathology:

  1. “The damn customer kept changing his mind. First he wanted Client/Server. Then he read about XML in Delta Airlines Inflight Magazine and decided he had to have XML. Now we’re rewriting the thing to use fleets of small Lego Mindstorms Robots.”
  2. “We built it exactly the way they wanted. The contract specified the whole thing down to the smallest detail. We delivered exactly what the contract said. But when we delivered it, they were crestfallen.”
  3. “Our miserable sales person agreed to a fixed price contract to build what was basically unspecified, and the customer’s lawyers were sharp enough to get a clause in the contract that they don’t have to pay us until ‘acceptance by customer,’ so we had to put a team of nine developers on their project for two years and only got paid $800.”

If there’s one thing every junior consultant needs to have injected into their head with a heavy duty 2500 RPM DeWalt Drill, it’s this: Customers Don’t Know What They Want. Stop Expecting Customers to Know What They Want. It’s just never going to happen. Get over it.

Instead, assume that you’re going to have to build something anyway, and the customer is going to have to like it, but they’re going to be a little bit surprised. YOU have to do the research. YOU have to figure out a design that solves the problem that the customer has in a pleasing way.

Put yourself in their shoes. Imagine that you’ve just made $100,000,000 selling your company to Yahoo!, and you’ve decided that it’s about time to renovate your kitchen. So you hire an expert architect with instructions to make it “as cool as Will and Grace’s Kitchen.” You have no idea how to accomplish this. You don’t know that you want a Viking stove and a Subzero refrigerator — these are not words in your vocabulary. You want the architect to do something good, that’s why you hired him.

The Extreme Programming folks say that the solution to this is to get the customer in the room and involve them in the design process every step of the way, as a member of the development team. This is, I think, a bit too “extreme.” It’s as if my architect made me show up while they were designing the kitchen and asked me to provide input on every little detail. It’s boring for me, if I wanted to be an architect I would have become an architect.

Anyway, you don’t really want a customer on your team, do you? The customer-nominee is just as likely to wind up being some poor dweeb from Accounts Payable who got sent to work with the programmers because he was the slowest worker over there and they would barely notice his absence. And you’re just going to spend all your design time explaining things in words of one syllable.

Assume that your customers don’t know what they want. Design it yourself, based on your understanding of the domain. If you need to spend some time learning about the domain or if you need a domain expert to help you, that’s fine, but the design of the software is your job. If you do your domain homework and create a good UI, the customer will be pleased.

Now, I promised to tell you a secret about translating between the language of the customers (or nontechnical managers) of your software and the language of programmers.

You know how an iceberg is 90% underwater? Well, most software is like that too — there’s a pretty user interface that takes about 10% of the work, and then 90% of the programming work is under the covers. And if you take into account the fact that about half of your time is spent fixing bugs, the UI only takes 5% of the work. And if you limit yourself to the visual part of the UI, the pixels, what you would see in PowerPoint, now we’re talking less than 1%.

That’s not the secret. The secret is that People Who Aren’t Programmers Do Not Understand This.

There are some very, very important corollaries to the Iceberg Secret.

Important Corollary One. If you show a nonprogrammer a screen which has a user interface that is 90% worse, they will think that the program is 90% worse.

I learned this lesson as a consultant, when I did a demo of a major web-based project for a client’s executive team. The project was almost 100% code complete. We were still waiting for the graphic designer to choose fonts and colors and draw the cool 3-D tabs. In the meantime, we just used plain fonts and black and white, there was a bunch of ugly wasted space on the screen, basically it didn’t look very good at all. But 100% of the functionality was there and was doing some pretty amazing stuff.

What happened during the demo? The clients spent the entire meeting griping about the graphical appearance of the screen. They weren’t even talking about the UI. Just the graphical appearance. “It just doesn’t look slick,” complained their project manager. That’s all they could think about. We couldn’t get them to think about the actual functionality. Obviously fixing the graphic design took about one day. It was almost as if they thought they had hired painters.

Important Corollary Two. If you show a nonprogrammer a screen which has a user interface which is 100% beautiful, they will think the program is almost done.

People who aren’t programmers are just looking at the screen and seeing some pixels. And if the pixels look like they make up a program which does something, they think “oh, gosh, how much harder could it be to make it actually work?”

The big risk here is that if you mock up the UI first, presumably so you can get some conversations going with the customer, then everybody’s going to think you’re almost done. And then when you spend the next year working “under the covers,” so to speak, nobody will really see what you’re doing and they’ll think it’s nothing.

Important Corollary Three. The dotcom that has the cool, polished looking web site and about four web pages will get a higher valuation than the highly functional dotcom with 3700 years of archives and a default grey background.

Oh, wait, dotcoms aren’t worth anything any more. Never mind.

Important Corollary Four. When politics demands that various nontechnical managers or customers “sign off” on a project, give them several versions of the graphic design to choose from.

Vary the placement of some things, change the look and feel and fonts, move the logo and make it bigger or smaller. Let them feel important by giving them non-crucial lipstick-on-a-chicken stuff to muck around with. They can’t do much damage to your schedule here. A good interior decorator is constantly bringing their client swatches and samples and stuff to choose from. But they would never discuss dishwasher placement with the client. It goes next to the sink, no matter what the client wants. There’s no sense wasting time arguing about where the dishwasher goes, it has to go next to the sink, don’t even bring it up; let the clients get their design kicks doing some harmless thing like changing their mind 200 times about whether to use Italian Granite or Mexican Tiles or Norwegian wood butcher-block for the countertops.

Important Corollary Five. When you’re showing off, the only thing that matters is the screen shot. Make it 100% beautiful.

Don’t, for a minute, think that you can get away with asking anybody to imagine how cool this would be. Don’t think that they’re looking at the functionality. They’re not. They want to see pretty pixels.

Steve Jobs understands this. Oh boy does he understand this. Engineers at Apple have learned to do things that make for great screen shots, like the gorgeous new 128×128 icons in the dock, even if they waste valuable real estate. And the Linux desktop crowd goes crazy about semitransparent xterms, which make for good screenshots but are usually annoying to use. Every time Gnome or KDE announces a new release I go straight to the screenshots and say, “oh, they changed the planet from Jupiter to Saturn. Cool.” Never mind what they really did.

Remember the CEO at the beginning of this article? He was unhappy because his team had shown him great PowerPoints at the beginning — mockups, created in Photoshop, not even VB. And now that they’re actually getting stuff done under the covers, it looks like they’re not doing anything.

What can you do about this? Once you understand the Iceberg Secret, it’s easy to work with it. Understand that any demos you do in a darkened room with a projector are going to be all about pixels. If you can, build your UI in such a way that unfinished parts look unfinished. For example, use scrawls for the icons on the toolbar until the functionality is there. As you’re building your web service, you may want to consider actually leaving out features from the home page until those features are built. That way people can watch the home page go from 3 commands to 20 commands as more things get built.

More importantly, make sure you control what people think about the schedule. Provide a detailed schedule in Excel format. Every week, send out self-congratulatory email talking about how you’ve moved from 32% complete to 35% complete and are on track to ship on December 25th. Make sure that the actual facts dominate any thinking about whether the project is moving forward at the right speed. And don’t let your boss use Callaway Titanium Drivers, I don’t care how much you want him to win, the USGA has banned them and it’s just not fair.


Rub a dub dub

One reason people are tempted to rewrite their entire code base from scratch is that the original code base wasn’t designed for what it’s doing. It was designed as a prototype, an experiment, a learning exercise, a way to go from zero to IPO in nine months, or a one-off demo. And now it has grown into a big mess that’s fragile and impossible to add code to, and everybody’s whiny, and the old programmers quit in despair and the new ones that are brought in can’t make head or tail of the code so they somehow convince management to give up and start over while Microsoft takes over their business. Today let me tell you a story about what they could have done instead.

FogBUGZ started out six years ago as an experiment to teach myself ASP programming. Pretty soon it became an in-house bug tracking system. It got embellished almost daily with features that people needed until it was good enough that it no longer justified any more work.

Various friends asked me if they could use FogBUGZ at their companies. The trouble was, it had too many hardcoded things that made it a pain to run anywhere other than on the original computer where it was deployed. I had used a bunch of SQL Server stored procedures, which meant that you needed SQL Server to run FogBUGZ, which was expensive and overkill for some of the two person teams that wanted to run it. And so on. So I would tell my friends, “gosh, for $5000 in consulting fees, I’ll spend a couple of days and clean up the code so you can run it on your server using Access instead of SQL Server.” Generally, my friends thought this was too expensive.

After this happened a few times I had a revelation — if I could sell the same program to, say, three people, I could charge $2000 and come out ahead. Or thirty people for $200. Software is neat like that. So in late 2000, Michael sat down, ported the code so that it worked on Access or SQL Server, pulled all the site-specific stuff out into a header file, and we started selling the thing. I didn’t really expect much to come of it.

In those days, I thought, golly, there are zillions of bug tracking packages out there. Every programmer has written a dinky bug tracking package. Why would anyone buy ours? I knew one thing: programmers who start businesses often have the bad habit of thinking everybody else is a programmer just like them and wants the same stuff as them, and so they have an unhealthy tendency to start businesses that sell programming tools. That’s why you see so many scrawny companies hawking source-code-generating geegaws, error catching and emailing geegaws, debugging geegaws, syntax-coloring editing tchotchkes, ftping baubles, and, ahem, bug tracking packages. All kinds of stuff that only a programmer could love. I had no intention of falling into that trap!

Of course, nothing ever works out exactly as planned. FogBUGZ was popular. Really popular. It accounts for a significant chunk of Fog Creek’s revenue and sales are growing steadily. The People won’t stop buying it.

So we did version 2.0. This was an attempt to add some of the most obviously needed features. While David worked on version 2.0 we honestly didn’t think it was worth that much effort, so he tended to do things in what you might call an “expedient” fashion rather than, say, an “elegant” fashion. Certain, ahem, design issues in the original code were allowed to fester. There were two complete sets of nearly identical code for drawing the main bug-editing page. SQL statements were scattered throughout the HTML hither and yon, to and fro, pho and ton. Our HTML was creaky and designed for those ancient browsers that were so buggy they could crash loading about:blank.

Yeah, it worked brilliantly, we’ve been at zero known bugs for a while now. But inside, the code was, to use the technical term, a “big mess.” Adding new features was a hemorrhoid. To add one field to the central bug table would probably require 50 modifications, and you’d still be finding places you forgot to modify long after you bought your first family carplane for those weekend trips to your beach house on Mars.

A lesser company, perhaps one run by an executive from the express-package-delivery business, might have decided to scrap the code and start over.

Did I mention that I don’t believe in starting from scratch? I guess I talk about that a lot.

Anyway, instead of starting from scratch, I decided it was worth three weeks of my life to completely scrub the code. Rub a dub dub. In the spirit of refactoring, I set out a few rules for this exercise.

  1. No New Features, not even small ones, would be added.
  2. At any time, with any checkin, the code would still work perfectly.
  3. All I would do is logical transformations — the kinds of things that are almost mechanical and which you can convince yourself immediately will not change the behavior of the code.

I went through each source file, one at a time, top to bottom, looking at code, thinking about how it could be better structured, and making simple changes. Here are some of the kinds of things I did during these three weeks:

  • Changed all HTML to XHTML. For example, <BR> became <br />, all attributes were quoted, all nested tags were matched up, and all pages were validated.
  • Removed all formatting (<FONT> tags etc.) and put everything in a CSS style sheet.
  • Removed all SQL statements from the presentation code and indeed all program logic (what the marketing types like to call “business rules”). This stuff went into classes which were not really designed — I simply added methods lazily as I discovered a need for them; there’s a sketch of what this separation looks like just after this list. (Somewhere, someone with a big stack of 4×6 cards is sharpening their pencil to poke my eyes out. What do you mean you didn’t design your classes?)
  • Found repeated blocks of code and created classes, functions, or methods to eliminate the repetition. Broke up large functions into multiple smaller ones.
  • Removed all remaining English language text from the main code and isolated that in a single file for easy internationalization.
  • Restructured the ASP site so there is a single entry point instead of lots of different files. This makes it very easy to do things that used to be hard, for example, now we can display input error messages in the very form where the invalid input occurred, something which should be easy if you lay things out right, but I hadn’t laid things out right when I was learning ASP programming way back when.
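
To make that SQL-into-classes item concrete, here is a rough sketch of the shape the code ended up in. It’s in Python rather than the original ASP, and the table, class, and column names are invented purely for illustration; the point is the separation, not any particular class design (there isn’t one, remember).

    import sqlite3


    class BugStore:
        """All the data access in one place. Methods get added lazily,
        one per thing the presentation code turns out to need."""

        def __init__(self, connection):
            self.conn = connection

        def open_bugs(self):
            cur = self.conn.execute(
                "SELECT id, title, assigned_to FROM bugs WHERE status = 'open'"
            )
            return cur.fetchall()

        def assign(self, bug_id, person):
            self.conn.execute(
                "UPDATE bugs SET assigned_to = ? WHERE id = ?", (person, bug_id)
            )
            self.conn.commit()


    def render_bug_list(store):
        """Presentation code: builds HTML from data, with no SQL in sight."""
        rows = "".join(
            f"<tr><td>{bug_id}</td><td>{title}</td><td>{owner}</td></tr>"
            for bug_id, title, owner in store.open_bugs()
        )
        return f"<table>{rows}</table>"


    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE bugs (id INTEGER, title TEXT, assigned_to TEXT, status TEXT)"
        )
        conn.execute("INSERT INTO bugs VALUES (1, 'Crash on save', 'joel', 'open')")
        print(render_bug_list(BugStore(conn)))

Nothing fancy, and that’s the point: when a query has to change, there is exactly one place to change it.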

Over three weeks the code got better and better internally. To the end user, not too much changed. Some fonts are a little nicer thanks to CSS. I could have stopped at any point, because at any given time I had 100% working code (and I uploaded every checkin to our live internal FogBUGZ server to make sure). And in fact I never really had to think very hard, and I never had to design a thing, because all I was doing was simple, logical transformations. Occasionally I would encounter a weird nugget in the code. These nuggets were usually bug fixes that had been implemented over the years. Luckily I could keep the bug fix intact. In many of these cases, I realized that had I started from scratch, I would have made the same bug all over again, and might not have noticed it for months or years.

I’m basically done now. It took, as planned, three weeks. Almost every line of code is different now. Yep, I looked at every line of code, and changed most of them. The structure is completely different. All the bug-tracking functionality is completely separate from the HTML UI functionality.

Here are all the good things about my code scrubbing activity:

  • It took vastly less time than a complete rewrite. Let’s assume (based on how long it took us to get this far with FogBUGZ) that a complete rewrite would have taken a year. Well, that means I saved 49 weeks of work. Those 49 weeks represent knowledge in the design of the code that I preserved intact. I never had to think, “oh, I need a new line here.” I just had to change <BR> to <br /> mindlessly and move on. I didn’t have to figure out how to get multipart working for file uploads. It works. Just tidy it up a bit.
  • I introduced hardly any new bugs. A couple of tiny ones, of course, probably got through, but I was never doing the types of things that cause bugs.
  • I could have stopped and shipped at any time if necessary.
  • The schedule was entirely predictable. After a week of this, you can calculate exactly how many lines of code you clean in an hour, and get a darn good estimate for the rest of the project (the arithmetic is sketched just after this list). Try that, Mozilla river-drawing people.
  • The code is now much more amenable to new features. We’ll probably earn back the three weeks with the first new major feature we implement.
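
The arithmetic behind that schedule estimate is as dumb as it sounds. A back-of-the-envelope version, with invented numbers:

    # Invented numbers, purely to show the arithmetic.
    lines_cleaned_so_far = 12_000   # measured after the first week
    hours_spent = 40
    total_lines = 36_000

    rate = lines_cleaned_so_far / hours_spent                   # lines per hour
    hours_left = (total_lines - lines_cleaned_so_far) / rate    # remaining work

    print(f"{rate:.0f} lines/hour, roughly {hours_left:.0f} hours to go")
    # prints: 300 lines/hour, roughly 80 hours to go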

Much of the literature on refactoring is attributable to Martin Fowler, although of course the principles of code cleanups have been well known to programmers for years. An interesting new area is refactoring tools, which is just a fancy word for programs that do some of this stuff automatically. We’re a long way from having all the good tools we need — in most programming environments you can’t even do a simple transformation like changing the name of a variable (and having all the references to it change automatically). But it’s getting better, and if you want to start one of those scrawny companies hawking programming tool geegaws, or make a useful contribution to open source, the field is wide open.

Fire And Motion

Sometimes I just can’t get anything done.

Sure, I come into the office, putter around, check my email every ten seconds, read the web, even do a few brainless tasks like paying the American Express bill. But getting back into the flow of writing code just doesn’t happen.

These bouts of unproductiveness usually last for a day or two. But there have been times in my career as a developer when I went for weeks at a time without being able to get anything done. As they say, I’m not in flow. I’m not in the zone. I’m not anywhere.

Everybody has mood swings; for some people they are mild, for others, they can be more pronounced or even dysfunctional. And the unproductive periods do seem to correlate somewhat with gloomier moods.

It makes me think of those researchers who say that basically people can’t control what they eat, so any attempt to diet is bound to be short term and they will always yoyo back to their natural weight. Maybe as a software developer I really can’t control when I’m productive, and I just have to take the slow times with the fast times and hope that they average out to enough lines of code to make me employable.

[Image: Go read The Onion for a while.]

What drives me crazy is that ever since my first job I’ve realized that as a developer, I usually average about two or three hours a day of productive coding. When I had a summer internship at Microsoft, a fellow intern told me he was actually only going into work from 12 to 5 every day. Five hours, minus lunch, and his team loved him because he still managed to get a lot more done than average. I’ve found the same thing to be true. I feel a little bit guilty when I see how hard everybody else seems to be working, and I get about two or three quality hours in a day, and still I’ve always been one of the most productive members of the team. That’s probably why when Peopleware and XP insist on eliminating overtime and working strictly 40 hour weeks, they do so secure in the knowledge that this won’t reduce a team’s output.

But it’s not the days when I “only” get two hours of work done that worry me. It’s the days when I can’t do anything.

I’ve thought about this a lot. I tried to remember the time when I got the most work done in my career. It was probably when Microsoft moved me into a beautiful, plush new office with large picture windows overlooking a pretty stone courtyard full of cherry trees in bloom. Everything was clicking. For months I worked nonstop grinding out the detailed specification for Excel Basic — a monumental ream of paper going into incredible detail covering a gigantic object model and programming environment. I literally never stopped. When I had to go to Boston for MacWorld I took a laptop with me, and documented the Window class sitting on a pleasant terrace at HBS.

Once you get into flow it’s not too hard to keep going. Many of my days go like this: (1) get into work (2) check email, read the web, etc. (3) decide that I might as well have lunch before getting to work (4) get back from lunch (5) check email, read the web, etc. (6) finally decide that I’ve got to get started (7) check email, read the web, etc. (8) decide again that I really have to get started (9) launch the damn editor and (10) write code nonstop until I look up and realize it’s already 7:30 pm.

Somewhere between step 8 and step 9 there seems to be a bug, because I can’t always make it across that chasm. For me, just getting started is the only hard thing. An object at rest tends to remain at rest. There’s something incredibly heavy in my brain that is extremely hard to get up to speed, but once it’s rolling at full speed, it takes no effort to keep it going. Like a bicycle decked out for a cross-country, self-supported bike trip — when you first start riding a bike with all that gear, it’s hard to believe how much work it takes to get rolling, but once you are rolling, it feels just as easy as riding a bike without any gear.

Maybe this is the key to productivity: just getting started. Maybe when pair programming works it works because when you schedule a pair programming session with your buddy, you force each other to get started.

Joel in the Army

When I was an Israeli paratrooper a general stopped by to give us a little speech about strategy. In infantry battles, he told us, there is only one strategy: Fire and Motion. You move towards the enemy while firing your weapon. The firing forces him to keep his head down so he can’t fire at you. (That’s what the soldiers mean when they shout “cover me.” It means, “fire at our enemy so he has to duck and can’t fire at me while I run across this street, here.” It works.)  The motion allows you to conquer territory and get closer to your enemy, where your shots are much more likely to hit their target. If you’re not moving, the enemy gets to decide what happens, which is not a good thing. If you’re not firing, the enemy will fire at you, pinning you down.

I remembered this for a long time. I noticed how almost every kind of military strategy, from air force dogfights to large scale naval maneuvers, is based on the idea of Fire and Motion. It took me another fifteen years to realize that the principle of Fire and Motion is how you get things done in life. You have to move forward a little bit, every day. It doesn’t matter if your code is lame and buggy and nobody wants it. If you are moving forward, writing code and fixing bugs constantly, time is on your side. Watch out when your competition fires at you. Do they just want to force you to keep busy reacting to their volleys, so you can’t move forward?

Think of the history of data access strategies to come out of Microsoft. ODBC, RDO, DAO, ADO, OLEDB, now ADO.NET – All New! Are these technological imperatives? The result of an incompetent design group that needs to reinvent data access every goddamn year? (That’s probably it, actually.) But the end result is just cover fire. The competition has no choice but to spend all their time porting and keeping up, time that they can’t spend writing new features. Look closely at the software landscape. The companies that do well are the ones who rely least on big companies and don’t have to spend all their cycles catching up and reimplementing and fixing bugs that crop up only on Windows XP. The companies who stumble are the ones who spend too much time reading tea leaves to figure out the future direction of Microsoft. People get worried about .NET and decide to rewrite their whole architecture for .NET because they think they have to. Microsoft is shooting at you, and it’s just cover fire so that they can move forward and you can’t, because this is how the game is played, Bubby. Are you going to support Hailstorm? SOAP? RDF? Are you supporting it because your customers need it, or because someone is firing at you and you feel like you have to respond? The sales teams of the big companies understand cover fire. They go into their customers and say, “OK, you don’t have to buy from us. Buy from the best vendor. But make sure that you get a product that supports (XML / SOAP / CDE / J2EE) because otherwise you’ll be Locked In The Trunk.” Then when the little companies try to sell into that account, all they hear is obedient CTOs parroting “Do you have J2EE?” And they have to waste all their time building in J2EE even if it doesn’t really make any sales, and gives them no opportunity to distinguish themselves. It’s a checkbox feature — you do it because you need the checkbox saying you have it, but nobody will use it or needs it. And it’s cover fire.

Fire and Motion, for small companies like mine, means two things. You have to have time on your side, and you have to move forward every day. Sooner or later you will win. All I managed to do yesterday is improve the color scheme in FogBUGZ just a little bit. That’s OK. It’s getting better all the time. Every day our software is better and better and we have more and more customers and that’s all that matters. Until we’re a company the size of Oracle, we don’t have to think about grand strategies. We just have to come in every morning and somehow, launch the editor.

It's getting better all the time... o/~

Getting Things Done When You’re Only a Grunt

This site is supposed to be about software management. But sometimes you don’t have the power to create change in your organization by executive fiat. Obviously, if you’re just a grunt programmer at the bottom of the totem pole, you can’t exactly order people to start creating schedules or bug databases. And in fact even if you’re a manager, you’ve probably discovered that managing developers is a lot like herding cats, only not as fun. Merely saying “make it so” doesn’t make it so.

It can be frustrating when you’re working in an organization that scores low on The Joel Test. No matter how good your code is, your coworkers write such bad code that you’re embarrassed to be associated with the project. Or management is making bad decisions about what code to write, so you’re forced to waste your talent debugging the AS/400 version of a retirement-planning game for kids.

You could just leave, I suppose. But presumably, there’s some reason you’re stuck there. The stock options haven’t quite vested, there’s no better place to work in Podunk, or perhaps your boss is holding someone you love hostage. In any case, dealing with life on a bad team can be infuriating. But there are strategies for improving your team from the bottom, and I’d like to share a few of them.

Strategy 1 Just Do It

A lot can be done to improve the project just by one person doing it. Don’t have a daily build server? Make one. Set your own machine up with a scheduled job to make builds at night and send out email results. Does it take too many steps to make the build? Write the makefile. Nobody does usability tests? Do your own hallway usability tests on the mailroom folks with a piece of paper or a VB prototype.
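
If it helps, a one-person daily build really can be this small: a script that runs the build and mails the results, scheduled with cron or the Windows Task Scheduler. This is just a sketch; the build command, mail server, and addresses are placeholders for whatever your project actually uses.

    import smtplib
    import subprocess
    from email.message import EmailMessage

    BUILD_COMMAND = ["make", "all"]        # whatever builds your product
    RECIPIENTS = ["team@example.com"]      # placeholder address
    SMTP_SERVER = "mail.example.com"       # placeholder server


    def run_build():
        result = subprocess.run(BUILD_COMMAND, capture_output=True, text=True)
        status = "SUCCEEDED" if result.returncode == 0 else "FAILED"
        return status, result.stdout + result.stderr


    def mail_results(status, log):
        msg = EmailMessage()
        msg["Subject"] = f"Nightly build {status}"
        msg["From"] = "buildbot@example.com"
        msg["To"] = ", ".join(RECIPIENTS)
        msg.set_content(log[-10000:])      # the tail of the build log
        with smtplib.SMTP(SMTP_SERVER) as server:
            server.send_message(msg)


    if __name__ == "__main__":
        status, log = run_build()
        mail_results(status, log)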

Strategy 2 Harness the Power of Viral Marketing

Many of the Joel Test strategies can be implemented by a single person on an uncooperative team. Some of them, if done well, will spread to the rest of the team.

For example, suppose nobody on your team can be persuaded to use a bug database. Don’t let it bother you. Just keep your own. Enter bugs that you find in your own code. If you find a bug that somebody else really should fix, assign the bug to them using the bug database. If you have good bug tracking software, this will send them an email. But now, you can keep sending them emails if they don’t fix the bug. Eventually, they’ll see the value of bug tracking and start to use the system as it was intended. If the QA team refuses to input bugs to the bug tracking system, simply refuse to listen to bug reports through any other channel. About the three-thousandth time that you say to people, “listen, I’d love to fix that, but I’m going to forget. Can you enter a bug in the system?” they’ll start using the database.

Nobody on your team wants to use source control? Create your own CVS repository, on your own hard drive if necessary. Even without cooperation, you can check your code in independently from everybody else’s. Then when they have problems that source control can solve (someone accidentally types rm * ~ instead of rm *~), they’ll come to you for help. Eventually, people will realize that they can have their own checkouts, too.

Strategy 3 Create a Pocket of Excellence

The team won’t make schedules? Or specs? Write your own. Nobody’s going to complain if you take a day or two to write a minimal spec and schedule for the work you’re about to do.

Get better people into the team. Get involved in hiring and interviewing, and recruit good candidates to join the team.

Find the people who are willing to improve and capable of it, and get them on your side. Even on poor teams, you’re likely to have some smart people who just don’t have the experience to create great code. Help them out. Set them up to learn. Read their code checkins. If they do something stupid, don’t send them a snooty email explaining what’s stupid about their checkins. That will just make them angry and defensive. Instead, innocently report the bug that you know is the result of the checkin. Let them figure out what’s causing it. When they find the bug for themselves, they’ll remember that lesson a lot better.

Strategy 4 Neutralize The Bozos

Even the best teams can have a bozo or two. The frustrating part about having bad programmers on your team is when their bad code breaks your good code, or good programmers have to spend time cleaning up after the bad programmers.

As a grunt, your goal is damage-minimization, a.k.a. containment. At some point, one of these geniuses will spend two weeks writing a bit of code that is so unbelievably bad that it can never work. You’re tempted to spend the fifteen minutes that it takes to rewrite the thing correctly from scratch. Resist the temptation. You’ve got a perfect opportunity to neutralize this moron for several months. Just keep reporting bugs against their code. They will have no choice but to keep slogging away at it for months until you can’t find any more bugs. Those are months in which they can’t do any damage anywhere else.

Strategy 5 Get Away From Interruptions

All happy work environments are alike (private offices, quiet working conditions, excellent tools, few interruptions and even fewer large meetings). All unhappy work environments are unhappy in their own way.

The bad news is that changing the working environment is almost impossible in virtually any company. Long term leases may mean that even the CEO can’t do anything about it. That’s why so few software developers get private offices. This hurts their companies in at least two ways. First, it makes it harder to recruit top notch developers, who will prefer the firm that gives them cushier conditions (all else being equal). Second, the level of interruptions can dramatically reduce the productivity of developers, who find it impossible to get into the zone and stay in it for any length of time.

Look for ways to get out of this environment. Take a laptop to the company cafeteria, where there are lots of tables that are empty most of the day (and nobody can find you). Book a conference room for the whole day and write code there, and make it clear through the preponderance of checkins just how much more work you get done when you’re in a room by yourself. The next time there’s a crunch on and your manager asks you what you need to Get This Done By Tomorrow, you know what to say. They’ll find you an office for the day. And pretty soon they’ll start wondering what they can do to keep that productive thing going year round.

Come into work late and leave late. Those hours after the rest of the company goes home can be the most productive. Or, if you’re on a team of developers who regularly come in late, get into work at 9 am. You’ll do more work in the two hours before other people come in and start bothering you than you do in the rest of the day.

Don’t keep your email or IM client running. Check your email every hour, if you want, but don’t keep it running.

Strategy 6 Become Invaluable

None of these strategies work if you’re not really an excellent contributor. If you don’t write good code, and lots of it, you’re just going to be resented for messing around with bug databases when you “should be” writing code. There’s nothing more deadly to your career than having a reputation of being so concerned with process that you don’t accomplish anything.

Once, when I started a new job as a grunt programmer at a new company, I discovered that the company was running somewhere around 2 on the Joel Test, and I was determined to fix it. But I also knew that making a good first impression was crucial. So I allocated the first seven hours of every day to just writing code, as was expected of me. There’s nothing like a flurry of checkins to make you look good to the rest of the development team. But I reserved another hour every afternoon before going home to improving the process. I used that time to fix things that made it hard to debug our product. I set up a daily build and a bug database. I fixed all the longstanding annoyances that made development difficult. I wrote specs for the work that I was doing during the day. I wrote a document explaining step-by-step how to create a development machine from scratch. I thoroughly documented an important internal language which had been undocumented. Slowly, the process got better and better. Nobody but me (and my team, when I was put in charge of one) ever did schedules or specs, but other than that we hit about 10 on the Joel Test.

You can make things better, even when you’re not in charge, but you have to be Caesar’s Wife: above suspicion. Otherwise you’ll make enemies as you go along.

Any other ideas?

Humane Programming

Lazy programmers who didn’t read about how users don’t read anything:

[Image: screenshot of a Yahoo! screen warning users not to hit Back or Reload]

  1. Most decent web programmers can always figure out how to design an interaction that won’t fail if you hit back or reload. These techniques are even older than cookies (one of them is sketched just after this list).
  2. Having three paragraphs of text telling you not to hit back or reload isn’t going to help, because people won’t read it.
  3. The programmer who wrote this screen admits that 10% of users “forget this warning.” It’s not because they forgot. It’s because humans are fallible and we’re used to hitting refresh when the screen is messed up, and we’re used to hitting back when we realize we went forward too soon. Don’t argue with me, Yahoo, when you’re the ones quoting statistics to prove it!
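
One of those older-than-cookies techniques, and not necessarily the one Yahoo’s programmer had in mind, is to answer every form POST with a redirect to an ordinary GET page. Reload then re-fetches a harmless confirmation page instead of re-submitting the form, and Back takes you to the blank form rather than to a scary warning. Here is a minimal sketch using only the Python standard library, with made-up paths standing in for the real application:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    FORM = b"""<form method="post" action="/order">
      <input name="item"><button>Order</button>
    </form>"""


    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Both the form and the confirmation page are plain GETs,
            # so reloading either one has no side effects.
            if self.path == "/done":
                body = b"<p>Thanks, your order went through.</p>"
            else:
                body = FORM
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            self.rfile.read(length)        # ...record the order here...
            # 303 tells the browser to follow up with a GET, so refreshing
            # the confirmation page never re-posts the form.
            self.send_response(303)
            self.send_header("Location", "/done")
            self.end_headers()


    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), Handler).serve_forever()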

The main purpose of this screen seems to be so that users blame themselves when they hit reload and find themselves blown back to step one. “Oh darn! I’m so stupid!” say the users. Yes, that’s what happens. Watch any usability test where the product is failing – the users inevitably blame “their own stupidity.” Better that 100,000 users should feel stupid than one programmer admit he didn’t do a very good job.

Don’t let anyone tell you that as a programmer you don’t have to make moral or ethical decisions. Every time you decide that making users feel stupid is better than fixing your code, you’re making an ethical decision.