Explaining the Excel Bug

By now you’ve probably seen a lot of the brouhaha over a bug in the newest version of Excel, 2007. Basically, multiplying 77.1*850, which should give you 65,535, was actually displaying 100,000.

Before I try to explain this, I should disclose that I did work on the Excel team, but that was thirteen years ago. I haven’t been there for a long time. I don’t even think I know anyone on that team any more. I’m just trying to explain the bug a little bit as a public service.

The first thing you have to understand is that Excel keeps numbers, internally, in a binary format, but displays them as strings. For example, when you type 77.1, Excel stores this internally using 64 bits:

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

The display is showing you four characters: “7”, “7”, “.”, and “1”.
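To see those 64 bits for yourself, here is a minimal sketch in Python, assuming (as is true on mainstream platforms) that a Python float is an IEEE 754 double. The helper name `double_bits` is made up for this illustration; it has nothing to do with Excel's own code.

```python
import struct


def double_bits(x: float) -> str:
    """Return the 64 bits of x as an IEEE 754 double, grouped in fours."""
    raw = struct.pack(">d", x)  # 8 bytes, most significant byte first
    bits = "".join(f"{byte:08b}" for byte in raw)
    return " ".join(bits[i:i + 4] for i in range(0, 64, 4))


# The stored binary value...
print(double_bits(77.1))
# ...versus the short string you are shown for it.
print(repr(77.1))
```

The first line printed is the bit pattern above (on one line instead of two); the second is the four-character string.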

Somewhere inside Excel is a function that converts binary numbers to strings for displaying. This is the code that has the bug that causes a few numbers which are extremely close to 65,535 to be formatted incorrectly as 100,000.

If you use the number further along in calculations, for example, if you add 2 to the results, you’ll get the right thing.

=77.1*850 -> displays 100000

=77.1*850+2 -> displays 65537, correctly.

Just to throw people off, this bug also exists for a few numbers which are extremely close to 65,536. They display incorrectly as 100,001.

=77.1*850+1 -> displays 100,001, incorrectly.

This is still only a bug in the number formatting code; if you try to make a chart with that number in it, you’ll get a correct chart.

Now… you may have noticed that I said that this bug exists for numbers which are extremely close to 65,535, but not for 65,535 itself. Indeed if you enter 65,535 you see 65,535. But, you notice, 77.1 * 850 should be exactly 65,535, not extremely close to 65,535!

Look closely at the binary representation for 77.1:

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

See how there’s a lot of 0110 0110 0110 there at the end? That’s because 0.1 has no exact representation in binary… it’s a repeating binary number. It’s sort of like how 1/3 has no exact representation in decimal. 1/3 is 0.33333333 and you have to keep writing 3’s forever. If you lose patience, you get something inexact.

So you can imagine how, in decimal, if you tried to do 3*1/3, and you didn’t have time to write 3’s forever, the result you would get would be 0.99999999, not 1, and people would get angry with you for being wrong.

The same thing happens in binary with numbers ending in 0.1: they are repeating binary fractions, so when you do mathematical operations on them, very small insignificant errors creep in somewhere way to the right of the decimal point. (PS: same for .2, .3, .4, .6, .7, .8, and .9, but not .5).
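A quick way to check which decimal fractions survive the trip into binary is to compare the exact value of the stored double against the exact value of the text you typed. This is only a sketch, assuming Python floats are IEEE 754 doubles; `is_stored_exactly` is a name invented for the example.

```python
from decimal import Decimal
from fractions import Fraction


def is_stored_exactly(text: str) -> bool:
    """True if the decimal string survives conversion to a double unchanged."""
    # Fraction(float) gives the exact value of the stored double;
    # Fraction(str) gives the exact value of the decimal text.
    return Fraction(float(text)) == Fraction(text)


for text in ["0.1", "0.2", "0.5", "77.1"]:
    print(text, is_stored_exactly(text))

# The full decimal expansion of what actually gets stored for 0.1:
print(Decimal(0.1))
```

Only 0.5 comes back as exact, since it is a power of two; the rest pick up a tiny error far to the right of the decimal point.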

The IEEE has a standard, IEEE 754, for how to represent floating point numbers in binary, and this is what almost everybody uses, including Excel, and they have for a really long time, and it means sometimes you get imprecise results when you add a lot of 0.1’s together, but if you’re rounding the numbers to a reasonable number of decimal points, you won’t really care.
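Here is a small sketch of that "add a lot of 0.1's together" effect, again assuming Python floats are IEEE 754 doubles. The function name `add_tenths` is just for illustration.

```python
def add_tenths(count: int) -> float:
    """Add 0.1 to a running total `count` times."""
    total = 0.0
    for _ in range(count):
        total += 0.1
    return total


total = add_tenths(10)
print(repr(total))             # slightly less than 1
print(total == 1.0)            # False: the tiny errors accumulated
print(round(total, 2) == 1.0)  # True: rounding hides them
```

Round to a sensible number of decimal places before displaying or comparing and the imprecision stops mattering.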

Back to the Excel bug, which is a genuine bug, not just an artifact of this IEEE 754 stuff. Since 77.1 has no exact representation, Excel stores it as

0100 0000 0101 0011 0100 0110 0110 0110
0110 0110 0110 0110 0110 0110 0110 0110

and then when you try to multiply it by 850, you get something very close to 65,535, but not exactly 65,535, because 77.1 wasn’t stored exactly (that would take infinite memory). And this number, which is very close to 65,535, happens to be one of only 12 possible floating point numbers which trigger this bug in Excel.
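You can reproduce the arithmetic (though not the Excel display bug) in any language that uses IEEE 754 doubles. The sketch below uses Python and assumes its floats are doubles; it shows that the product lands one representable step below 65,535, and what happens when you add 1 or 2 to it.

```python
product = 77.1 * 850

print(repr(product))   # 65534.99999999999
print(product.hex())   # the raw value in hexadecimal floating point

# Not equal to 65,535: it is the closest double *below* 65,535.
print(product == 65535)
print(product == 65535 - 2**-37)

# Adding 1 lands just below 65,536, another value that is "extremely close".
print(product + 1 == 65536)

# Adding 2 rounds to exactly 65,537, because doubles are spaced
# twice as far apart above 65,536.
print(product + 2 == 65537)
```

So the numbers Excel computes here are the ordinary IEEE 754 results. The only thing that goes wrong is turning one of them into a string.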

OK, Q&A.

Q: Isn’t this really, really bad?

A: IMHO, no, the chance that you would see this in real life calculations is microscopic. Better worry about getting hit by a meteorite. Microsoft, of course, will be forced to tell everyone “accuracy is extremely important to us” and I’m sure they’ll have a fix in a matter of days, and they’ll be subjected to all kinds of well-deserved ridicule, but since I don’t work there I’m free to tell you that the chance of this bug actually mattering to you as an individual is breathtakingly small.

Q: Shouldn’t they be testing for these kinds of things?

A: I’ll bet that most of the numeric testing done on the Excel team is done automatically with VBA code. Cells containing this value display as 100,000, but from VBA, they’re going to look like 65,535 (since the number would be passed into the Basic runtime in binary, before the display formatting.) I’m sure there’s plenty of code to test display formatting, but with a bug like this that only happens on 12 out of 18446744073709551616 possible floating point binary numbers, it’s unlikely that any set of black-box tests would cover this case.

Q: What caused the bug?

A: I’m not sure exactly, since I don’t have the code. Off the top of my head, I can’t think of anything that would cause this behavior. Play around with Quanfei Wen’s IEEE-754 calculator, maybe you’ll find something.

Q: Why not use “exact” (decimal) arithmetic?

A: It’s much slower than floating point arithmetic, since there’s no hardware on your CPU chip to do it for you natively.
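For comparison, this is what the same multiplication looks like with decimal arithmetic done in software. It is a sketch using Python's standard `decimal` module, not anything Excel or Windows Calculator actually ships; the function name `decimal_product` is invented for the example.

```python
from decimal import Decimal


def decimal_product() -> Decimal:
    """Multiply 77.1 by 850 using software decimal arithmetic."""
    # Build the Decimal from a string so that 77.1 is stored exactly,
    # instead of inheriting the binary rounding error of the float 77.1.
    return Decimal("77.1") * 850


result = decimal_product()
print(result)
print(result == 65535)  # True: no rounding error at all
```

The answer is exact, but every operation runs as library code instead of a single hardware instruction, which is where the speed cost comes from.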

Over the years, Microsoft got so much heat for floating point rounding artifacts in the Windows Calculator that they rewrote it to use an arbitrary-precision arithmetic library. Since you have to poke at Windows Calculator with a stick, it doesn’t have to be as fast as Excel. That said, CPUs have gotten pretty fast. I’ll bet an arbitrary-precision version of Excel would perform pretty well these days. Still, the Microsoft Excel support team has spent the last 20 years defending IEEE 754, and it’s not surprising that they’ve started to believe in it.

And let’s face it — do you really want the bright sparks who work there now, and manage to break lots of perfectly good working code — rewriting the core calculating engine in Excel? Better keep them busy adding and removing dancing paper clips all day long.

Unfocused and Unabashed

“I didn’t relish the carpal tunnel syndrome that would result from signing all these forms. We tried to ‘sign’ them by running the forms through the laser printer again to print the signature in the right place. But, as I learned, something traumatic happens to paper on its way through a printer. Once a sheet of paper has been through a laser printer, the next time you try to print on it, it’s going to fight back, jamming the printer and resulting in the deaths of three other pages, and you’re going to spend five minutes with a putty knife cleaning up the bloodshed.”

How Hard Could It Be?: Unfocused and Unabashed

PS: This article will appear in the October issue of Inc. Magazine; it’s the first installment of what will be a monthly column, mostly about business of software startups. The column is called “How Hard Could It Be?” As an entrepreneur I’ve really enjoyed this magazine over the years and I’ve learned a lot, so I’m honored to be a columnist. They will be publishing each column on their website, and I’ll link to it as soon as they do so, but you may also want to subscribe to the print magazine; it’s less than $10 a year and well worth it.

Princeton, Philadelphia, Boston

I’ve been getting a little bit behind on my world tour trip reports, but things have been going so smoothly thanks to Liz’s heroic organization efforts that there’s not much to report!

Princeton was our smallest group so far—just 19 people, but the hotel was really nice. That afternoon we drove to Philadelphia, where we had about 55 people, including a couple of spies in the audience from our Philly-based web design firm who were there to get a sense of the kind of audience they were designing for.

I’ll interrupt this train of thought for a moment to talk a little bit about the gear I’ve been using. As I may have mentioned, my laptop is a Lenovo ThinkPad X61s, extremely small and light but with a comfortable full size keyboard. If you’ve ever thought about getting a ThinkPad but worried about the eraser-head trackpoint in between the G and the H keys that these things use as a pointer, don’t be. It takes a little time to get used to, but it works much much better than the more common touchpad because (a) you don’t have to take your hands away from the home row to use the mouse and (b) you never touch it accidentally while typing, causing the cursor to jump somewhere else.

My phone these days is the Samsung Blackjack, running Windows Mobile 5. Like Windows, it’s extremely frustrating and messy and disorganized. Also like Windows, if you’re willing to hammer away at it, you can make it do some pretty amazing things. In my world, amazing includes the fact that Liz can put things on my schedule and they’ll show up on my phone via over-the-air synchronization. Another thing I’ve come to rely upon on this trip is the high speed internet access… in most of the places I’ve been, AT&T has HSDPA access, which is pretty fast. But the phone stays in my pocket… the laptop connects over bluetooth to the phone which is running Internet Connection Sharing, a little applet that AT&T has tried to hide but which is still in the Windows folder on the phone. I’ve had HSDPA access in Seattle, Chicago, Philadelphia, Boston, New York, Princeton, even in the Hamptons. I just take out the phone, run the internet connection sharing applet and hit “connect,” then click on the appropriate bluetooth icon on the laptop, and, plink!, I’m on the internet at high speed.

That’s the main reason I don’t carry an iPhone — I use this HSDPA access every day and it rocks, and it’s inevitable that the next gen iPhone will have it, so I’ll wait, kthxbye.

OK, back to the trip report. Yesterday afternoon, I stopped by the ITA Software office in Cambridge (for all intents and purposes, the only large software shop that uses Lisp) to say hi and thank them for the useful search technology behind Orbitz, which made it possible to plan this trip even with all the multi-legged trips that brought Expedia to its knees. They were nice enough to take me out to dinner, too. Thanks!

This morning in Boston we had a huge turnout… 200 people who didn’t stop asking questions. For some reason which I can’t figure out, the demo part of my speech is taking a little bit longer every time I do it. I don’t think I’m adding things; I think I’m just explaining more. Who knows. Anyway.

There’s a new branch of Wagamama in Quincy Market. Looked exactly like the last one I ate at, in Sydney. A very nice addition to the otherwise dreadful dining alternatives of Boston’s Festival Marketplace.

The hotel internet in Boston was rather congested and I was having a lot of trouble doing anything online there, which is why yesterday’s Strategy Letter had so many typos. Sorry.

Brent Ashley: “I’ll provide some links here which will help the reader to understand how many of the points Joel makes in his essay are supported by existing technologies in various states of readiness. It’s a big pantry of ingredients that is waiting for the right chef to come along and combine them in a way that inspires the world to follow.”

Indeed countless people have already emailed me to say that “NewSDK is here, it’s (choose one) Flex Builder, Google Web Toolkit, Java Web Start, Silverlight, JavaFX, Flash, ActionScript, MORFIK, OpenLaszlo, … (many omitted)” Ahem. These are not HERE until your TAXI DRIVER has heard of them, because I assure you he’s heard of Microsoft Windows. Many of these technologies are developed by smart people who understand the world the way I talked about in the strategy letter, and are hoping to win the next platform war. But GWT is no more the NewSDK than Digital Research GEM, or IBM TopView, or Quarterdeck DESQView, or Concurrent DOS, or Microsoft Windows 1.0 was the OldSDK. They’re just horses at the starting gate.

I’m in Kitchener, Ontario right now, discovering that an even better predictor of a hotel I don’t really want to stay at is that it advertises that kings, queens, and presidents have stayed there. Sorry, darling, your hotel is charming, but I don’t care what your marketing materials say, if you really gave the Queen Mum these same shabby old towels as you gave me, Canada would be a republic by now.

Thursday morning is the Kitchener demo with 75 attendees; in the afternoon we’ll have an astonishing 240 people in Toronto, and then fly home. Tallyho!

Strategy Letter VI

IBM just released an open-source office suite called IBM Lotus Symphony. Sounds like Yet Another StarOffice distribution. But I suspect they’re probably trying to wipe out the memory of the original Lotus Symphony, which had been hyped as the Second Coming and which fell totally flat. It was the software equivalent of Gigli.

In the late 80s, Lotus was trying very hard to figure out what to do next with their flagship spreadsheet and graphics product, Lotus 1-2-3. There were two obvious ideas: first, they could add more features. Word processing, say. This product was called Symphony. Another idea which seemed obvious was to make a 3-D spreadsheet. That became 1-2-3 version 3.0.

Both ideas ran head-first into a serious problem: the old DOS 640K memory limitation. IBM was starting to ship a few computers with 80286 chips, which could address more memory, but Lotus didn’t think there was a big enough market for software that needed a $10,000 computer to run. So they squeezed and squeezed. They spent 18 months cramming 1-2-3 for DOS into 640K, and eventually, after a lot of wasted time, had to give up the 3D feature to get it to fit. In the case of Symphony, they just chopped features left and right.

Neither strategy was right. By the time 1-2-3 3.0 was shipping, everybody had 80386s with 2M or 4M of RAM. And Symphony had an inadequate spreadsheet, an inadequate word processor, and some other inadequate bits.

“That’s nice, old man,” you say. “Who gives a fart about some old character mode software?”

Humor me for a minute, because history is repeating itself, in three different ways, and the smart strategy is to bet on the same results.

Limited-memory, limited-CPU environments

From the beginning of time until about, say, 1989, programmers were extremely concerned with efficiency. There just wasn’t that much memory and there just weren’t that many CPU cycles.

In the late 80s a couple of companies, including Microsoft and Apple, noticed (just a little bit sooner than anyone else) that Moore’s Law meant that they shouldn’t think too hard about performance and memory usage… just build cool stuff, and wait for the hardware to catch up. Microsoft first shipped Excel for Windows when 80386s were too expensive to buy, but they were patient. Within a couple of years, the 80386SX came out, and anybody who could afford a $1500 clone could run Excel.

As a programmer, thanks to plummeting memory prices, and CPU speeds doubling every year, you had a choice. You could spend six months rewriting your inner loops in Assembler, or take six months off to play drums in a rock and roll band, and in either case, your program would run faster. Assembler programmers don’t have groupies.

So, we don’t care about performance or optimization much anymore.

Except in one place: JavaScript running on browsers in AJAX applications. And since that’s the direction almost all software development is moving, that’s a big deal.

A lot of today’s AJAX applications have a meg or more of client side code. This time, it’s not the RAM or CPU cycles that are scarce: it’s the download bandwidth and the compile time. Either way, you really have to squeeze to get complex AJAX apps to perform well.

History, though, is repeating itself. Bandwidth is getting cheaper. People are figuring out how to precompile JavaScript.

The developers who put a lot of effort into optimizing things and making them tight and fast will wake up to discover that effort was, more or less, wasted, or, at the very least, you could say that it “conferred no long term competitive advantage,” if you’re the kind of person who talks like an economist.

The developers who ignored performance and blasted ahead adding cool features to their applications will, in the long run, have better applications.

A portable programming language

The C programming language was invented with the explicit goal of making it easy to port applications from one instruction set to another. And it did a fine job, but wasn’t really 100% portable, so we got Java, which was even more portable than C. Mmmhmm.

Right now the big hole in the portability story is — tada! — client-side JavaScript, and especially the DOM in web browsers. Writing applications that work in all different browsers is a friggin’ nightmare. There is simply no alternative but to test exhaustively on Firefox, IE6, IE7, Safari, and Opera, and guess what? I don’t have time to test on Opera. Sucks to be Opera. Startup web browsers don’t stand a chance.

What’s going to happen? Well, you can try begging Microsoft and Firefox to be more compatible. Good luck with that. You can follow the p-code/Java model and build a little sandbox on top of the underlying system. But sandboxes are penalty boxes; they’re slow and they suck, which is why Java Applets are dead, dead, dead. To build a sandbox you pretty much doom yourself to running at 1/10th the speed of the underlying platform, and you doom yourself to never supporting any of the cool features that show up on one of the platforms but not the others. (I’m still waiting for someone to show me a Java applet for phones that can access any of the phone’s features, like the camera, the contacts list, the SMS messages, or the GPS receiver.)

Sandboxes didn’t work then and they’re not working now.

What’s going to happen? The winners are going to do what worked at Bell Labs in 1978: build a programming language, like C, that’s portable and efficient. It should compile down to “native” code (native code being JavaScript and DOMs) with different backends for different target platforms, where the compiler writers obsess about performance so you don’t have to. It’ll have all the same performance as native JavaScript with full access to the DOM in a consistent fashion, and it’ll compile down to IE native and Firefox native portably and automatically. And, yes, it’ll go into your CSS and muck around with it in some frightening but provably-correct way so you never have to think about CSS incompatibilities ever again. Ever. Oh joyous day that will be.
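To make the "different backends for different target platforms" idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: the one-statement "language" (`OnClick`), the backend functions, and the target names are invented for illustration and do not describe any real product. The only real APIs are the two JavaScript calls it emits: `addEventListener` for standards-based browsers and `attachEvent` for older versions of IE.

```python
from dataclasses import dataclass


@dataclass
class OnClick:
    """Hypothetical source statement: run `handler` when `element_id` is clicked."""
    element_id: str
    handler: str  # name of a JavaScript function


def emit_standard_dom(node: OnClick) -> str:
    """Backend for browsers that implement the W3C event model."""
    return (f'document.getElementById("{node.element_id}")'
            f'.addEventListener("click", {node.handler}, false);')


def emit_legacy_ie(node: OnClick) -> str:
    """Backend for older IE versions, which use attachEvent instead."""
    return (f'document.getElementById("{node.element_id}")'
            f'.attachEvent("onclick", {node.handler});')


# One backend per target platform; the programmer never sees the difference.
BACKENDS = {
    "firefox": emit_standard_dom,
    "safari": emit_standard_dom,
    "ie6": emit_legacy_ie,
    "ie7": emit_legacy_ie,
}


def compile_for(target: str, node: OnClick) -> str:
    """Compile one statement to the 'native' JavaScript of the given target."""
    return BACKENDS[target](node)


statement = OnClick(element_id="save", handler="onSave")
for target in ("firefox", "ie6"):
    print(target, "->", compile_for(target, statement))
```

The application code says what it wants once; the per-browser quirks live in the backends, which is where the compiler writers get to obsess so nobody else has to.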

High interactivity and UI standards

The IBM 360 mainframe computer system used a user interface called CICS, which you can still see at the airport if you lean over the checkin counter. There’s an 80 character by 24 character green screen, character mode only, of course. The mainframe sends down a form to the “client” (the client being a 3270 smart terminal). The terminal is smart; it knows how to present the form to you and let you input data into the form without talking to the mainframe at all. This was one reason mainframes were so much more powerful than Unix: the CPU didn’t have to handle your line editing; it was offloaded to a smart terminal. (If you couldn’t afford smart terminals for everyone, you bought a Series/1 minicomputer to sit between the dumb terminals and the mainframe and handle the form editing for you).

Anyhoo, after you filled out your form, you pressed SEND, and all your answers were sent back to the server to process. Then it sent you another form. And on and on.

Awful. How do you make a word processor in that kind of environment? (You really can’t. There never was a decent word processor for mainframes).

That was the first stage. It corresponds precisely to the HTML phase of the Internet. HTML is CICS with fonts.

In the second stage, everybody bought PCs for their desks, and suddenly, programmers could poke text anywhere on the screen willy-nilly, anywhere they wanted, any time they wanted, and you could actually read every keystroke from the users as they typed, so you could make a nice fast application that didn’t have to wait for you to hit SEND before the CPU could get involved. So, for example, you could make a word processor that automatically wrapped, moving a word down to the next line when the current line filled up. Right away. Oh my god. You can do that?

The trouble with the second stage was that there were no clear UI standards… the programmers almost had too much flexibility, so everybody did things in different ways, which made it hard, if you knew how to use program X, to also use program Y. WordPerfect and Lotus 1-2-3 had completely different menu systems, keyboard interfaces, and command structures. And copying data between them was out of the question.

And that’s exactly where we are with Ajax development today. Sure, yeah, the usability is much better than the first generation DOS apps, because we’ve learned some things since then. But Ajax apps can be inconsistent, and have a lot of trouble working together — you can’t really cut and paste objects from one Ajax app to another, for example, so I’m not sure how you get a picture from Gmail to Flickr. Come on guys, Cut and Paste was invented 25 years ago.

The third phase with PCs was Macintosh and Windows. A standard, consistent user interface with features like multiple windows and the Clipboard designed so that applications could work together. The increased usability and power we got out of the new GUIs made personal computing explode.

So if history repeats itself, we can expect some standardization of Ajax user interfaces to happen in the same way we got Microsoft Windows. Somebody is going to write a compelling SDK that you can use to make powerful Ajax applications with common user interface elements that work together. And whichever SDK wins the most developer mindshare will have the same kind of competitive stronghold as Microsoft had with their Windows API.

If you’re a web app developer, and you don’t want to support the SDK everybody else is supporting, you’ll increasingly find that people won’t use your web app, because it doesn’t, you know, cut and paste and support address book synchronization and whatever weird new interop features we’ll want in 2010.

Imagine, for example, that you’re Google with GMail, and you’re feeling rather smug. But then somebody you’ve never heard of, some bratty Y Combinator startup, maybe, is gaining ridiculous traction selling NewSDK, which combines a great portable programming language that compiles to JavaScript, and even better, a huge Ajaxy library that includes all kinds of clever interop features. Not just cut ‘n’ paste: cool mashup features like synchronization and single-point identity management (so you don’t have to tell Facebook and Twitter what you’re doing, you can just enter it in one place). And you laugh at them, for their NewSDK is a honking 232 megabytes … 232 megabytes! … of JavaScript, and it takes 76 seconds to load a page. And your app, GMail, doesn’t lose any customers.

But then, while you’re sitting on your googlechair in the googleplex sipping googleccinos and feeling smuggy smug smug smug, new versions of the browsers come out that support cached, compiled JavaScript. And suddenly NewSDK is really fast. And Paul Graham gives them another 6000 boxes of instant noodles to eat, so they stay in business another three years perfecting things.

And your programmers are like, jeez louise, GMail is huge, we can’t port GMail to this stupid NewSDK. We’d have to change every line of code. Heck it’d be a complete rewrite; the whole programming model is upside down and recursive and the portable programming language has more parentheses than even Google can buy. The last line of almost every function consists of a string of 3,296 right parentheses. You have to buy a special editor to count them.

And the NewSDK people ship a pretty decent word processor and a pretty decent email app and a killer Facebook/Twitter event publisher that synchronizes with everything, so people start using it.

And while you’re not paying attention, everybody starts writing NewSDK apps, and they’re really good, and suddenly businesses ONLY want NewSDK apps, and all those old-school Plain Ajax apps look pathetic and won’t cut and paste and mash and sync and play drums nicely with one another. And Gmail becomes a legacy. The WordPerfect of Email. And you’ll tell your children how excited you were to get 2GB to store email, and they’ll laugh at you. Their nail polish has more than 2GB.

Crazy story? Substitute “Google Gmail” with “Lotus 1-2-3”. The NewSDK will be the second coming of Microsoft Windows; this is exactly how Lotus lost control of the spreadsheet market. And it’s going to happen again on the web because all the same dynamics and forces are in place. The only thing we don’t know yet are the particulars, but it’ll happen.

There’s no place like 127.0.0.1

Back home in New York. We had about 75-100 people come to the New York demo yesterday, along with an army of Fog Creek technical staff in matching sky blue kiwi polo shirts.

When I got back to my desk on Monday afternoon, I turned into the prototypical bastard client from hell. Our web designers probably hate me. I did the one thing that drives web design firms completely crazy: I suddenly took a look at the new web design they’ve done for us, which I’ve been approving every step of the way, and didn’t like it any more, so I told them we had to start over.

In one of Gerald Weinberg’s books, probably The Secrets of Consulting, there’s the apocryphal story of the giant multinational hamburger chain where some bright MBA figured out that eliminating just three sesame seeds from a sesame-seed bun would be completely unnoticeable by anyone yet would save the company $126,000 per year. So they do it, and time passes, and another bushy-tailed MBA comes along, and does another study, and concludes that removing another five sesame seeds wouldn’t hurt either, and would save even more money, and so on and so forth, every year or two, the new management trainee looking for ways to save money proposes removing a sesame seed or two, until eventually, they’re shipping hamburger buns with exactly three sesame seeds artfully arranged in a triangle, and nobody buys their hamburgers any more.

This is sort of what happened with our new web design. We’ve been tweaking it and polishing it and changing things carefully, and the firm we hired to design it has been taking us step-by-step through information architecture, site maps, wireframes, initial designs, and several rounds of design. All with a carefully-designed process to get our buy-in at every step along the way. And at every step so far, I thought the design was converging and we’d get a nice web design out of it.

And then I came back after a week on the road, took one look at it, and thought, oh crap. We can’t go public with that.

And they said, “but wait, look here, it’s right in Basecamp, you said that this design was ‘excellent work’ and you were ‘elated’ to have the ‘best web design ever in the history of the universe.'”

True that. I did say that. I even thought that.

But a week later, the same basic design just looked terrible. We’ve been removing sesame seeds from the initial design they did in hopes of making things better, and, lo and behold, at some point the design flipped from being good to being bad. Links had sprouted up all over the place, making it hard to tell where to go next and where you’ve already been. Most of the elegant whitespace in the original design was lost when we went from the original 1024 pixel wide design to an 800 pixel design. The web designers had presumably been working on Macs and showing us bitmaps, but since the antialiasing technology is different, when we finally got the HTML, the page just felt completely different and had crossed into the realm of plain and, subjectively, ugly.

Ah, well. We’ll start over. It’s better to have something we’re both proud of than to try and salvage the work done so far. Sometimes you have to go all the way through the design process before you realize that you’ve built the wrong thing, but it’s ok, it’s a learning experience, it’s not the end of the world to take a deep breath and go back to step 1.

Chicago

Chicago; about 70 local software developers turned out. Chicago is a great city for architecture. Much better than New York. Here they build skyscrapers just because they love them.

Can I talk about hotels for a minute? There are about a million different ways to rate hotels. That makes user review sites, like TripAdvisor, somewhat hit-or-miss. One person’s hovel may be another person’s palace.

I’m sad to say that the Congress Plaza Hotel, where we held the event this morning, does not qualify as anyone’s palace. The usual nice words you might use to describe such a hotel would be “threadbare” or “shabby.” Other words (“macabre,” “Barton Fink,” and “scuzzy”) come to mind. This was entirely my fault; I set a target budget for hotels in each city and didn’t do the research to make sure the hotels would be entirely nice.

I’ll bet you can tell almost everything you need to know about the quality of a hotel based on how often they replace the sheets and towels. Another good indicator, for some bizarre reason, is plasma TVs. The nice chains, inexplicably, have old fashioned big-ass tube TVs. The ancient rotting edifices have 32” plasmas. I don’t know why this is. Maybe they think that having a plasma TV they can advertise on their web site will make them seem fancy.

Anyway, the Congress Plaza Hotel is the kind of 850 room monstrosity that LOT Polish Airlines would fill up with a 777 full of passengers, on their way to Warsaw, if the plane was stuck in Chicago overnight due to mechanical failure.

Oh. And the staff was actually on strike. So people coming to the demo had to cross a picket line. I’m sorry about that. I never thought to ask if there was a strike at the time we booked. I guess shabby hotels just treat their employees shabbily. Apparently Chicago considered passing a law requiring hotels to tell people about these strikes when they booked rooms and meetings. It didn’t pass.

The lesson from Chicago is that using cheap hotels is not a good idea for business meetings. Psychologically, I think that people tend to associate the environment they’re in with the presentation. When a demo is in a modern, new, shiny business hotel, it’s like a little one hour vacation in luxuryland. You go to the bathroom and it’s marble everywhere and individual cloth hand towels. And you think nice things about the demo. But when you go to a demo at the Congress Plaza and the rug is stained and there are fluorescent lights everywhere and the bathroom looks like LaGuardia airport, some of that general depressing aura of shabbiness will rub off on the product being presented.

After the demo was over I walked two blocks south to the Hilton where the Inc. 500 Conference was in progress. There I spoke to a bunch of small companies about how we hire people at Fog Creek. A lot of the material I talked about is available on this site, as a series of articles I wrote about a year ago.

You can get the whole series plus one bonus chapter in book form, as well.

I’m flying back to New York now, on We’ve Pretty Much Just Given Up Airlines. On Monday morning, if you’re in the city, please join me and the FogBugz development team at 9:00am for the FogBugz 6.0 World Launch, at the New York Marriott East Side Hotel. It’s free, but you have to register to reserve a space.

Seattle

About 200 attendees came in Seattle; this will be one of our biggest scheduled demos except, maybe, London. I apologize that the screen wasn’t so easy to read throughout the room. We made a point of asking the hotel before we booked the room if the screen would be clearly visible to all attendees. It wasn’t, because the ceiling wasn’t high enough, and the hotel didn’t have the ability to raise the ceilings, so of course they lied and told us it would be fine, and it wasn’t. This is a problem at every tech conference I’ve ever been to; these rooms were built long before PowerPoint.

A bunch of Microsoftees had to cancel because they are having their annual company meeting today in a gigantic football stadium. The size of that company is insane. Can you imagine Safeco Field filled to the brim with software developers? And that’s just the Vista Shutdown Menu Team.

Something that came up in the demo today that I wanted to share. Software development is a cycle with three distinct phases: design, development, and debugging. It doesn’t really matter whether you’re doing Ultra Extreme Elite Programming, in which case you do all three phases in one week, or The Ancient OS/360 Waterfall Method, in which case you do them over the course of a year or two. You still have to design what you’re going to build, build it, and then debug it.

The three phases have to be scheduled in a very different way.

  • Design is the art phase, where you’re doing new, creative work. Even though what you’re doing is completely new, after you’ve gone through a few software development cycles you’ll start to get a pretty good idea of how much time it takes to design a new version of your software. I’ve usually worked with relatively long development cycles of 12-18 months, and it’s always taken me about two months to get a detailed, first-draft spec containing enough detail for the development team to create very granular estimates.

    That said, when you’re building something brand new from scratch, you really can’t estimate the design phase at all, and that’s OK. Today I met somebody from a company in Seattle that’s working on a project headed up by one of the world’s great programmers, Charles Simonyi. Near as I can tell, they have been in the design phase for 16 years.

  • Development is the engineering phase. It’s a construction project. As long as you start with a detailed blueprint, which, of course, can change over time, but which is really your best guess for what you’re building, this phase can be scheduled with great precision. FogBugz 6.0 has a spiffy new feature called Evidence-based scheduling, which uses a variation on the Monte Carlo method for making your schedules remarkably reliable during the development phase. When I get a chance, I’ll tell you about it in more detail.

  • Debugging is the science phase. Science is difficult to schedule because you’re looking for things, and predicting when you’re going to find them is remarkably difficult. Unless you know in advance how many bugs you’re going to find, you don’t have an ice cube’s chance in the Sahara to work out a detailed estimate of how long this phase will take. Here at Fog Creek we’ve learned that for a new release of FogBugz, this phase takes at least 12 weeks, sometimes a little more, and we just leave it at that.

    Some people are more ambitious and try to track the rate at which bugs are being found and being fixed, and try to extrapolate to see when you’ll ship; in practice I’ve found that the rate of finding bugs is way too messy to be able to extrapolate from. If you have a fixed size test team and infinite bugs, they’ll find bugs at a constant rate, simply because they’re all spending 8 hours a day entering bugs and then stopping, but this flat line doesn’t mean you’re ever going to ship. You’re not. You have infinite bugs, remember? Sorry. On the other hand you may have just released a new beta to a new batch of beta testers, and it’s the best beta yet, but it’s the biggest group of beta testers, so there will be a big spurt of new bugs found, which doesn’t mean the code is getting buggier—it just means you have a bigger group of beta testers.
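
To make the Monte Carlo idea from the development bullet a little more concrete, here is a minimal sketch. It is not the FogBugz implementation, just one plausible reading of the approach. It assumes you have kept a history of "velocities" (original estimate divided by actual time) for past tasks, and every name in it (simulate_totals, hours_at_confidence, past_velocities) is made up for illustration. Each round picks a random past velocity for every remaining task, divides the estimate by it, and adds up the results; the spread of those totals gives you a range of likely outcomes instead of a single date.

```python
import random


def simulate_totals(estimates_hours, past_velocities, rounds=1000, seed=0):
    """Return a sorted list of simulated total hours for the remaining tasks.

    estimates_hours: current estimates for the tasks still to be done.
    past_velocities: historical estimate/actual ratios (hypothetical data);
                     0.5 means a past task took twice as long as estimated.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    totals = []
    for _ in range(rounds):
        total = 0.0
        for estimate in estimates_hours:
            # Assume this task goes the way some randomly chosen past task did.
            velocity = rng.choice(past_velocities)
            total += estimate / velocity
        totals.append(total)
    totals.sort()
    return totals


def hours_at_confidence(sorted_totals, confidence):
    """Pick a simulated total that at least `confidence` of the rounds
    came in at or under (confidence is a fraction between 0 and 1)."""
    index = min(len(sorted_totals) - 1, int(confidence * len(sorted_totals)))
    return sorted_totals[index]


if __name__ == "__main__":
    # Hypothetical numbers: three remaining tasks, five past velocities.
    totals = simulate_totals([4, 8, 2], [0.5, 0.8, 1.0, 1.0, 1.25])
    print("50% chance of finishing within", hours_at_confidence(totals, 0.50), "hours")
    print("95% chance of finishing within", hours_at_confidence(totals, 0.95), "hours")
```

The point of the sketch is that the answer is a distribution. An estimator whose past velocities are all close to 1.0 gets a narrow range; one whose history is all over the map gets a wide one, which is exactly the information you want when picking a ship date.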

Now, there are various ways to get in trouble. If you don’t like writing functional specifications or doing up-front design, what happens is that you’re burdening the design phase with the development phase. If you ever started a new project by writing code, and you thought you’d “design as you went along,” what you’re doing is driving around with the handbrakes on. Here’s why. Designing a feature by writing a thoughtful spec takes about 1/10th as much time as writing the code for that feature—or less. If you try to code as you design, then you’re interrupting your short spurts of design with long spurts of coding. Now, if you’re the kind of person who designs everything perfectly the first time, that’s fine. But I don’t think you are. I think that your first designs are pretty good, but when you see them, you get ideas for even better designs. And if you already coded up the first draft, bad design, well, that’s coding time wasted. Your product’s design can only get better at 1/10th the speed that my product’s design can.

I’m now en-route to Chicago, where I have to do two speeches tomorrow: the morning FogBugz demo, and, in the afternoon, I’ll be talking about hiring to a bunch of startup CEOs at the Inc. 500 conference. I’m editing this in the e text editor, a Windows clone of TextMate, which is coming along nicely but could still use some polish before I’m ready to switch to it full time.

Vancouver, BC

Vancouver, BC: Day one of the FogBugz World Tour. About 120 people showed up to see the first public demo of FogBugz 6.0, which will officially launch next Monday.

Vancouver is, without a doubt, one of North America’s most beautiful cities. Sparkling, clean, everything works well, nothing can possibly go wrong, people are friendly, and with the new weakened US dollar it’s really quite a prosperous place to live. Brett and I had dinner at Joe Fortes, where you get a choice of 4 different local species of salmon, maybe 20 other kinds of fresh fish, or about 10 different types of oysters, and there’s a beautiful rooftop deck where you can enjoy the usually pleasant Vancouver weather.

The demo went relatively smoothly, despite a few first-time kinks. At some point I was fiddling around so much with the report generator that I queued up a backlog of lengthy Monte Carlo simulations on the web server, which made FogBugz lose interest in continuing with the demo. This is not the kind of thing that happens on production web servers (part of the problem is that the laptop is running XP, which has a kind of 3/4-baked implementation of IIS, version 5.1, which is not what anyone would run on a real server). Anyway, I had to restart IIS in the middle of the demo. Ooops. Hopefully that won’t happen again.

It couldn’t be that bad. Here’s some email feedback we already received from the demo:

“Thank you for the entertaining and informative talk. We will be buying FogBugz as a result.” Thank you! OK, we just broke even in Vancouver.

“Do you intend to provide free versions of FogBugz for open source projects, non-profits, or small teams of 2 people (like Perforce does with their products)?” Yes, it’s called the Student and Startup Edition. We’ll announce it soon, but it’s available now.

“Well, I expected to be bored about FogBugz and enraptured by fascinating tidbits of Joel Spolsky wisdom. In reality, the opposite was true. I now believe FogBugz to be a pretty interesting looking app, whereas an hour before your spiel I think I described it to someone as a “glorified Excel spreadsheet” (it’s amazing what I can come up with when I am not encumbered by facts).” My tidbits aren’t that fascinating.

“Having just attended Joel’s FogBugz 6.0 demonstration in Vancouver, we were very impressed with its capabilities. My boss wants to go ahead and use FogBugz, however Joel mentioned that the Linux/Unix version was still in Beta.” For Unix we’re still on 5.0 while we debug the PHP port, which I hope won’t take long. While you’re waiting for Unix FogBugz 6.0, you can either run 5.0 (and upgrade for free) or run a free trial on our server, and download the data when we’re shipping.

The night before, in rehearsals, I discovered that the Fn+F7 trick that is supposed to turn on external monitors on this Thinkpad was actually freezing the computer solid, due to some kind of buggy interaction between the Intel 965 graphics chip software and the IBM/Lenovo Presentation Director software. I never did solve that problem, so I learned to use the Intel software to turn on the external monitor instead of pressing Fn+F7.

Flying in September after Labor Day is really not that bad, despite the scare stories you might have heard in the press; once everyone gets home from summer vacations the number of passengers in airports and on flights drops quite dramatically and flights start operating closer to schedule with much shorter lines. So far I don’t think we’ve waited in one line at an airport. Here are my favorite tricks for planning air travel to avoid chaos, delays, and cancellations:

  1. The ideal time to fly is around 10 am. Usually delays pile up throughout the day, so the earlier you fly, the less likely you are to suffer delays. The very early flights are popular with people who want to get a full day in, so the midmorning flights tend to be the most civilized.
  2. Always check the OAG before booking to see what flights are available. The OAG includes JetBlue and Southwest flights which the online travel agencies can’t show you.
  3. Make sure you’re never on the last flight of the day if you really need to get somewhere on schedule. If something happens to the last flight, you’re in trouble. As a general principle, while planning for this trip, I always checked that there was at least one alternative flight that would get me to my next destination on time. Since we fly first class at Fog Creek, if one of our flights gets cancelled, the airline will work hard to reaccommodate us while the coach passengers might have to wait forever for a rebooking.
  4. Fly out of smaller airports whenever possible. My favorite alternative airports: John Wayne or Burbank instead of LAX, Ft. Lauderdale instead of Miami, Love instead of DFW.
  5. If the flight you’re booked on is cancelled, don’t wait in line with the crowds for the single, overworked airline representative. Get on the phone to your airline’s frequent flyer priority number. They can rebook you just as well.
  6. The American Express Platinum card pays for itself just from the free membership in Continental, Northwest, and Delta’s lounges… not only because the lounges are quiet and pleasant, but because the lounges have unharried and experienced airline agents who are happy to help you with complicated problems, rebookings, and upgrades.
  7. Final trick: never schedule an important flight during the last few days of the month, especially on Northwest. Pilots are only allowed to fly a certain number of hours per calendar month and by the end of the month they’re running out of hours, especially on the more awfully-managed airlines like Northwest, so flights galore get cancelled in the last few days of every calendar month.

See you tomorrow in Seattle! There are still five seats available if you haven’t registered yet.