Phew! and w00t! Last night at about 7:35 FogBugz 4.0 finally went live, on the exact day we planned to ship it quite a few months ago.

 FogBugz 4.0, the CD-ROM

I have put a lot of other things on hold while we got this major upgrade out the door, so I’ll be spending some time in catch-up mode for the next few weeks. And now I’m going to take a nap.

The Large Print Giveth and the Small Print Taketh Away: What we shipped today was FogBugz 4.0 for Windows. The Unix & Mac versions are now in beta and will be shipping Real Soon Now.


Usability Time! When Microsoft AntiSpyware is running it displays this dialog:

Dialog box from Microsoft AntiSpyware, containing the text 'Detected Spyware on your system:'

… which looks, to me, like it’s telling me that it detected spyware on my system.

Oh, wait! No, that’s not it, it’s just a lazy programmer who wrote this code:

20 FOR I = 1 TO 1000000000

I think I get it. It’s the heading for a list which has not arrived yet because you’re still busy scanning my harddrive searching for spyware which I don’t have. The usual programmer mentality (“it’s just a list with 0 elements, what’s so hard to understand about that?”). Hey guys, next time don’t use a message that’s only one pixel away from telling me the exact wrong fact about whether or not there’s spyware on my system.

So far, it looks like this is a nifty program, and consumers should be happy that Microsoft has announced it will be free, but it really, really would have been nice for us here in the software industry if Microsoft had set a price on this thing just to provide some air cover for the other companies working on spyware removal. This is not a software category where a monopoly monoculture will be a good thing.

Not only that, but I wonder if Microsoft can run an antispyware product without huge conflicts of interest. For example, will they block all the spyware that Real installs on your system? While Real is suing them? Especially when blocking spyware from Real will just give Real more ammunition to use against Microsoft in court? And the next time Microsoft needs a DRM favor from your friendly neighborhood media conglomerate, will the media conglomerate demand exemption from Antispyware removal for their adware in exchange for supporting Windows Media 37.0, with the new brain-zapping feature that prevents you from humming any song unless you bought the performance rights? (A sheet of tinfoil wrapped tightly around your skull is effective against this zapper, I understand.)

I understand that Microsoft wants to help customers who feel like a spyware-free operating system should be your right when you pay for WinXP, but it’s a shame that by giving it away free they’re likely to wipe out a useful industry and replace it with something that’s difficult to trust due to conflicts of interest.


Katie Lucas: “Step 1: write about running really fast. Step 2: Go and draw a plan of the racetrack. Step 3: go and buy really tight lycra shorts. Step 4: run really, really, really fast. Step 5: cross line first.”

Foreword to Painless Project Management with FogBugz, by Mike Gunderloy

Painless Project Management with FogBugzThere’s a restaurant in my New York City neighborhood called Isabella’s that’s always packed.

Downstairs, upstairs, at the sidewalk cafe, it’s mobbed. And there are large crowds of happy yuppies out front, waiting 45 minutes for a table when they can clearly see other perfectly good restaurants right across the street which have plenty of tables.

It doesn’t matter when you go there. For Sunday brunch, it’s packed. Friday night? Packed, of course. But go on a quiet Wednesday night at 11:00 PM. You’ll get a table fairly quickly, but the restaurant is still, basically, packed.

Is it the food? Nah. Ruth Reichl, restaurant reviewer extraordinaire from the New York Times, dismissed it thusly: “The food is not very good.”

The prices? I doubt anyone cares. This is the neighborhood where Jerry Seinfeld bought Isaac Stern’s apartment with views over two parks.

Lack of competition? What, are you serious? This is Manhattan!

Here’s a clue as to why Isabella’s works. In ten years living in this neighborhood, I still go back there. All the time. Because they’ve never given me a single reason not to.

That actually says a lot.

I never go to a certain fake-Italian art-themed restaurant, because once I ate there and the waiter, who had gone beyond rude well into the realm of actual cruelty, mocking our entree choices, literally chased us down the street complaining about the small tip we left him.

I stopped going to another hole-in-the-wall pizza-pasta-bistro because the owner would come sit down at our table while we ate and ask for computer help.

I really, really loved the food at a local curry restaurant with headache-inducing red banquettes and zebra-striped decor. The katori chat was to die for. I was even willing to overlook the noxious smell of ammonia wafting up from the subterranean bathrooms. But the food inevitably took an hour to arrive, even when the place was empty, so I just never went back.

But in ten years I can’t think of a single bad thing that ever happened to me at Isabella’s.


So that’s why it’s so packed. People keep coming back, again and again, because when you dine at Isabella’s, nothing will ever go wrong.

Isabella’s is thoroughly and completely debugged.

It takes you ten years to notice this, because most of the time when you eat at a restaurant, nothing goes wrong. It took a couple of years of going to the curry place before we realized they were always going to make us miss our movie, no matter how early we arrived, and we finally had to write them off.

And so, on the Upper West Side of Manhattan, if you’re a restaurant, and you want to thrive, you have to carefully debug everything.

You have to make sure that there’s always someone waiting to greet guests. This person must learn never to leave the maitre d’ desk to show someone to their table, because otherwise the next person will come in and there will be nobody there to greet them. Instead, someone else needs to show patrons to their tables. And give them their menus, right away. And take their coats and drink orders.

You have to figure out who your best customers are—the locals who come on weekday nights when the restaurant is relatively quiet—and give them tables quickly on Friday night, even if the out-of-towners have to wait a little longer.

You need a procedure so that every water glass is always full.

Every time somebody is unhappy, that’s a bug. Write it down. Figure out what you’re going to do about it. Add it to the training manual. Never make the same mistake twice.

Eventually, Isabella’s became a fabulously profitable and successful restaurant, not because of its food, but because it was debugged. Just getting what we programmers call “the edge cases” right was sufficient to keep people coming back, and telling their friends, and that’s enough to overcome a review where the New York Times calls your food “not very good.”

Great products are great because they’re deeply debugged. Restaurants, software, it’s all the same.

Great software doesn’t crash when you do weird, rare things, because everybody does something weird.

Microsoft developer Larry Osterman, working on DOS 4, once thought he had found a rare bug. “But if that were the case,” he told DOS architect Gordon Letwin, “it’d take a one in a million chance for it to happen.”

Letwin’s reply? “In our business, one in a million is next Tuesday.”

Great software helps you out when you misunderstand it. If you try to drag a file to a button in the taskbar, Windows pops up a message that says, essentially, “You can’t do that!” but then it goes on to tell you how you can accomplish what you’re obviously trying to do (try it!)

Great software pops up messages that show that the designers have thought about the problem you’re working on, probably more than you have. In FogBugz, for example, if you try to reply to an email message but someone else tries to reply to that same email at the same time, you get a warning and your response is not sent until you can check out what’s going on.

Great software works the way everybody expects it to. I’m probably one of the few people left who still closes windows by double clicking in the top left corner instead of clicking on the [x] button. I don’t know why I do that, but it always works, with great software. Some software that I have is not so great. It doesn’t close if you double click in the top left corner. That makes me a little bit frustrated. It probably made a lot of people frustrated, and a lot of those people probably complained, but I’ll bet you that the software developers just didn’t do bug tracking, because they have never fixed that bug and probably never will.

What great software has in common is being deeply debugged and the only way to get software that’s deeply debugged is to keep track of your bugs.

A bug tracking database is not just a memory aid, or a scheduling tool. It doesn’t make it easier to produce great software, it makes it possible to create great software.

With bug tracking, every idea gets into the system. Every flaw gets into the system. Every tester’s possible misinterpretation of the user interface gets into the system. Every possible improvement that anybody thinks about gets into the system.

Bug tracking software captures the cosmic rays that cause the genetic mutations that make your software evolve into something superior.

And as you constantly evaluate, reprioritize, triage, punt, and assign these flaws, the software evolves. It gets better and better. It learns to deal with more and more weird situations, more and more misunderstanding users, and more and more scenarios.

That’s when something magical happens, and your software becomes better than just the sum of its features. Suddenly it becomes reliable. Reliable, meaning, it never screws up. It never makes its users angry. It never makes its customers wish they had purchased something else.

And that magic is the key to success. In restaurants as in software.


Book cover image: Painless Project Management with FogBugzw00t! Hot off the presses, copies of Mike Gunderloy‘s excellent new book, Painless Project Management with FogBugz, just arrived in the Fog Creek Office!

This book covers FogBugz 4.0 in detail. I wrote the foreword, which you can read here.

Astute readers will note that FogBugz 4.0 is not quite available for sale yet, but it will be, by March 1st. We’re also going to be selling Mike Gunderloy’s book on the Fog Creek online store, which will be the first inventory item in the history of Fog Creek, so we’ll need to make some changes to the online store software.


Thanks to everyone who sent me advice on our new servers. I’ve updated the document a bit with the information I’ve learned.

Upcoming Events

A lot of events coming up in March!

On March 16th I’ll be speaking at the O’Reilly Emerging Technology Conference in San Diego. It will be somewhat similar to my talk at, which was about why Brad Pitt and the Apple iPod are so doggone popular, while Ian Somerhalder and the elegantly named “Creative Nomad Jukebox Zen Extra” are not. Also there will be jokes and music.

The very next day, March 17th, I’ll be at Software Development West in Santa Clara, at a “fireside chat” with Alexandra Weber Morales. I’m not sure where the fire is coming from.

And the very next day after that, March 18th, my publisher, Apress, will host an open-house/pizza party in Berkeley (exact time and location to be announced) for anyone who wants to stop by and say hi.

Way in the future, like, July or something, I’ll be giving a keynote at the Cold Fusion conference in Washington, D.C. More about that as we get closer.

Colo Expansion Version 2.0

(Updated 2/6/05)

Dell PowerEdge 2650It’s been about two years since we first installed a server at the Peer 1 colocation facility downtown. That server, a Dell 2650, has patiently handled the load for Joel on Software and Fog Creek gracefully, but there are a few issues that are starting to become more significant as Fog Creek grows and reliability becomes more important. And we need more server horsepower and scalability.

In the past I would have upgraded the system, installed it all, and written an article about it. And then people would have emailed me to suggest better ways to do things, but it would be too late.

So now for the first time ever, I’m going to publish a “live” article here. So far, I haven’t done anything. If you have any better suggestions for how to do things, it’s not too late for me to learn from your experience. As we go along, I might have a few questions for my readers who have experience with this stuff. Thanks to everyone who has emailed me so far!

Step one was figuring out what problems we’re trying to solve. Here’s the list:

  • Not enough horsepower to handle the large number of FogBugz online trials we’ll be hosting. FogBugz 4.0 is a couple of weeks away from shipping and that should create a flood of people wanting to take advantage of the free online trial. (By the way, if you want to beat the traffic, apply to join the FogBugz 4,0 beta.)
  • Running an email server on Windows allowed us to save hardware, at the cost of buying a commercial email product, IPswitch IMail. We are very close to using up the maximum number of mailboxes we’re entitled to create, and we’d really like to switch to a Linux mail server running qmail or Postfix. The votes seem to be in favor of Postfix.
  • Although we have a standby server, a small Dell 750 we stuck in there last summer, we don’t really have a good story for handling failures. Right now we could recover from most kinds of failures in about 24 hours, and we have to take all the sites down for about 15-30 minutes when we need to apply OS patches or upgrade the server. But Fog Creek’s daily revenue has grown to the point where I’m not happy with that. Our new goal is to have no single point of hardware failure be able to bring down any of our sites, to be able to upgrade OSes without bringing everything down, and to be able to recover from very major problems in 15 minutes. This goal is fairly absurd. There will always be single points of failure. The ozone layer, for example. A more accurate goal would be “No single hard drive, power supply, or fan failure takes us out of commission for even 1 second. Any other server failure can be recovered from in 15 minutes. Routine Windows Updates will no longer bring our site down. If something happens to the colo facility, its power supply, or the City of New York, well, we go off the air for a while.”
  • And we want to have an architecture in place to be able to scale up our hardware relatively easily to meet new demand.

Step one in this project was finding a good sysadmin to help design and implement this project. One of the great things about Fog Creek in its fifth year is that when we need to do something, we have the money to do it. So instead of me trying to build mission critical server networks using off-the-shelf consumer grade plastic linksys crap, I called up Adam Prato, who I had worked with at Juno, who has years of experience building out these kinds of things. After about an hour briefing he wrote up a very professional plan, sketched out the basic architecture of the new network, and started recommending purchases.

In choosing servers, there’s a lot of benefit to using totally standard, identical parts. It makes it easy to swap things around when you need to scale or when something fails, and things are cheaper by the dozen. In order to avoid going totally broke, we still want reasonably affordable servers, but my requirement of “no single point of failure” means we need servers with dual power supplies, hot-swap RAID, and everything engineered for longetivity. That tends to narrow things down to the Dell PowerEdge 2850, which is the updated version of the 2650 we have now. I looked into the equivalent HP box. It looks like it costs 2-3 times as much for the equivalent thing.

The current plan is to get six servers:

1 and 2: web servers running Windows 2003. Two mirrors, each containing every site we host. We’re using Win2003 instead of Win2000 because mirroring is much easier – the IIS 6.0 metabase is just a big XML file so we should be able to mirror the two machines using something as simple as ROBOCOPY. In fact, I think we can stage all our web sites back at the Fog Creek office and let Robocopy distribute them out to the web servers.

We will have some rudimentary load balancing by using a different IP address for every site that we host, and we can program each server’s NIC to respond to some subset of IP addresses. That way moving a website from one server to the other is a simple matter of changing what IP addresses each server uses. We can scale up by adding more of these servers as we go along, and in the future we can use round-robin DNS or a real load balancer to distribute the load if necessary. If one server dies, the other one is a mirror so it can take up the whole load. Many people wrote in to recommend Windows’ built-in Network Load Balancing. I think this may be overkill for the first generation but it sounds like a good idea in the long run.

3: SQL Server. Because everybody is sharing the same SQL Server, this machine will be slightly different than its peers: it will have the fastest available CPU available and it will have more hard drives plugged in. I heard that it’s a good idea to get the transaction logs onto a different physical drive than the data itself, for better performance, so I think this machine will have two RAID 1 mirror sets (4 hard drives). Most correspondents agreed that this is the best architecture.

4: Mail and everything else server. Linux.

5: Routing server. Probably NetBSD. This machine will have four ethernet ports and will route between the public internet, the direct connection to the Fog Creek office, the “DMZ” containing our public-facing servers, and a private network segment containing the SQL Server machine. Many people suggested OpenBSD for better firewall support (“pf”). I’m out of my league here, and will have to trust Adam to use the right technology. Also having different OSes for #4 and #5 doesn’t make sense so we’ll try to pick one OS for both.

Other people thought that it might be better to use a solid-state hardware firewall/router rather than build our own on a generic Intel server. The advantages of building our own is a lot more flexibility, and somehow I think that the Dual-everything Dell PowerEdge 2850 is going to be a lot more robust than one of those dinky firewall/routers you get at CompUSA. I’ve already had one of those die on me with no explanation…

6: Cold backup. Another machine, identical to the first five, sitting around ready to be used for whatever we need it for or to serve as an emergency replacement.

Each machine will contain:

  • A Xeon processor, 800 MHz Front Side Bus, with 1 MB Cache
  • SCSI RAID with room for 6 drives. All the systems will start with 2x146GB drives installed in RAID 1 (mirroring) formation, except for the SQL machine which will have 4 drives. Every hard drive in the rack will be a Seagate Cheetah, 146GB SCSI U320, at 10000 rpm. Since they’re hot swappable we can move them around at will and we can keep a couple of spares in the cage. 10K is much cheaper than 15K. Some people suggested using 15K for the SQL Server but I’ve actually heard elsewhere that RPMs aren’t such a big deal for SQL Server. And I really don’t want to have two kinds of hard drive.
  • Dual Gigabit NICS. The whole network is going to be gigabit ethernet and we’ll use unmanaged Dell gigabit switches which seem to be priced well below the (NetGear) competition. The managed gigabit switches are more than 3x the price of unmanaged gigabit switches.
  • 2 GB RAM
  • Dell’s DRAC 4/I remote management card (more about that in a minute).

I called up Dell and got a quote for this whole mess. By the way, if you’re ever buying more than one or two machines, it’s worth actually calling Dell. Somehow you always get a lower price than you would have gotten on the website. I’ve also been told that if you call during the last week of Dell’s fiscal quarter, which I just missed, you get great deals as they spaz out trying to make their numbers for Wall Street in a classic display of American public company mismanagement.

I still have a list of open questions that I need to work on this week. If you have any answers, email is appreciated!

Question 1 – Power Management

Dell’s built-in management card, the DRAC 4/I, has the ability to turn on and off the machine its plugged into, to resolve “stuck” systems, which is nice. But in two years of operating our current server, we’ve learned that the DRAC card itself (an older model, the DRAC 3) is more likely to freeze up than Windows 2000 itself. And there’s no cure to a frozen DRAC card other than a full power cycle.

To address this, out of desperation, we put an APC MasterSwitch in the rack. This is basically a power strip with a web server. You can connect to it using any web browser to turn on or off anything plugged into the back. The MasterSwitch is awesome. It does one thing, and does it well.

The trouble is, this poor MasterSwitch only has 8 outlets and we have 6 servers with two power supplies each. It looks like there’s a company called Sentry that makes something called the Dual Feed Power Tower, which, if I understand correctly, is designed exactly for these servers with dual feeds. Has anyone ever used such a thing? I can’t find a price on the website which always makes me worry. It looks like it’s about $1350. For less money I can get a new APC MasterSwitch Plus plus the “expansion pack” giving me 16 power outlets on two separate boxes thus reducing points of failure.

And another question — according to Dell’s very cool Rack Advisor, my six PowerEdge’s are going to need about 35 amps of power. Yelps. Is that believeable? From what I remember you’re not even allowed to have more than 20 amps on each circuit. So I’m wondering if a single power tower switcheroo thing can handle the load. And speaking of power, sometime this week I need to track down Joe Cooper of Peer 1 networks and find out how much it would cost to upgrade from a 1/8th rack to a 1/2 rack. We’ll be going from 3U today to 14U. I’m hoping that Peer 1 has some way of getting me enough amps to run six servers. Or maybe when Dell says that one of their servers consumes over 5 amps, they mean “fully loaded”, and we won’t really be close to fully loaded. People have told me that Dell is on drugs. Technically the power supplies in these things are 700 watts, but realistically you never draw that much, and so 20 amps should be fine.

Question 2 – Rack Management

As soon as we got the second server, our small backup PowerEdge 750, and put it on top of the old 2650, we had to add RAM to the 2650 (on the bottom), which was a terrible pain in the buttocks. Which made me realize we really need to rack mount all our servers properly. I need to get Peer 1 to tell me what kind of mounting equipment I need to order… Dell gives you a choice of “VersaRails” or “RapidRails.” I’m sure we’ll figure out how to install things, but it would be nice if Dell had a video showing how to rack mount their servers instead of badly written technical documentation with instructions like “Install two 10-32 x 0.5-inch flange-head Phillips screws in the back mounting flange’s top and third-from-top holes to secure the slide assembly to the back vertical rail.” What? VersaRail is for round holes and requires screwing things in. Rapid Rails is for square holes and things just click into place. It’s not that hard, I’m told 🙂

Question 3 – Remote Control

We need to decide between a remote KVM-over-IP system and just using Dell’s built-in DRAC 4/I remote control.

The DRAC has two advantages I know of:

  1. It has other nice features like power cycling (when it doesn’t freeze) and the ability to boot your server off of a floppy or CD-ROM which is mounted over the network.
  2. It’s a lot cheaper; KVM over IP is pretty expensive.

But I’m worried about the DRAC’s KVM capabilities. With version 3, you could see the computer screen while it was booting up, but after that the remote control capability disappeared for a while until the operating system came up, after which it seemed to use Windows Terminal Services or something like that to remote control the server. This is a little bit to OS-dependent for comfort. I have heard rumors that the new model DRAC 4/I fixes this problem and lets you remote control the server no matter what state the OS is in. Can anyone confirm this? If so we can probably live without an expensive KVM-over-IP solution. The bottom line is that we need to have the DRAC 4/I, because there’s no other way to turn on the server remotely. Power cycling a server that was on will make it come back on, but if you screw up and tell windows to shutdown you can power cycle until next monday and it won’t go on. Given that we have to buy the DRAC 4/Is anyway, I’m hoping that they are good enough and we can live without expensive KVM over IP. If I’m wrong we’ll just add KVM later.

Question 4 – SQL Server on its own Machine

I’ve never tried running SQL Server on a separate physical box than the ASP application that’s using it. I know it works since FogBugz customers do it all the time, and it seems like it should improve performance, but then again, I’m worried about the network bottleneck. We’ll have fast gigabit ethernet and the two machines are only one hop away. Do I need to worry about this? I don’t have to worry about this, it works fine, etc. Many people pointed out that Best Practice is to have a separate network segment/switch for the SQL traffic so it doesn’t compete with  web server traffic. I strongly suspect this best practice arose in the days of 10/100 ethernet, not gigabit ethernet, and that this would be overkill, although if needed it will be easy to retrofit in later since every server has 2 ethernet ports.

Question 5 – RAID And *nix

People tell me that even using Dell-standard RedHat on their own servers with RAID has the problem that there’s no way to find out when a drive has failed, and that the situation is even worse with the BSDs. This could be a major problem. There’s no point in having RAID if you don’t find out when a drive failed. We need to make sure whatever OS we choose for the two Unix boxes supports the Dell Perc Raid Thing.


Q: Why Windows for the web servers?

A: FogBugz is written for Windows and ported to Unix. Thus the Windows version is usually about 2 months ahead of the Unix version, and we always want to be running the latest daily builds of our own dogfood. Also, a lot of Fog Creek’s existing applications such as the online store are written for ASP or ASP.NET.

Q: Why SQL Server instead of MySQL?

A: Similar historical reasons, the performance is better, and the Unicode support is far better.

Q: Don’t you like Linux/Unix/MacOSX/NetBSD/BeOS?

A: Yes. I love all operating systems equally. If we were building all this software from scratch I might have taken a different technical approach, but we didn’t, so get off your OS religion high horse for a minute. We’re going to have a mixed environment for the foreseeable future and 90% of FogBugz customers choose to run it on Windows servers. But we will use Unix for network routing, email, DNS, and a few other small things.

Q: Why Windows 2003 instead of Windows 2000?

A: Because IIS 6.0, the web server in Windows 2003, stores all its configuration information in a XML text-file metabase, and supports “XCOPY deployment,” it means we’ll be able to mirror web servers more trivially simply by mirroring file directories. And Windows 2003 has lots of improvements to web server performance.

Q: Why Dell?

A: My experience has been that it’s easier to get a Dell server built and configured exactly the way you want it, and it usually comes out a lot cheaper than the alternatives. At least 2x cheaper than HP! And I don’t like buying through resellers who don’t know what they’re selling. Every time I ever made the mistake of buying something from PC Connection I sentenced myself to weekly annoying sales calls from a clueless sales person who always dramatically screwed up anything I tried to buy from him. When you call Dell you get a nice, cheerful, competent kid in Austin, Texas who laughs at your jokes, no matter how dumb, and actually has the ability to do sophisticated things like read email that you sent him and then actually reply to it!

But that said, Dell has really, really crappy software skills, and their management software sucks big rocks. Every simple application they make is a horrible mess of Shockwave, HTML, Java, and Visual Basic with 35 MB of runtimes just to do some basic system administration thing. Today when I downloaded their otherwise cool Rack Advisor, the download was an EXE — a self-extracting EXE. Running the EXE extracted — wait for it — another EXE. This EXE was the single file installer. Really, really, really crappy software skills. Get some programmers, Dell! HP and IBM are reputed to have much better system management software, not that I’ve ever seen it.

Finally, Dell has a neat feature where you can log on to their website and see a list of every computer you’ve ever bought from them, and I love logging on and seeing that list get longer and longer.

Q: Why not a SAN?

A: There seem to be lots of good reasons to use a storage area network, although I don’t think it’s quite cost-effective for our situation. We’re looking at a total bill in the neighborhood of $20K, and SAN solutions tend to start around that price.



It’s time for a server upgrade. For two years a single server has hosted the bulk of our sites, including Joel on Software, although a second smaller server has made an appearance. Over the next few weeks there will be a massive upgrade, giving us six top of the line, state of the art servers at the Peer 1 colo downtown.

In the past I would have upgraded the system, installed it all, and written an article about it. And then people would have emailed me to suggest better ways to do things, but it would be too late.

So now for the first time ever, I’m going to publish a “live” article here. So far, I haven’t done anything. If you have any better suggestions for how to do things, it’s not too late for me to learn from your experience. As we go along, I might have a few questions for my readers who have experience with this stuff.

Colo Expansion Part 1