Joel on Software

Announcing FogBugz On Demand


This item ran on the Joel on Software homepage on Monday, July 09, 2007

I’m happy to announce that FogBugz On Demand is now available. This is a professionally hosted version of FogBugz 5.0, which until now was available only as a download.

Selling web software as a download has always been a slightly strange aspect of the way Fog Creek operates. Since FogBugz is a web-based project management tool, why should customers have to download it and install it on their own servers?

In the past, we did it that way because we’re a small company. With just a handful of full-time employees, we didn’t really have the resources to be a reliable service provider that customers could trust with their mission-critical data.

To prepare for FogBugz On Demand, we’ve done a lot of hard work over the past year.

First, we hired a professional system administrator. He has pored over every inch of our hosting infrastructure, patching and testing and improving reliability everywhere. He upgraded all the NetBSD servers to Linux, installed a bunch of new hardware, and added lots and lots of automated monitoring.

In general, we decided to use high-end components for our hosting architecture: Dell PowerEdge 2950 servers with SCSI RAID, Windows Server 2003, and SQL Server 2005. Yep, that’s an expensive way to do it. Since FogBugz runs fine on LAMP (Linux / Apache / MySQL / PHP), we could have gotten a bunch of cheap boxes, used all free software, and saved some money in exchange for some level of headaches. Indeed, most hosted services really should be built on LAMP. In our case, though, the cost of those Microsoft licenses and those extremely reliable Dell servers can be spread out over quite a few paying customers, so for us the cost difference per customer is really inconsequential. And we’ve been running IIS and Microsoft SQL Server here for six years without data loss, so that’s what we know and trust.

But to be honest, if we ever get to the point where we’re racking up 10 new servers at a time, we’ll almost certainly switch to LAMP. Even then, I’m still gonna buy Dell servers and SCSI hard drives, because frankly, the small extra cost over cheapo white boxes is well worth it in reliability.

We made changes to the FogBugz code base itself to make it work better in a multi-hosted environment. The biggest surprise was how much work it took to make every user see dates and times in their own time zone. We also put a lot of work into the accounting and billing system (FogBugz On Demand is $21 a month, with no commitment). And we implemented a database-backed DNS system so that each On Demand customer gets their own subdomain (you.fogbugz.com).
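The usual way to get per-user time zones right in a multi-hosted app is to store every timestamp in UTC and convert only at the moment you render it for a particular user. The sketch below shows that pattern in Python; it is not FogBugz’s code. The user names and the `USER_OFFSETS_MINUTES` table are hypothetical, and the fixed offsets ignore daylight saving time, which a real system would handle by storing an IANA zone name (such as `America/New_York`) and converting with `zoneinfo`.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-user preference: offset from UTC in minutes.
# Fixed offsets keep this sketch self-contained; they do NOT track
# daylight saving time the way a real zone database would.
USER_OFFSETS_MINUTES = {
    "alice": -5 * 60,        # New York, standard time
    "bob": -8 * 60,          # Los Angeles, standard time
    "carol": 5 * 60 + 30,    # Mumbai
}


def to_user_time(event_utc: datetime, user: str) -> datetime:
    """Convert a timezone-aware UTC timestamp into the user's local time."""
    if event_utc.tzinfo is None:
        raise ValueError("store and pass timestamps as timezone-aware UTC")
    offset = timedelta(minutes=USER_OFFSETS_MINUTES[user])
    return event_utc.astimezone(timezone(offset))


def format_for_user(event_utc: datetime, user: str) -> str:
    """Render one stored UTC instant the way a given user should see it."""
    return to_user_time(event_utc, user).strftime("%Y-%m-%d %H:%M %z")
```

The same stored instant, say a case opened at 14:30 UTC on January 15, 2007, renders as 09:30 for alice, 06:30 for bob, and 20:00 for carol; only the presentation changes, never the stored value.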

The biggest change was bringing up a second data center. I can’t tell you how scary it is to be responsible for our customers’ mission-critical data, so I didn’t want to have any single point of failure, no matter how fortified it is.

Our first data center has been with Peer 1 Network in New York’s financial district. Peer 1 is a Canadian backbone provider we’ve been with since the beginning of 2003. To take advantage of their backbone, we put our second data center in their new Los Angeles facility. This new data center is pretty much an exact replica of what we have in New York.

To some extent, by using Peer 1 for our second facility we are, technically, putting all our eggs on one backbone. But it’s a pretty darn reliable backbone and an excellent company. We actually investigated a couple of other colo providers and even went so far as to build out a facility in Chicago (with an unnamed provider). But shortly before we launched, they had a six-hour outage, and in the aftermath of that, we discovered that their network connectivity was inadequate and their concept of building reliable systems did not use the same definition of “reliable” as we do. So we gave up on them, shipped all the servers from Chicago to LA, and went with the tried and true Peer 1.

Rather than setting up Los Angeles as a mere backup, we decided it would be completely live. Half our customers will be hosted from Los Angeles, and half from New York. That way we know at any time that both data centers are working and set up correctly, and we don’t have to wait until a massive failure to discover the problems with the backup data center.
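One way to picture the database-backed DNS and the half-and-half split together: a table maps each customer’s subdomain to the city that hosts it, new customers go to whichever city currently has fewer, and a lookup turns `acme.fogbugz.com` into that city’s address. This is a minimal sketch under assumptions, not Fog Creek’s implementation. The post doesn’t say how customers are assigned, and the `HostingDirectory` class, the data-center names, and the IP addresses (taken from the RFC 5737 documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical data-center addresses (RFC 5737 documentation ranges).
DATA_CENTERS = {
    "new-york": "192.0.2.10",
    "los-angeles": "198.51.100.10",
}


@dataclass
class HostingDirectory:
    """Toy stand-in for a database-backed DNS table.

    Maps each customer's subdomain to the data center that serves it.
    """
    placement: dict[str, str] = field(default_factory=dict)

    def load(self, dc: str) -> int:
        """Number of customers currently hosted in data center `dc`."""
        return sum(1 for hosted_in in self.placement.values() if hosted_in == dc)

    def add_customer(self, subdomain: str) -> str:
        # Assumed policy: put each new customer in whichever city hosts
        # fewer (ties broken by name), so both live sites stay roughly even.
        dc = min(DATA_CENTERS, key=lambda name: (self.load(name), name))
        self.placement[subdomain] = dc
        return dc

    def resolve(self, hostname: str) -> str:
        """'acme.fogbugz.com' -> 'acme' -> its data center -> that IP."""
        suffix = ".fogbugz.com"
        if not hostname.endswith(suffix):
            raise KeyError(hostname)
        subdomain = hostname[: -len(suffix)]
        return DATA_CENTERS[self.placement[subdomain]]
```

Because both cities answer real traffic all the time, a bad entry in the table or a misconfigured site surfaces immediately instead of during a disaster.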

Copies of the database backups are maintained in both cities, and each city serves as a warm backup for the other. If the New York data center goes completely south, we’ll wait a while to make sure it’s not coming back up, then start changing the DNS records and bringing up our customers on the warm backup in Los Angeles. It’s not an instantaneous failover, since customers will have to wait for two things: for us to decide that a data center is really gone, not just temporarily offline, and for the DNS changes to propagate, which can take up to 15 minutes. Still, this is for the once-in-a-lifetime case of an entire data center blowing up, not for occasional outages: each data center already has incredible backbone connectivity, UPSs, backup diesel generators, and so forth (Peer 1 survived the huge blackout in the summer of 2003 while many of their competitors were winking out).
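The failover rule described above has two knobs: how long to wait before declaring a data center dead, and how long cached DNS answers live. Here is a sketch of that decision in Python, assuming a hypothetical one-hour grace period (the post only says “wait a while”) and modeling the 15-minute propagation delay as the records’ TTL. The function names and the placement dictionary are illustrative, not part of FogBugz.

```python
from datetime import datetime, timedelta

# Hypothetical grace period before a silent data center is declared gone.
GRACE_PERIOD = timedelta(hours=1)
# The post says DNS changes take up to 15 minutes to propagate; this
# sketch models that as the TTL on each customer's DNS record.
DNS_TTL = timedelta(minutes=15)


def should_fail_over(last_healthy: datetime, now: datetime,
                     grace: timedelta = GRACE_PERIOD) -> bool:
    """Give up on a data center only after it has been unreachable for the
    whole grace period, so a temporary outage doesn't trigger a failover."""
    return now - last_healthy >= grace


def fail_over(placement: dict[str, str], dead_dc: str,
              standby_dc: str) -> list[str]:
    """Repoint every customer hosted in dead_dc at the warm backup city.

    Returns the subdomains whose DNS records changed."""
    moved = [sub for sub, dc in placement.items() if dc == dead_dc]
    for sub in moved:
        placement[sub] = standby_dc
    return moved


def customer_wait_bound(grace: timedelta = GRACE_PERIOD,
                        ttl: timedelta = DNS_TTL) -> timedelta:
    """Decision delay plus at most one TTL for cached answers to expire.
    Time to bring the warm databases online is not modeled here."""
    return grace + ttl
```

With these assumed numbers, affected customers wait about an hour and a quarter before DNS points them at the other city. A shorter grace period fails over faster but risks abandoning a data center that was only briefly offline, which is exactly the trade-off the paragraph describes.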

To implement this warm backup feature, I wrote a SQL mirroring application that implements transaction log shipping: basically, it takes an incremental backup of the transaction log in one city, compresses that backup, ships it to the other city, uncompresses it, and applies it to the warm backup database. Right now, we’re log shipping twice a day, so you might lose up to half a day of work if an entire city blew up, but in a couple of weeks, we’ll implement a system that does more continuous backups, and we expect that the warm backups will never get more than 15 minutes behind.
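Here is the shape of one log-shipping cycle (back up the new log records, compress, ship, decompress, apply in order) as a toy Python model. It is not the author’s mirroring application: the in-memory `Database`, its `(lsn, key, value)` log records, and the helper names are hypothetical. Against SQL Server the real steps would be a `BACKUP LOG` on the primary and a `RESTORE LOG` with the `NORECOVERY` or `STANDBY` option on the warm backup.

```python
import gzip
import json
from dataclasses import dataclass, field


@dataclass
class Database:
    """Toy key-value database whose changes are recorded in a log.

    Each log record is (lsn, key, value); LSNs increase by one."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)
    applied_lsn: int = 0

    def write(self, key, value):
        self.applied_lsn += 1
        self.log.append((self.applied_lsn, key, value))
        self.data[key] = value


def backup_log(primary: Database, since_lsn: int) -> list:
    """Incremental backup: only the log records after since_lsn."""
    return [rec for rec in primary.log if rec[0] > since_lsn]


def compress(records: list) -> bytes:
    return gzip.compress(json.dumps(records).encode("utf-8"))


def decompress(payload: bytes) -> list:
    return json.loads(gzip.decompress(payload).decode("utf-8"))


def apply_log(standby: Database, records: list) -> None:
    """Replay shipped records on the warm backup, strictly in order."""
    for lsn, key, value in records:
        if lsn != standby.applied_lsn + 1:
            raise RuntimeError(
                f"log gap: expected {standby.applied_lsn + 1}, got {lsn}")
        standby.data[key] = value
        standby.applied_lsn = lsn


def ship_once(primary: Database, standby: Database) -> int:
    """One log-shipping cycle; returns the number of records shipped."""
    records = backup_log(primary, standby.applied_lsn)
    payload = compress(records)  # the bytes that would cross the network
    apply_log(standby, decompress(payload))
    return len(records)
```

The shipping interval bounds how much committed work can be lost: shipping twice a day leaves up to about 12 hours of changes only in the primary city, while shipping every 15 minutes shrinks that window to roughly 15 minutes plus transfer time.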

FogBugz On Demand has actually been in beta since April, and in fact we have been hosting FogBugz trials online since 2000, without ever losing anyone’s data. (The first FogBugz 1.0 trial server, believe it or not, was a ThinkPad laptop with a broken screen plugged into our office T1!) So I’m pretty confident now that our little company can do a pretty good job of hosting FogBugz for you.




About the Author: I’m your host, Joel Spolsky, a software developer in New York City. Since 2000, I've been writing about software development, management, business, and the Internet on this site. For my day job, I run Fog Creek Software, makers of FogBugz—the smart bug tracking software with the stupid name, and Fog Creek Copilot—the easiest way to provide remote tech support over the Internet, with nothing to install or configure.
