News – Joel on Software

Recruiting

To Gretchen: recruiting successfully isn’t only up to recruiters. The best recruiting department in the world can’t make people want to work at a company that’s moribund, that can’t figure out how to ship a compelling upgrade to their flagship OS, or update their flagship database server more than once every five years, that has added tens of thousands of technical workers who aren’t adding any dollars to the bottom line, and that constantly annoys twenty-year veterans by playing Furniture Police games over what office furniture they are and aren’t allowed to have. Summer interns at Fog Creek have better chairs, monitors, and computers than the most senior Microsoft programmers.

Recruiting has to be done at the Bill and Steve level, not at the Gretchen level. No matter how good a recruiter you are, you can’t compensate for working at a company that people don’t want to work for; you can’t compensate for being the target of eight years of fear and loathing from the Slashdot community, which very closely overlaps the people you’re trying to recruit; and you can’t compensate for the fact that a company with a market cap of $272 billion just ain’t going to see its stock price go up. MSFT can grow by an entire Google every year and still see less than 7% growth in earnings. You can be the best recruiter in the world and the talent landscape is not going to look very inviting if the executives at your company have spent the last few years focusing on cutting benefits, cutting off oxygen supplies, and cutting features from Longhorn.

Network Load Balancing Works

For the first time ever I was able to install today’s round of Microsoft patches on our web servers without bringing the sites down at all. I’m very happy about this, since this was the main point of upgrading the web farm.

We have two web servers, web1.fogcreek.com and web2.fogcreek.com, each with its own IP address. Using a feature built into Windows Server 2003 called Network Load Balancing, they both share the web site load through a third IP address, which I’ve named webnlb.fogcreek.com. Whenever a request comes in on that shared IP address, it is distributed to one of the web servers at random. If requests come in from the same class C address range, those requests will prefer to go to the same web server that previously served that address range. So, where possible, the same user will usually keep landing on the same physical machine, which means stateful web applications still work even when the state is maintained on a single computer.
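To make the class C affinity rule concrete, here is a minimal sketch in Python. It is not Microsoft’s actual NLB algorithm. The assumption is that hashing a client’s /24 network with SHA-256 and reducing the result modulo the number of servers is a fair stand-in. This spreads different address ranges across the servers in a roughly random way, while every address in one range is pinned to the same machine. The server names come from the post; `class_c_prefix` and `pick_server` are hypothetical helpers.

```python
import hashlib
import ipaddress

# Server names from the post; the selection scheme below is an illustrative assumption.
SERVERS = ["web1.fogcreek.com", "web2.fogcreek.com"]


def class_c_prefix(client_ip: str) -> str:
    """Return the network address of the /24 ("class C") range a client belongs to."""
    network = ipaddress.ip_network(f"{client_ip}/24", strict=False)
    return str(network.network_address)


def pick_server(client_ip: str, servers: list[str]) -> str:
    """Deterministically map a client's /24 range to one server.

    Different ranges hash to effectively arbitrary servers, but every address
    inside the same /24 always hashes to the same one (affinity).
    """
    prefix = class_c_prefix(client_ip)
    digest = hashlib.sha256(prefix.encode("ascii")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    # Two users behind the same /24 end up on the same machine.
    print(pick_server("203.0.113.10", SERVERS))
    print(pick_server("203.0.113.200", SERVERS))
```

Because the mapping is a pure function of the address range, no per-session state has to be shared between the servers to get "sticky" behavior. The trade-off is that a large range, such as a big ISP proxy, always lands on one box.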

I actually like the NLB system a bit more than a dedicated hardware load balancer. Here’s why: there’s no single point of failure. If you have a hardware load balancer and it needs to be updated or rebooted, or it simply fails, you’re off the air. Windows NLB, by contrast, is all software, and each server in the cluster is a peer, so any server can die and the rest of the system stays up.
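One way to see why a peer design has no single point of failure is to drop the central dispatcher entirely. Instead, every host applies the same deterministic rule to decide whether a request is "mine." The sketch below is a simplified model, not NLB’s real protocol. It assumes every host sees every incoming request and agrees on the list of live hosts. The bucket count and the helpers `bucket_for`, `owner_of`, and `claimants` are hypothetical.

```python
import hashlib


def bucket_for(client_ip: str, buckets: int = 60) -> int:
    """Hash the client's /24 prefix into one of a fixed (arbitrary) number of buckets."""
    prefix = ".".join(client_ip.split(".")[:3])
    digest = hashlib.sha256(prefix.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def owner_of(bucket: int, live_hosts: set[str]) -> str:
    """Every peer sorts the same membership list, so they all agree on the owner."""
    ordered = sorted(live_hosts)
    return ordered[bucket % len(ordered)]


def claimants(client_ip: str, live_hosts: set[str]) -> list[str]:
    """Ask each live peer independently whether it will handle this request."""
    bucket = bucket_for(client_ip)
    return [host for host in sorted(live_hosts) if owner_of(bucket, live_hosts) == host]


if __name__ == "__main__":
    hosts = {"web1", "web2"}
    print(claimants("203.0.113.10", hosts))  # exactly one peer claims the request
    hosts.discard("web2")                   # web2 dies; no dispatcher needs updating
    print(claimants("203.0.113.10", hosts))  # prints ['web1']
```

Nothing in this model sits in front of the peers, so losing any one of them only changes the membership list. The surviving hosts recompute ownership and keep serving.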

When I needed to install today’s Windows updates, here’s what I did:

Told WEB1 to drainstop. That means “finish serving any requests you’re working on, but don’t take any new requests.” It took three or four minutes for WEB1’s traffic to flatline, while WEB2 silently picked up the entire load.
Installed the upgrades on WEB1 and rebooted it.
Repeated the process for WEB2, while WEB1 held up the entire load.
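The drainstop-then-patch routine above is a rolling update. Here is a small simulation of it in Python. It does not call any real NLB tooling; the `Host` class, its `in_flight` counter, and `rolling_update` are hypothetical stand-ins. The point it checks is the one the post relies on: at every step, at least one host is still accepting new requests.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A hypothetical model of one web server in the cluster."""
    name: str
    accepting: bool = True   # will this host take new requests?
    in_flight: int = 0       # requests it is still working on
    patched: bool = False

    def drainstop(self) -> None:
        """Stop taking new requests; in-flight requests are allowed to finish."""
        self.accepting = False

    def finish_one(self) -> None:
        """Complete one in-flight request."""
        if self.in_flight > 0:
            self.in_flight -= 1


def accepting_hosts(cluster: list[Host]) -> list[Host]:
    return [h for h in cluster if h.accepting]


def rolling_update(cluster: list[Host]) -> list[int]:
    """Patch hosts one at a time; log how many hosts accept new work after each step."""
    capacity_log = []
    for host in cluster:
        host.drainstop()
        capacity_log.append(len(accepting_hosts(cluster)))
        while host.in_flight:   # wait until the host's traffic flatlines
            host.finish_one()
        host.patched = True     # stands in for installing updates and rebooting
        host.accepting = True   # rejoin the cluster
        capacity_log.append(len(accepting_hosts(cluster)))
    return capacity_log


if __name__ == "__main__":
    cluster = [Host("web1", in_flight=3), Host("web2", in_flight=5)]
    print(rolling_update(cluster))  # prints [1, 2, 1, 2]
```

With two hosts, capacity dips to one machine during each drain and never reaches zero. That is why the procedure only works if either server can carry the whole load by itself.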

As far as I can tell, nobody should have seen a single hiccup in the sites served from the new web farm.
