News – Joel on Software

Incrementalists versus Completionists

Rands: “Completionists are dreamers. They have a very good idea of how to solve a given problem and that answer is SOLVE IT RIGHT. Their mantra is, ‘If you’re going to spend the time to solve a problem, solve it in a manner that you aren’t going to be solving it AGAIN in three months.’”

We Won’t Be Fooled Again

Dan Appleman: “Sure, if you’re writing software with a lifespan of a few years, Windows Forms is a great way to go. But we all know that software, enterprise software especially, lives a long time. Can Microsoft categorically promise to maintain a full commitment to development, maintenance and support of Windows Forms for the next 15 years?”

Discussion Group Software

We’ve been quietly making some improvements to the beta discussion group software.

Today we rolled out Brett’s new full-text search feature. It relies on the database engine to provide full-text search, and we’re running Microsoft SQL Server, whose full-text search capabilities are rather poor. For example, the index isn’t kept up to date automatically; it has to be rebuilt by a separate process, which we schedule for every 15 minutes, so search won’t find anything posted since the last rebuild.

I also added an RSS feed. Originally I wanted to provide full text of all topics and replies in the last three days so that you could use an RSS reader to read the discussion group. Unfortunately that would have resulted in a huge download, and since RSS readers bang on the site every hour or three, our bandwidth usage would have been absurd. So I had to settle for full text of the original topic but not of replies.
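Here is a minimal sketch of the compromise described above: an RSS 2.0 feed that carries the full text of each topic from the last three days but leaves out the replies. The `Topic` record, the channel title and the `example.com` URLs are hypothetical stand-ins, since the post doesn’t describe the real schema or feed generator.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


@dataclass
class Topic:
    # Hypothetical topic record; the real database schema is not described.
    title: str
    url: str
    body: str
    posted: datetime  # timezone-aware
    replies: list = field(default_factory=list)


def build_rss(topics, now, days=3):
    """Return an RSS 2.0 document with the full text of recent topics only.

    Replies are deliberately omitted to keep the feed small, because RSS
    readers re-download the whole feed every time they poll.
    """
    cutoff = now - timedelta(days=days)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Discussion group"  # hypothetical
    ET.SubElement(channel, "link").text = "https://example.com/discuss/"  # hypothetical
    ET.SubElement(channel, "description").text = "Recent topics"
    for t in sorted(topics, key=lambda t: t.posted, reverse=True):
        if t.posted < cutoff:
            continue  # older than the feed window
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = t.title
        ET.SubElement(item, "link").text = t.url
        ET.SubElement(item, "description").text = t.body  # original topic, no replies
        ET.SubElement(item, "pubDate").text = format_datetime(t.posted)
    return ET.tostring(rss, encoding="unicode")
```

Feed size now grows with the number of new topics rather than with the number of replies, which is what keeps the bandwidth bill sane when readers poll every hour or so.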

And finally we got Summer Intern Ben’s excellent Bayesian filtering code working… due to a couple of configuration problems it hadn’t been running right. The idea is to delete comment spam before anyone sees it. It’s hard to tell whether the filter works yet because it needs more training, but so far it’s doing pretty well. If you think comment spam is not a big problem, you haven’t moderated a discussion group lately. It’s the number one priority for spammers these days: email filters are starting to work pretty well, and spamming a lot of discussion groups is perceived as a good way to trick Google into giving a site prominent placement.
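For readers who haven’t seen one, here is a minimal naive Bayes spam classifier in Python. It is a generic textbook sketch, not Ben’s code: the tokenizer, the Laplace smoothing and the `BayesFilter` class name are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokens; a real filter would use richer features.
    return re.findall(r"[a-z0-9$'.-]+", text.lower())


class BayesFilter:
    """Illustrative naive Bayes spam filter with Laplace smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label is "spam" or "ham" (a legitimate post)
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior plus smoothed per-token likelihoods, in log space
            prior = (self.docs[label] + 1) / (total_docs + 2)
            n_tokens = sum(self.counts[label].values())
            score = math.log(prior)
            for tok in tokenize(text):
                p = (self.counts[label][tok] + 1) / (n_tokens + len(vocab) + 1)
                score += math.log(p)
            log_scores[label] = score
        # P(spam) = 1 / (1 + exp(log P(ham) - log P(spam))), clamped to avoid overflow
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))
```

The "needs more training" caveat falls out of the math: with few examples, the smoothed counts are close to uniform and the probabilities hover near 0.5.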

We admit to three strategies to prevent comment spam:

Bayesian filtering, which can be trained to remove comment spam instantly
Not allowing new comments on old posts, so that comment spam can’t be hidden in posts which nobody but Google visits any more
Using a META tag to ask search engines not to follow URLs from discussion topics. Although this technique keeps comment spam from working, it doesn’t keep it from happening, because spammers don’t seem to care much whether any given spam works or not.
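The second and third strategies are simple enough to sketch in a few lines. The 30-day comment window below is a hypothetical value (the post doesn’t say how old a topic has to be before it’s closed); the META tag is the standard robots `nofollow` directive.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: the post does not say how old "old" is.
COMMENT_WINDOW = timedelta(days=30)


def comments_allowed(topic_posted, now, window=COMMENT_WINDOW):
    """Accept new replies only while a topic is still recent, so spam
    can't be buried in old threads that only crawlers visit."""
    return now - topic_posted <= window


def robots_meta_tag():
    # Standard robots META tag asking crawlers not to follow links on the
    # page, so links planted in spam earn no search-engine credit.
    return '<meta name="robots" content="nofollow">'
```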

By “we admit to” I mean that there are other things we do which we don’t talk about much, because revealing them would make it that much easier for spammers to work around them, reducing the cost of spamming and thus making it more economically feasible.

One side effect of the Bayesian filter is that if it finds a suspicious topic, rather than letting it through, it will flag it for a human moderator. The moderator can then allow it to be posted (which trains the filter) or keep it hidden. As a result, new posts will occasionally not appear until a human approves them. This should happen less and less as the filter learns more.
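A minimal sketch of that hold-for-review flow, assuming a filter that exposes a spam score and a way to train on legitimate posts. The `Moderation` class, its callback names and the 0.5 threshold are hypothetical; the post doesn’t say how "suspicious" is defined.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical threshold for "suspicious"; not stated in the post.
SUSPICIOUS = 0.5


@dataclass
class Moderation:
    score: Callable[[str], float]     # filter's spam probability in [0, 1]
    train_ham: Callable[[str], None]  # teaches the filter a post is legitimate
    published: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, post):
        if self.score(post) >= SUSPICIOUS:
            self.pending.append(post)  # held, invisible until a human reviews it
        else:
            self.published.append(post)

    def approve(self, post):
        # Approval publishes the post and doubles as a training example,
        # so similar posts are less likely to be held next time.
        self.pending.remove(post)
        self.published.append(post)
        self.train_ham(post)
```

Posts the moderator never approves simply stay in `pending`, which matches "leave it unshown" rather than outright deletion.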
