TrackBack spam

TrackBack spam just got worse. Today is August 6th, 2005, and the spammers just upgraded. TrackBack spamming is a big deal again.

Movable Type users: Install SpamLookup

First things first. If you’re an MT user, it’s high time you installed SpamLookup. This is the best plugin I’ve ever seen for Movable Type; without exaggeration, it adds roughly 25% more functionality to the application. “Plugin” really doesn’t do SpamLookup justice; it’s more of an upgrade.

With SpamLookup, you can effectively fight the New School TrackBack spammers. It provides a wordlist filter, IP/hostname blacklist lookups, proxy checking, hyperlink limits, and IP block matching. This is your Swiss Army knife for fighting TB spam. Note: You’ll need a recent version of Movable Type to moderate TrackBacks; otherwise, they can only be blocked.
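To make three of those checks concrete, here’s a rough Python sketch of a wordlist filter, a hyperlink limit, and a DNS blacklist (DNSBL) lookup applied to an incoming ping. This is not SpamLookup’s code (SpamLookup is a Perl plugin and its real options differ); the word list, the link limit of 2, and the zone `dnsbl.example.org` are all hypothetical placeholders. The DNSBL part uses the standard convention: reverse the IPv4 octets, append the blacklist zone, and treat a successful DNS resolution as “listed.”

```python
import ipaddress
import re
import socket

# Hypothetical settings, not SpamLookup's actual defaults.
BANNED_WORDS = ("casino", "viagra", "poker")
MAX_LINKS = 2
DNSBL_ZONE = "dnsbl.example.org"

URL_RE = re.compile(r"https?://", re.IGNORECASE)


def wordlist_hits(text):
    """Return the banned words that appear anywhere in the text (case-insensitive)."""
    lowered = text.lower()
    return [w for w in BANNED_WORDS if w in lowered]


def too_many_links(text, limit=MAX_LINKS):
    """True if the text contains more than `limit` http(s) URLs."""
    return len(URL_RE.findall(text)) > limit


def dnsbl_query_name(ip, zone=DNSBL_ZONE):
    """DNSBL convention: IPv4 octets in reverse order, followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_blacklisted(ip, zone=DNSBL_ZONE):
    """Listed IPs resolve (conventionally to 127.0.0.x); unlisted ones fail to resolve."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False


def junk_reasons(excerpt, ip, check_dns=False):
    """Collect reasons a TrackBack looks like spam; an empty list means it passed."""
    reasons = ["banned word: " + w for w in wordlist_hits(excerpt)]
    if too_many_links(excerpt):
        reasons.append("too many links")
    if check_dns and is_blacklisted(ip):  # network lookup only when asked
        reasons.append("IP listed in " + DNSBL_ZONE)
    return reasons


if __name__ == "__main__":
    print(junk_reasons("Play online casino http://a http://b http://c", "192.0.2.1"))
    print(dnsbl_query_name("192.0.2.1"))
```

Running it offline reports the banned word and the link count; flip `check_dns=True` to also query the (hypothetical) blacklist zone.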

TrackBack spam: A tougher problem than comment spam

Like a bad trucker hat, comment spamming used to be the “in” thing. Thankfully, neither seems to be in style anymore. Comments are made by humans, and we have reasonably good ways of ensuring that humans are making comments. Examples include CAPTCHAs (which this site uses), passphrases, and registration systems (e.g. TypeKey). In addition, the widespread adoption of rel="nofollow" has made comment spam a generally less profitable enterprise for spammers. Perhaps it’s just my bias, but comment spamming just ain’t what it used to be.

TrackBack spamming, on the other hand, was largely neglected by spammers for a long time. This always puzzled me. The TrackBack CGI URL is always right there in the HTML, often in not one but two locations. It’s usually in plain sight for copy-and-paste convenience (or accessible via a popup), but it’s also embedded in the RDF metadata. That metadata is the key to TrackBack’s auto-discoverability, a central feature of the protocol: blogs should be able to discover the TrackBack location of other blogs in an automated fashion and send them a message.
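To show just how exposed that URL is, here’s a minimal Python sketch of auto-discovery. Per the TrackBack spec, the page carries an `rdf:RDF` block (wrapped in an HTML comment so browsers don’t render it) whose `rdf:Description` has a `trackback:ping` attribute holding the ping URL. The sample page and the `example.com` URLs are made up for illustration; a plain regex over the raw source is enough to pull the URL out, which is exactly what a spider can do too.

```python
import re

# Grab each embedded RDF block, then the trackback:ping attribute inside it.
RDF_BLOCK_RE = re.compile(r"<rdf:RDF.*?</rdf:RDF>", re.DOTALL)
PING_RE = re.compile(r'trackback:ping\s*=\s*"([^"]+)"')


def discover_ping_urls(html):
    """Return every TrackBack ping URL advertised in the page's embedded RDF."""
    urls = []
    for block in RDF_BLOCK_RE.findall(html):
        urls.extend(PING_RE.findall(block))
    return urls


# Hypothetical entry page in the shape the TrackBack spec describes.
SAMPLE_PAGE = """<html><body>
<p>TrackBack URL for this entry: http://example.com/cgi-bin/mt-tb.cgi/42</p>
<!--
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description
    rdf:about="http://example.com/archives/000042.html"
    dc:identifier="http://example.com/archives/000042.html"
    dc:title="My best bowling score ever"
    trackback:ping="http://example.com/cgi-bin/mt-tb.cgi/42" />
</rdf:RDF>
-->
</body></html>"""

if __name__ == "__main__":
    print(discover_ping_urls(SAMPLE_PAGE))
```

Note that the visible copy-and-paste URL in the `<p>` is the “second location”; the sketch only reads the RDF one.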

Of course, if you allow anything to be automated, and if you make it an open and unprotected standard, you’re asking for abuse. (Witness, Lord help us, SMTP.) This is the state of TrackBack. We cannot apply the typical human tests of CAPTCHAs, passphrases, and registration systems to TrackBack, lest we lose the core auto-discoverability feature. We would be eliminating much of the point of the entire system.
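And “unprotected” is no exaggeration. Under the TrackBack spec, a ping is nothing more than a form-encoded HTTP POST to the ping URL: `url` is required, `title`, `excerpt`, and `blog_name` are optional, and the server answers with a tiny XML document where `<error>0</error>` means success. Here’s a Python sketch that builds (but deliberately does not send) such a request; the function names and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import Request


def build_ping(ping_url, url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping request.

    A ping is a form-encoded HTTP POST: `url` is required, the other fields
    are optional, and nothing in it authenticates the sender.
    """
    fields = {"url": url, "title": title, "excerpt": excerpt, "blog_name": blog_name}
    body = urlencode({k: v for k, v in fields.items() if v}).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
    )


def ping_succeeded(response_xml):
    """The server replies <response><error>0</error></response> on success."""
    root = ET.fromstring(response_xml)
    return root.findtext("error", default="").strip() == "0"


if __name__ == "__main__":
    req = build_ping(
        "http://example.com/cgi-bin/mt-tb.cgi/42",
        "http://myblog.example/post",
        title="Hi",
        blog_name="My Blog",
    )
    print(req.get_method(), req.full_url, req.data)
```

Passing the request to `urllib.request.urlopen` would actually send it. The point is that there is no step where a human has to prove anything.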

What is this “TrackBack Spam 2.0” thing?

I’m using the phrase “TrackBack Spam 2.0” in this blog entry to refer to the new wave of TrackBack spam that’s just hit within the past couple of weeks. This stuff is smarter and faster, but it advertises the same old shit. Previously, TrackBack spam operated on a delay. I am not intimately familiar with how spammers gather data for spamming, but I do know that their turnaround time was a couple of days. That is, if your TrackBack script URL was monday.cgi on Monday, and you changed it to thursday.cgi on Thursday, you would likely get spam hits on Friday asking for monday.cgi.

This meant that bloggers could run and hide from spammers. We renamed our TrackBack scripts. I created a simple Perl script to rename the TB script file and change the setting in mt.cfg. This worked for months and months, and I didn’t receive a single spam. Then someone got smart.
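My original script was Perl, but the idea is simple enough to sketch in Python (this is a re-creation, not that script). It assumes an MT install where mt.cfg names the TrackBack CGI with a `TrackbackScript` line and the default file is `mt-tb.cgi`; check your own install before trusting either assumption. The helper names are hypothetical.

```python
import re
import secrets
from pathlib import Path

# Assumption: mt.cfg names the TrackBack CGI with a "TrackbackScript" line,
# and the script is mt-tb.cgi when that line is absent.
DEFAULT_TB_SCRIPT = "mt-tb.cgi"
TB_LINE = re.compile(r"^TrackbackScript[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def new_script_name():
    """Pick an unguessable name such as tb-3f9a0c1e.cgi."""
    return "tb-" + secrets.token_hex(4) + ".cgi"


def set_tb_script(cfg_text, new_name):
    """Return (old_name, updated_cfg_text) with TrackbackScript set to new_name."""
    line = "TrackbackScript " + new_name
    match = TB_LINE.search(cfg_text)
    if match:
        # Replace only the matched line; surrounding newlines are untouched.
        new_text = cfg_text[:match.start()] + line + cfg_text[match.end():]
        return match.group(1), new_text
    # No directive yet: the default script is in use, so append one.
    return DEFAULT_TB_SCRIPT, cfg_text.rstrip("\n") + "\n" + line + "\n"


def rotate_tb_script(mt_dir, cfg_name="mt.cfg"):
    """Rename the TrackBack CGI in mt_dir and point mt.cfg at the new name."""
    mt_dir = Path(mt_dir)
    cfg_path = mt_dir / cfg_name
    new_name = new_script_name()
    old_name, new_text = set_tb_script(cfg_path.read_text(), new_name)
    (mt_dir / old_name).rename(mt_dir / new_name)  # rename keeps the execute bit
    cfg_path.write_text(new_text)
    return new_name
```

Run `rotate_tb_script("/path/to/mt")` from cron every few days and the slow spammers keep asking for yesterday’s file. As the next paragraph explains, that trick stops working once the spammers discover the URL in real time.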

My working theory is that they’ve built a list of blogs, perhaps from one of the blog services (Technorati?), and are feeding it to a real-time spidering/spamming script. The script finds a blog, auto-discovers the TrackBack URL from the RDF metadata, and sends a ping to a randomly picked entry number (usually an old one, since old entries have higher PageRank and are less likely to be noticed and deleted).

Rethinking TrackBack usage

I’ve thought about TrackBack spam quite a bit, and in my opinion, many folks don’t use the technology judiciously. I recommend limiting your TrackBack usage (and correspondingly limiting your spam vulnerability). This is not a capitulating statement in the vein of “the terrorists have won.” No, the spammers have not won; rather, I think it’s prudent to give them fewer opportunities to play.

  1. If no one tracks you back, don’t use the feature. Do you have a blog with 300 entries and one TrackBack received? If so, consider turning TrackBacks off completely. They seem to be pointless for you.
  2. Not everything needs to be TrackBack-able. I bet if you write about your best bowling score ever, no one will ping the entry. I mean, we’re talking 0.01% chance here. Disable pings.
  3. If you’re willing to sacrifice auto-discoverability (and that’s a big if), you can probably avoid TB spam altogether. Use the MTDisguiseTrackbackURL plugin, which replaces the output of an entry’s TB URL with a JavaScript document.write call, and the spammers will have a harder time getting at it. (I’m not saying it’s impossible.) Then delete the <$MTEntryTrackbackData$> tag from your Individual Entry Archive template, which removes the RDF metadata.
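I haven’t dug into how MTDisguiseTrackbackURL builds its JavaScript, so here’s a hypothetical Python equivalent of the general trick: chop the URL into short string literals glued together with `+` inside a `document.write`, so the literal URL never appears in the HTML source. The function name and the 6-character chunk size are my own choices, not the plugin’s.

```python
import json


def disguised_tb_url(url, chunk=6):
    """Return a <script> tag that prints url via document.write.

    The URL is split into short string literals joined with "+", so the
    literal URL never appears in the page source for a scraper to grep.
    """
    pieces = [url[i:i + chunk] for i in range(0, len(url), chunk)]
    js_expr = " + ".join(json.dumps(p) for p in pieces)  # JSON strings are valid JS strings
    return '<script type="text/javascript">document.write(' + js_expr + ");</script>"


if __name__ == "__main__":
    print(disguised_tb_url("http://example.com/cgi-bin/mt-tb.cgi/42"))
```

Human visitors with JavaScript see the normal URL. A spider that runs JavaScript, or one smart enough to reassemble the pieces, still gets through, which is why this only makes it harder, not impossible.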

TrackBack spam in the long view

TB spam is a tough cookie. It’s harder to combat than comment spam and has similar repercussions. It’s easier to tackle than referrer spam, but referrer spam is more of a nuisance than a deal-breaker.

Newer versions of MT provide a better interface for moderating and deleting TrackBack spam. We’ve also learned more about it, and your average Joe Blow blogger knows a few tricks these days that he didn’t know before. Frankly, I think it was a miscalculation on the spammers’ part not to target the TrackBack protocol earlier. It was a gaping hole, harder to patch than commenting (although less widespread, since many weblogs don’t offer TB functionality). I believe the real window of opportunity for the spammers has passed.

I’m also happy that Brad Choate and company have put in the hard work to tackle this annoyance with SpamLookup. It’s great. As for TrackBack spam with other blog tools (WordPress, pMachine, etc.): I’m all ears. Post a comment.