Combating comment spam with a CAPTCHA
When I first set up Movable Type roughly one year ago, I started getting a bit of comment spam. It wasn’t a huge problem for me, but for big blogs, it’s a major issue. Consequently, there’s been quite a bit of discussion and a number of approaches advocated to solve the weblog comment spam problem.
The simplest and most straightforward solution is not to allow comments on entries or to close them after a certain amount of time. Until today, I opted for the latter approach via David Raynes’ MTCloseComments, primarily because of its simplicity. After sixty days, comments were automatically closed. It worked, and I didn’t get comment spam, but I was uncomfortable with closed comments on >90% of my entries.
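For the curious, the sixty-day rule boils down to a single date comparison. The sketch below is not MTCloseComments’ actual code (that plugin runs inside Movable Type); the function name `comments_open` and the `COMMENT_WINDOW` constant are hypothetical, chosen only to illustrate the logic.

```python
from datetime import datetime, timedelta

# Hypothetical cutoff matching the sixty-day policy described above.
COMMENT_WINDOW = timedelta(days=60)


def comments_open(posted_at: datetime, now: datetime,
                  window: timedelta = COMMENT_WINDOW) -> bool:
    """Return True while an entry is still young enough to accept comments."""
    return now - posted_at <= window


if __name__ == "__main__":
    now = datetime(2004, 10, 28)
    print(comments_open(datetime(2004, 10, 1), now))  # a recent entry
    print(comments_open(datetime(2004, 1, 15), now))  # an old entry
```

The appeal is obvious: no state, no training, no external service. The cost is that the rule knows nothing about the comment itself, so old entries are closed to everyone.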
If you want a smarter, sexier approach than closing comments on old entries, you’ve got several choices:
- A blacklist, as in Jay Allen’s MT-Blacklist, preventing comments from known spamming hostnames.
- A registration scheme, such as Movable Type’s TypeKey, which essentially establishes a whitelist. I don’t want to require people to go through a registration process just to comment here, however.
- A security test, such as a CAPTCHA, which ensures that a human, not a spambot, is doing the commenting. Here, we’ve got James Seng’s SCode plugin.
- An automated filter for spam, such as James Seng’s MT-Bayesian. Unfortunately, it has some problems, and spammers have learned how to circumvent the Bayesian technique. That’s why you get the weird e-mails with sentences like “Tried fish while cable sought dandelion. She capitulates into flaw fairness” and then a spam link.
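The word-salad trick in the last item is easier to see with a toy example. What follows is a deliberately tiny naive Bayes scorer, not MT-Bayesian’s code; the class name, the training messages, and the equal-priors assumption are all invented for illustration. The idea is that ordinary words seen in legitimate text pull the score back toward “not spam.”

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on whitespace; crude, but enough for a demo."""
    return text.lower().split()


class ToyBayes:
    """A toy naive Bayes scorer with add-one smoothing and equal priors."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}

    def train(self, label: str, text: str) -> None:
        self.counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        spam, ham = self.counts["spam"], self.counts["ham"]
        vocab = set(spam) | set(ham)
        spam_total, ham_total = sum(spam.values()), sum(ham.values())
        log_odds = 0.0
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p_spam = (spam[word] + 1) / (spam_total + len(vocab))
            p_ham = (ham[word] + 1) / (ham_total + len(vocab))
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


def build_demo_filter() -> ToyBayes:
    """Train on a handful of invented messages (hypothetical data)."""
    f = ToyBayes()
    for text in ("buy cheap pills now", "buy cheap pills online"):
        f.train("spam", text)
    for text in ("tried the fish for dinner",
                 "the cable guy came today",
                 "dandelion in the garden"):
        f.train("ham", text)
    return f


if __name__ == "__main__":
    f = build_demo_filter()
    print(f.spam_probability("buy cheap pills"))
    print(f.spam_probability("buy cheap pills fish cable dandelion garden dinner"))
```

With this toy training set, the padded message scores lower than the bare pitch, which is the whole point of the nonsense sentences. A real filter is trained on far more data, but the dilution effect is the same in kind.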
Most Movable Type bloggers have opted for the blacklist approach, but I think it goes about combating spam the wrong way. Specifically, I think the very idea of a blacklist is poor in concept, and in application, multiple problems can crop up.
Spam is produced by malicious computer programs, not by humans. Spam, whether of the comment spam or the e-mail variety, is never sent manually by a person. It’s always done by a bot. Therefore, the primary goal of a prevention scheme should be to separate humans from programs, not bad content or comment sources from good ones. This is the problem with a blacklist: It concentrates not on who is making the comment, but on where it is coming from.
In practical application, a blacklist is also problematic. Relying on a blacklist stored at a central server can fail when:
- The blacklist is unavailable, e.g. because of a site outage.
- A spammer is not on the blacklist (“false negative”).
- A legitimate comment is prevented because the commenter’s hostname matches a spammer’s hostname (“false positive”).
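The three failure modes above show up even in the simplest possible hostname check. This is a minimal sketch of the general idea, not MT-Blacklist’s implementation; the function name, the `.example` hostnames, and the choice to let comments through when the list is unavailable are all assumptions made for illustration.

```python
from typing import Optional, Set


def allow_comment(hostname: str, blacklist: Optional[Set[str]]) -> bool:
    """Decide whether to accept a comment based only on where it comes from.

    `blacklist` is None when the central list could not be fetched.
    """
    if blacklist is None:
        # Failure 1: the list is unavailable. This sketch fails open,
        # so every comment (spam included) gets through.
        return True
    return hostname.lower() not in blacklist


if __name__ == "__main__":
    # Hypothetical list contents.
    blacklist = {"spam.example", "shared-isp.example"}
    print(allow_comment("spam.example", blacklist))         # known spammer
    print(allow_comment("new-spammer.example", blacklist))  # false negative
    print(allow_comment("shared-isp.example", blacklist))   # false positive
    print(allow_comment("spam.example", None))              # list outage
```

Notice that nothing in the function asks whether the commenter is a person. A legitimate reader on `shared-isp.example` is rejected for the same reason a spammer on a fresh hostname is accepted.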
For these reasons, I believe that a solution that separates humans from programs is superior to a blacklist. The classic technique to accomplish this objective is a CAPTCHA, which stands for “completely automated public Turing test to tell computers and humans apart.” You’ve seen these in large, commercial sites’ registration schemes, where you’re asked to type in a series of letters and/or numbers from a graphic to verify that you are a person, not a program.
I’ve now implemented a CAPTCHA here on the jotsheet, where you’ll be required to enter a string of numbers on the Comment Preview page in order to post the comment. I’m using the excellent SCode plugin, which has gotten a lot less attention than MT-Blacklist (it’s not even mentioned in this high-profile article) but is just as useful and avoids the above-mentioned plagues of blacklists in general. The sole problem with image-based CAPTCHAs is that visually impaired users cannot complete them. I feel this is an acceptable drawback, at least for this website, because commenting is not the central purpose of the jotsheet (the content is still accessible in plain text to a screen reader) and the visually impaired user can always contact me via e-mail.
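The logic behind a numeric CAPTCHA like this one can be sketched in a few lines. To be clear, this is not SCode’s code: the function names, the in-memory store, and the one-time-use rule are my own assumptions, and the step that actually draws the digits into an image is left out entirely.

```python
import hmac
import secrets
from typing import Dict

# In-memory store of pending challenges: token -> expected digits.
# A real plugin would need to persist these somewhere between requests.
_pending: Dict[str, str] = {}


def new_challenge(length: int = 6) -> str:
    """Create a challenge and return its token.

    The token goes into a hidden form field; the digits themselves
    would be rendered into an image for the commenter to read.
    """
    code = "".join(secrets.choice("0123456789") for _ in range(length))
    token = secrets.token_hex(8)
    _pending[token] = code
    return token


def expected_code(token: str) -> str:
    """Look up the digits for a token (what the image would display)."""
    return _pending[token]


def verify(token: str, answer: str) -> bool:
    """Check a submitted answer; each challenge can be used only once."""
    code = _pending.pop(token, None)
    if code is None:
        return False
    # Compare as bytes in constant time.
    return hmac.compare_digest(code.encode(), answer.strip().encode())
```

On the Comment Preview page, the flow would be: call `new_challenge()`, show the image for that token, and accept the comment only if `verify(token, typed_digits)` returns true. Discarding the token after one attempt keeps a bot from guessing repeatedly against the same image.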
UPDATE 10/28/04: I received my first piece of comment spam after implementing the SCode plugin. As I don’t think a bot can satisfy this CAPTCHA test, I can only surmise that spammers must be tricking (or luring) humans into completing image-based tests, e.g. by requiring them to complete the CAPTCHA to see pornographic pictures. Do you think someone typed in a 6-digit number from my site to see some nudies?
UPDATE 10/28/04: Here’s an e-mail exchange with SCode’s author:
E-mail to James Seng:
Today I received my first piece of comment spam *after* implementing your plugin. I’m aware, of course, that CAPTCHAs are not a foolproof means of eliminating spam and that spammers can trick users (such as browsers of pornographic pictures) to complete CAPTCHAs for them. This seems somewhat unlikely, though, since blog comment spam doesn’t seem like a high-return exercise. Added to that, my blog isn’t exactly Top 100.
Reply from James Seng:
I get one spam every couple of days from spammers who do so manually.
All I can say is: The life of the “manual spammer” must be an unfulfilling one.
December 8th, 2004 at 10:01 pm
Tom, thanks for the hints on using a CAPTCHA. I’ll give it a whirl at Knowledge Jolt and see where it leads me. Maybe those silly spammers will just go away. And those that do it manually, well, they need a life.
December 11th, 2004 at 12:23 am
Spam Begone!
MT-Bayesian won’t work (I know James has pretty much suggested people don’t use, but not because it doesn’t work), and…
December 27th, 2004 at 2:16 pm
The CAPTCHA used by SCode is very easy to decode for a computer; no wonder you are getting spammed.
December 27th, 2004 at 3:06 pm
Still, the MT spammers go for the low-hanging fruit, and I’m not sure anyone is going to the trouble of decoding this CAPTCHA.
February 9th, 2005 at 11:15 am
The graphical captcha may eliminate bots and such like, but it also eliminates humans with less than perfect visual recognition skills. I have struggled with some of the more elaborate Captchas, taking several attempts to provide the correct number. Let’s see if this comment makes it…
February 9th, 2005 at 1:46 pm
Andy,
This is a common (and overdone, IMO) criticism of CAPTCHA technology. Jay Allen has been beating this dead horse for years now; do a Google search for “jay allen captcha” and you’ll see what I mean.
On small sites like this, I don’t see why the 0.1% of users who can’t make a comment using MT’s comment system can’t e-mail me their comment. That’s the direction I give in the CAPTCHA image’s ALT text.
To conclude: The percentage of humans kept out by CAPTCHA tests is probably lower than the percentage of false positives by methods such as MT-Blacklist. So I’m fine with it.
May 13th, 2005 at 4:15 am
Hi Tom (is it Tom? - hard to tell from this page),
Nice article. I just moved from MT3.11 and self hosting on OS X, to MT3.16 and remote hosting under Red Hat, and my Scode no longer works :( I’m blaming MT3.16 at the moment, but still investigating. Found your page while trawling for solutions.
I just had to post here because I am also a firm believer that SCode is far far superior to MT-Blacklist. I have yet to hear of a single non-hypothetical case of a visually impaired reader having trouble with a Captcha, yet I’ve heard dozens of stories of Blacklist false positives.
May 13th, 2005 at 9:22 am
Tim,
It is Tom. :) First thing I’d check is the GD library, although you might have already looked into that. It seems to be the source of most of SCode’s installation problems.
If you aren’t running scode-0.1c, you might want to install it, since it includes a test script to find the most common problems (e.g. lack of GD). Earlier versions didn’t have this.
May 13th, 2005 at 1:00 pm
Thanks for the reply. Strangely enough, running scodetest.cgi produces a string of 5 ‘OK’s. It detects GD, GD.pm, PNG support, SCode.pm and the temporary directory. However I still get no images. The number files are being created in the temp directory though, so it doesn’t seem to be permissions. Something is wrong though: using my browser to open mt-scode.cgi on the remote site takes me to my host’s default page (possibly indicating a ‘page not found’ kind of error), but if I open mt-scode.cgi on my self hosted (on OS X) site it works and produces the SCode image.
May 13th, 2005 at 1:22 pm
Quoting from the README in SCode-0.1c:
That might be it. See step 13 of the instructions.
May 13th, 2005 at 1:31 pm
I’ve actually made a little headway. Checking my http error log showed this:
[Fri May 13 18:46:04 2005] [error] [client 81.151.xxx.xxx] Premature end of script headers: mt-scode.cgi
[Fri May 13 18:46:04 2005] [error] [client 81.151.xxx.xxx] /usr/bin/perl: relocation error: /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi/auto/GD/GD.so: undefined symbol: gdFontGetGiant
[Fri May 13 18:54:38 2005] [error] [client 81.151.xxx.xxx] gd-png: fatal libpng error: Invalid number of colors in palette
[Fri May 13 18:54:38 2005] [error] [client 81.151.xxx.xxx] gd-png error: setjmp returns error condition
However I then discovered this page. Apparently it may be a problem with the hosting company. I’m pursuing that now :-)
May 13th, 2005 at 1:38 pm
James also makes mention on his page of buggy combos of CPanel and GD. Could that be the issue?
May 13th, 2005 at 1:44 pm
Whoops, I just read the page you linked. :-P Yeah, looks like a clean reinstall of GD would do the trick.
May 15th, 2005 at 7:54 pm
SCode reinstalled, GD issue solved
During the relocation of this site to Register1’s hosting service an issue cropped up with James Seng’s MT-Scode. The scodetest.cgi script was failing due to a missing GD.pm. Register1 were very helpful and had GD.pm installed within a couple of…
May 15th, 2005 at 7:57 pm
Just to let you know, I solved my problem. Essentially the hosting company downgraded GD and used Redhat’s precompiled GD.pm instead of compiling their own. I wrote a note about it. Thanks for your input! :-)
April 4th, 2006 at 4:34 am
Can’t locate GD.pm in @INC (@INC contains: /usr/lib/perl5/5.8.5/i386-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) at nmsmap.pl line 4.
BEGIN failed--compilation aborted at nmsmap.pl line 4.
April 5th, 2006 at 9:42 am
hari: you need GD (the perl module) installed on your server. Check the SCode forums.
April 12th, 2006 at 1:30 am
Can’t locate GD.pm in @INC (@INC contains: /usr/lib/perl5/5.8.5/i386-linux-thread-multi /usr/lib/perl5/5.8.5 /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/site_perl/5.8.5 /usr/lib/perl5/site_perl/5.8.4 /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2 /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.5/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.4/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.5 /usr/lib/perl5/vendor_perl/5.8.4 /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2 /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0 /usr/lib/perl5/vendor_perl .) at nmsmap.pl line 3.
BEGIN failed--compilation aborted at nmsmap.pl line 3.
February 6th, 2007 at 7:47 pm
I am writing this just to see scode in action. Feel free to delete it.
June 25th, 2007 at 6:25 pm
Testing SCode captcha stuff; please delete this comment. (I’m hoping it doesn’t get that far anyway.)
February 12th, 2008 at 8:34 am
sdfgsdfgsdfg
sdfgsdfgsdf
gsdfgsdfgsdfgs
dfgsdfgsdfgsdfgsdf
hgjghjfghjfghj
drthrthdrthdrthdr