9 backreferences, 1 Perl script, and 301 redirection

My recent move from underscores to hyphens in filenames has caused me a couple of unexpected headaches, but it’s also taught me some interesting things about Apache’s mod_rewrite. I’ve been buried in .htaccess for hours lately, playing detective over strange behavior on my site. My most recent discovery involves the limitations of Apache’s implementation of regular expressions in mod_rewrite. I think the developers were trying to prevent wannabes like me from screwing up their own sites and consuming server resources, but it’s caused me some frustration.

In moving from underscores to hyphens, I needed to set up redirection from the older URLs to the new. It came down to this:

Old URL
http://underscorebleach.net/content/jotsheet/2004/05/this_is_a_page
New URL
http://underscorebleach.net/jotsheet/2004/05/this-is-a-page

Unfortunately, accomplishing this redirection in .htaccess is pretty hackish when you’ve got long filenames. You end up with a bunch of similar RewriteRules, differing only in the number of underscore-delimited parts in the filename. Observe:

RewriteRule ^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/([^_]+)_([^_]+)$ /jotsheet/$1$2/$3/$4-$5 [L,R=301]
RewriteRule ^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/([^_]+)_([^_]+)_([^_]+)$ /jotsheet/$1$2/$3/$4-$5-$6 [L,R=301]
RewriteRule ^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/([^_]+)_([^_]+)_([^_]+)_([^_]+)$ /jotsheet/$1$2/$3/$4-$5-$6-$7 [L,R=301]

I know. It’s ugly. And I had rules that were a lot longer, since some of my pages have long filenames. But here’s the problem: mod_rewrite has a limit of nine backreferences ($1 through $9). I was using three of them in the above RewriteRules to capture the leading parts of the URL, which left a maximum of six for the filename component. So when a rule reached for backreference #10, Apache simply returned “0” instead of the part of the filename it should have saved. As far as I can tell, that’s because $10 is read as $1 followed by a literal “0”, and $1 (the optional “older/” group) was empty for these URLs. Thus, all of my URLs with long filenames were being garbled, resulting in the following situation:

Properly redirected
http://underscorebleach.net/content/jotsheet/2004/05/one_two_three_four_five_six → http://underscorebleach.net/jotsheet/2004/05/one-two-three-four-five-six
Emasculated by mod_rewrite
http://underscorebleach.net/content/jotsheet/2004/05/one_two_three_four_five_six_seven → http://underscorebleach.net/jotsheet/2004/05/one-two-three-four-five-six-0
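To make the garbling concrete, here is a small Python sketch that imitates how the substitution string appears to be read: only a single digit after the dollar sign counts as a backreference, so “$10” becomes “$1” plus a literal “0”. This is a simulation under that assumption, not Apache’s actual code, and the names RULE, SUBST, and expand are made up for the illustration.

```python
import re

# The seven-part version of the rules above: three prefix groups plus
# seven filename groups, so the last filename part would need "$10".
RULE = re.compile(
    r"^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/"
    + "_".join(["([^_]+)"] * 7)
    + "$"
)
SUBST = "/jotsheet/$1$2/$3/$4-$5-$6-$7-$8-$9-$10"


def expand(subst: str, match: re.Match) -> str:
    """Expand $N where N is exactly one digit (assumed mod_rewrite behavior)."""

    def group(m: re.Match) -> str:
        value = match.group(int(m.group(1)))
        # An optional group that did not take part in the match expands to "".
        return value or ""

    return re.sub(r"\$([0-9])", group, subst)


path = "content/jotsheet/2004/05/one_two_three_four_five_six_seven"
print(expand(SUBST, RULE.match(path)))
```

The “seven” part is captured by the regular expression as group 10, but the substitution string has no way to ask for it.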

What’s the solution here? Don’t do character replacement in long URLs via mod_rewrite’s regular expressions! Instead, use a Perl script.

underscores-to-hyphens.pl

You want Perl? I’ll give it to you, baby, in all the glorious, hackish grandeur in which I know it well. Here’s underscores-to-hyphens.pl (note .txt extension), which very simply takes the requested URL, changes underscores to hyphens, and removes “/content” (another aspect of the URL migration I performed a few days ago). Obviously, I doubt you’ll be doing this type of exact string manipulation on your URLs, but it’s easy to modify that script. Oh, and to call it in .htaccess, use this RewriteRule:

RewriteRule ^old/path/to/files /path/to/underscores-to-hyphens.pl [T=application/x-httpd-cgi,L]
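The Perl script itself isn’t reproduced in this post, so here is a rough sketch of the same idea, written in Python rather than Perl. It assumes the server runs it as a CGI script and exposes the originally requested path in the REQUEST_URI environment variable (Apache normally does). The function name new_location and the fallback host are illustrative, not taken from the real script.

```python
#!/usr/bin/env python3
import os
import sys


def new_location(request_uri: str, host: str = "underscorebleach.net") -> str:
    """Build the redirect target: drop the leading /content, swap _ for -."""
    # Leave any query string untouched; only the path is rewritten.
    path, _, query = request_uri.partition("?")
    if path.startswith("/content/"):
        path = path[len("/content"):]
    path = path.replace("_", "-")
    url = f"http://{host}{path}"
    return f"{url}?{query}" if query else url


def main() -> None:
    # REQUEST_URI is assumed to hold the URL the visitor originally asked for.
    uri = os.environ.get("REQUEST_URI", "/")
    host = os.environ.get("HTTP_HOST", "underscorebleach.net")
    # A CGI redirect: a Status line, a Location header, then a blank line.
    sys.stdout.write("Status: 301 Moved Permanently\r\n")
    sys.stdout.write(f"Location: {new_location(uri, host)}\r\n\r\n")


if __name__ == "__main__":
    main()
```

Because the replacement is a plain string operation, it doesn’t matter how many underscores the filename has, which is the whole point of leaving the regular expressions behind.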

Now then, maybe next time I decide to move and rename 1500 files, I’ll first think about the ramifications.

6 Responses to “9 backreferences, 1 Perl script, and 301 redirection”

  1. sathia Says:

    good hint, i’m gonna try this

  2. Perlqu5 Says:

    If I wanted to perform a redirect with htaccess to a perl script, how would I perform this, would an example like this suffice:

    RewriteEngine On
    RewriteCond %{HTTP_REFERER} !^http://.*example.com:80/*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http://.*example.com/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http://.*example.com*$ [NC]
    RewriteRule .*\.(jpg|JPG)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]

  3. tom sherman Says:

    There are some edits you need to make to that, and you can simplify it. Also, the way you wrote it, the RewriteConds applied only to the homepage of example.com, which I doubt you meant.

    RewriteEngine On
    RewriteCond %{HTTP_REFERER} !^http://([a-z0-9\-]+\.)?example\.com [NC]
    RewriteRule \.(jpg|JPG)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]

    That will apply only to JPEGs. You might want to make it:

    RewriteRule \.(png|PNG|jpe?g|JPE?G|gif|GIF)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]

  4. Mike Says:

    wow this is so cool.. finally I’ll be able to fix my scripts.

    Thanks a million

  5. vigram Says:

    good article

  6. SNVC Says:

    This is definitely a good guide. Thanks for this.