9 backreferences, 1 Perl script, and 301 redirection
My recent move from underscores to hyphens in filenames has caused me a couple of unexpected headaches, but it’s also taught me some interesting things about Apache’s mod_rewrite. I’ve been buried in .htaccess for hours lately, playing detective on some strange behavior on my site. My most recent discovery involves the limitations of Apache’s implementation of regular expressions in mod_rewrite. I think the developers were trying to prevent wannabes like me from screwing up their own sites and consuming server resources, but it’s caused me some frustration.
In moving from underscores to hyphens, I needed to set up redirection from the older URLs to the new. It came down to this:
- Old URL
- http://underscorebleach.net/content/jotsheet/2004/05/this_is_a_page
- New URL
- http://underscorebleach.net/jotsheet/2004/05/this-is-a-page
Unfortunately, accomplishing this redirection in .htaccess is pretty hackish when you’ve got long filenames. You end up with a bunch of similar RewriteRules, differing only in the number of underscore-delimited parts in the filename. Observe:
RewriteRule ^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/([^_]+)_([^_]+)_([^_]+)$ /jotsheet/$1$2/$3/$4-$5-$6 [L,R=301]
RewriteRule ^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/([^_]+)_([^_]+)_([^_]+)_([^_]+)$ /jotsheet/$1$2/$3/$4-$5-$6-$7 [L,R=301]
I know. It’s ugly. And I had rules that were a lot longer, since some of my pages have long filenames. But here’s the problem: mod_rewrite has a limit of nine backreferences. I was using three backreferences in the above RewriteRules to grab onto the leading parts of the URL, leaving me with a maximum of six backreferences for the filename component of the URL. What that means is that on backreference #10, Apache simply returned “0” instead of the part of the filename it should have saved. Thus, all of my URLs with long filenames were being garbled, resulting in the following situation:
- Properly redirected
- http://underscorebleach.net/content/jotsheet/2004/05/one_two_three_four_five_six → http://underscorebleach.net/jotsheet/2004/05/one-two-three-four-five-six
- Emasculated by mod_rewrite
- http://underscorebleach.net/content/jotsheet/2004/05/one_two_three_four_five_six_seven → http://underscorebleach.net/jotsheet/2004/05/one-two-three-four-five-six-0
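Where does that trailing “0” come from? A likely explanation is that mod_rewrite only recognizes $0 through $9, so $10 is read as $1 followed by a literal 0, and $1 here is the optional older/ group, which is empty for this URL. The following is a rough Python simulation of that behavior, not Apache's actual code. The expand() helper and the seven-part rule are made up for illustration, following the pattern of the rules above.

```python
import re

def expand(substitution, match):
    """Expand $N in a substitution string, treating only ONE digit
    after '$' as a backreference (the assumed mod_rewrite behavior)."""
    def repl(m):
        n = int(m.group(1))
        if n > match.re.groups:
            return ""
        # A group that did not take part in the match expands to nothing.
        return match.group(n) or ""
    return re.sub(r"\$(\d)", repl, substitution)

# Hypothetical seven-part rule, built like the shorter rules above:
# three leading groups plus seven filename groups = ten groups in total.
pattern = re.compile(
    r"^content/jotsheet/(older/)?([0-9]{4})/([0-9]{2})/"
    + "_".join(["([^_]+)"] * 7)
    + "$"
)
substitution = "/jotsheet/$1$2/$3/$4-$5-$6-$7-$8-$9-$10"

path = "content/jotsheet/2004/05/one_two_three_four_five_six_seven"
result = expand(substitution, pattern.match(path))
print(result)  # /jotsheet/2004/05/one-two-three-four-five-six-0
```

The tenth group does capture “seven”, but nothing in the substitution string can refer to it, so the last part of the filename is lost.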
What’s the solution here? Don’t do character replacement in long URLs via mod_rewrite’s regular expressions! Instead, use a Perl script.
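The reason a script copes so easily is that the whole job is a couple of plain string operations, with no capture groups to run out of. Here is a rough sketch of that transformation in Python. It is not the Perl script from this post, the function name is invented for illustration, and it assumes the only changes needed are dropping a leading /content and swapping underscores for hyphens, as in the old and new URLs shown earlier.

```python
def underscores_to_hyphens(path):
    """Map an old-style URL path to its new-style equivalent."""
    prefix = "/content"
    # Drop the leading "/content" directory if it is present.
    if path.startswith(prefix + "/"):
        path = path[len(prefix):]
    # Replace every underscore, however many parts the filename has.
    return path.replace("_", "-")

old = "/content/jotsheet/2004/05/one_two_three_four_five_six_seven"
print(underscores_to_hyphens(old))
# /jotsheet/2004/05/one-two-three-four-five-six-seven
```

Because the replacement works on the whole string at once, a filename with seven parts or seventy is handled the same way.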
underscores-to-hyphens.pl
You want Perl? I’ll give it to you, baby, in all the glorious, hackish grandeur in which I know it well. Here’s underscores-to-hyphens.pl (note .txt extension), which very simply takes the requested URL, changes underscores to hyphens, and removes “/content” (another aspect of the URL migration I performed a few days ago). Obviously, I doubt you’ll be doing this exact type of string manipulation on your URLs, but it’s easy to modify that script. Oh, and to call it in .htaccess, use this RewriteRule:
Now then, maybe next time I decide to move and rename 1500 files, I’ll first think about the ramifications.
April 21st, 2006 at 12:45 pm
good hint, i’m gonna try this
May 27th, 2006 at 9:42 am
If I wanted to perform a redirect with htaccess to a perl script, how would I do it? Would an example like this suffice:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://.*example.com:80/*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://.*example.com/.*$ [NC]
RewriteCond %{HTTP_REFERER} !^http://.*example.com*$ [NC]
RewriteRule .*\.(jpg|JPG)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]
May 27th, 2006 at 10:45 am
There are some edits you need to make to that, and you can simplify it. Also, the way you wrote it, the RewriteCond’s applied only to the homepage of example.com, which I doubt you meant.
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://([a-z0-9\-]+\.)?example\.com [NC]
RewriteRule \.(jpg|JPG)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]
That will apply only to JPEGs. To cover other image types as well, you might want to make it:
RewriteRule \.(png|PNG|jpe?g|JPE?G|gif|GIF)$ /path/to/perlscript.pl [T=application/x-httpd-cgi,L]
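To get a feel for which requests those two lines would hand to the script, the patterns can be tried out in Python. This is only an approximation of how mod_rewrite evaluates them: re.IGNORECASE stands in for [NC], the goes_to_script() helper is made up for illustration, and the sample paths and referers are invented.

```python
import re

# Patterns copied from the RewriteCond and RewriteRule above.
REFERER_OK = re.compile(r"^http://([a-z0-9\-]+\.)?example\.com", re.IGNORECASE)
IMAGE_EXT = re.compile(r"\.(png|PNG|jpe?g|JPE?G|gif|GIF)$")

def goes_to_script(path, referer):
    """True when the request is for an image AND the referer does not
    look like example.com (the RewriteCond is negated with '!')."""
    return bool(IMAGE_EXT.search(path)) and not REFERER_OK.search(referer)

print(goes_to_script("photos/cat.jpg", "http://other.example.org/page"))   # True
print(goes_to_script("photos/cat.jpg", "http://www.example.com/gallery"))  # False
print(goes_to_script("photos/cat.JPEG", ""))                               # True
print(goes_to_script("index.html", "http://other.example.org/"))           # False
```

Two things stand out. A request with no referer at all is also sent to the script. And the referer pattern is not anchored at the end, so a referer such as http://example.com.evil.net/ would count as example.com.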
August 9th, 2006 at 3:28 am
wow this is so cool.. finally I’ll be able to fix my scripts.
Thanks a million
June 19th, 2008 at 4:20 am
good article
August 17th, 2008 at 11:21 am
This is definitely a good guide. Thanks for this.