April 25, 2007
.htaccess and mod_rewrite tutorial
Note: To make this easier for you, here are direct links for Part 1 and Part 3 of this brief .htaccess tutorial.
In this section we'll review some of the tools and syntax we can use to create instructions the server will process for us. First, some important syntax items. You'll use these over and over.
The patterns used in mod_rewrite are written in what is known as Regular Expression syntax, or Regex in geek-speak. A Regular Expression is nothing more than a string of text characters, some of which have a special meaning to the server. Those with a special meaning are basically pre-defined terms.
When you're talking about Regex and .htaccess, most of these special pre-defined characters/terms will mean the same thing everywhere. However, there are a few of them that can have a different (often totally different) meaning depending upon how and where they're used.
Here are the ones you'll use the most.
RewriteCond - This tells the server to test the condition you set forth and, if the test proves true, to apply the RewriteRule that follows it. If you use a RewriteCond, there must always be a RewriteRule tied to it.
Example: RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule - A RewriteRule is simply the instruction you want the server to interpret and carry out. A RewriteRule does not require a preceding RewriteCond, but a RewriteCond does require a RewriteRule.
Example: RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
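To see how the condition gates the rule, here is a small Python model of the two example lines working together. It is only a sketch under a few assumptions: Python's `re` module stands in for the PCRE engine Apache actually uses (they agree on patterns this simple), the function name `apply_redirect` is made up for illustration, and the path is passed without a leading slash, which is how a RewriteRule in a document-root .htaccess file sees it.

```python
import re

# Hypothetical model of:
#   RewriteCond %{HTTP_HOST} ^domain\.com [NC]
#   RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
# Python's re stands in for Apache's PCRE engine here.
COND = re.compile(r"^domain\.com", re.IGNORECASE)  # [NC] means case-insensitive
RULE = re.compile(r"^(.*)$")


def apply_redirect(host, path):
    """Return the 301 target URL, or None if the request passes through untouched.

    `path` is the URL path without its leading slash, as a RewriteRule
    in a document-root .htaccess file sees it.
    """
    # The RewriteCond is tested first; if it fails, the rule is skipped.
    if COND.search(host) is None:
        return None
    match = RULE.search(path)
    if match is None:
        return None
    # $1 in the substitution is whatever the first (...) group captured.
    return "http://www.domain.com/" + match.group(1)


print(apply_redirect("domain.com", "about/contact.html"))      # http://www.domain.com/about/contact.html
print(apply_redirect("www.domain.com", "about/contact.html"))  # None: condition is false
```

The (.*) group grabs the entire path, and $1 pastes it back onto the www hostname, which is why visitors keep the page they asked for instead of landing on the home page.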
While we're at a nice stopping place, let me tell you right now that there are often many, many ways to construct rewrite rules that will do the same or similar things. Which way to choose almost always depends on the individual situation you're confronted with. As an example, let's say you're trying to control the old www/non-www canonical issue so that the search engines can only find one version of each of your pages.
One way to accomplish this task is:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
Another way to accomplish the task would be:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
What's the difference?
The RewriteCond in the first example says, in plain English: if the HTTP_HOST variable begins with "domain.com", regardless of case, the RewriteCond tests true.
The RewriteCond in the second example, on the other hand, says: if the HTTP_HOST variable does not begin with "www.domain.com", regardless of case, the RewriteCond tests true. (The leading ! is what negates the test.)
So on the average site they would do exactly the same thing.
But there are situations where one might be preferable to the other.
For instance, if you had additional domains parked or aliased on top of your main domain, the second example would be the better choice, because even someone who arrived by going to www.someotherdomain.com would make it test true and trigger the redirect.
But by the same token, if your site had subdomains (e.g. shop.domain.com) hosted in the same space as your main domain, you'd want to use the first example, because the second one would fire when you didn't want it to, sending traffic meant for your shop back to the www version of your domain.
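You can check this reasoning by running both conditions against a handful of hostnames. This is a Python sketch, not Apache itself: `re` stands in for PCRE, `re.IGNORECASE` plays the part of [NC], and the function names `first_fires` and `second_fires` are made up for illustration.

```python
import re

# The two RewriteCond patterns from the examples above; [NC] maps to IGNORECASE.
FIRST = re.compile(r"^domain\.com", re.IGNORECASE)
SECOND = re.compile(r"^www\.domain\.com", re.IGNORECASE)


def first_fires(host):
    # RewriteCond %{HTTP_HOST} ^domain\.com [NC]
    return FIRST.search(host) is not None


def second_fires(host):
    # RewriteCond %{HTTP_HOST} !^www\.domain\.com [NC]  (the ! inverts the test)
    return SECOND.search(host) is None


for host in ["domain.com", "www.domain.com", "www.someotherdomain.com", "shop.domain.com"]:
    print(f"{host:25} first={first_fires(host)} second={second_fires(host)}")
```

Both tests agree on domain.com (redirect) and www.domain.com (leave alone). They only disagree on the parked domain and the subdomain, where the second condition fires and the first does not, which is exactly the trade-off described above.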
Okay, enough of that. Let's cover some more common syntax.
[ ] (Square Brackets) - These are used around an expression or part of an expression. Basically it's a way to define a set of characters, any one of which is allowed to appear at that spot in the request.
Example: RewriteRule ^store/([a-zA-Z]+) /store/index.php?prodid=$1
(The + after the closing bracket, which we'll cover later, allows one or more such characters in a row.)
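A quick way to see what a bracketed set accepts is to test it outside Apache. This Python sketch (with `re` standing in for PCRE, a made-up helper name `prodid`, and example paths written without a leading slash) runs `^store/([a-zA-Z]+)` next to the easy-to-make typo `[a-zA-z]`. In ASCII the range A-z also spans the six punctuation characters that sit between Z and a, including the underscore, so the typo quietly accepts characters you never meant to allow.

```python
import re

GOOD = re.compile(r"^store/([a-zA-Z]+)")
TYPO = re.compile(r"^store/([a-zA-z]+)")  # A-z also covers [ \ ] ^ _ and the backtick


def prodid(pattern, path):
    """Return what $1 would capture, or None if the pattern does not match."""
    m = pattern.search(path)
    return m.group(1) if m else None


print(prodid(GOOD, "store/Shoes"))    # Shoes
print(prodid(GOOD, "store/red_hat"))  # red  (the _ is not a letter, so capture stops)
print(prodid(TYPO, "store/red_hat"))  # red_hat  (the typo lets _ through)
```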
letter-letter - This matches any single lowercase letter from the first letter given through the last one, inclusive. So [a-z] covers the whole alphabet, while [c-g] covers only c, d, e, f and g.
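Example 1: RewriteRule ^store/([a-z])$ /store/index.php?prodid=$1
The above would match any request for a URL that reads like www.domain.com/store/a through www.domain.com/store/z. Note that it would not match www.domain.com/store/aa, because the pattern is built to match exactly one character after store/. The parentheses capture that character so it can be passed along as $1, and the closing $ marks the end of the path. Without the $, the pattern would only need to match the beginning of the request, and store/aa would match too. There are ways to expand this that we'll cover in a bit.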
Example 2: RewriteRule ^store/([c-g])$ /store/index.php?prodid=$1
The above would match any request for www.domain.com/store/c through www.domain.com/store/g; however, it would not match www.domain.com/store/b, since "b" is not included in the range.
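Here is a Python sketch of the two lowercase-range examples, with `re` standing in for PCRE and paths written without the leading slash. Each pattern is anchored with `^` and `$` so exactly one character is allowed after store/, and a third, unanchored pattern shows why the closing `$` matters: without it, a regex only has to match the beginning of the path. The helper name `matches` is made up for illustration.

```python
import re

# Example 1 and Example 2, anchored at both ends so exactly one character is allowed.
EXAMPLE_1 = re.compile(r"^store/([a-z])$")
EXAMPLE_2 = re.compile(r"^store/([c-g])$")
UNANCHORED = re.compile(r"^store/[a-z]")  # no trailing $: only the start must match


def matches(pattern, path):
    return pattern.search(path) is not None


for path in ["store/a", "store/z", "store/aa", "store/b", "store/e"]:
    print(path, matches(EXAMPLE_1, path), matches(EXAMPLE_2, path), matches(UNANCHORED, path))
```

Notice that store/aa fails both anchored patterns but sails through the unanchored one, because "store/a" at the front is all it needs.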
LETTER-LETTER - This is the same as above, only it will match against capital letters instead of lowercase letters.
Example: RewriteRule ^store/([A-Z])$ /store/index.php?prodid=$1
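Ranges are case-sensitive, so [A-Z] and [a-z] never overlap. This Python sketch checks that, again with `re` standing in for PCRE and the pattern anchored with `$` so only one character is allowed. `re.IGNORECASE` is used as a rough stand-in for the [NC] flag you saw on the RewriteCond earlier; RewriteRule accepts [NC] as well.

```python
import re

UPPER = re.compile(r"^store/([A-Z])$")
UPPER_NC = re.compile(r"^store/([A-Z])$", re.IGNORECASE)  # roughly what [NC] does

for path in ["store/Q", "store/q"]:
    print(path, UPPER.search(path) is not None, UPPER_NC.search(path) is not None)
```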
number-number - This works on the same principle as letter-letter and LETTER-LETTER. It will match any single digit in the range you specify.
Example 1: RewriteRule ^store/([0-9])$ /store/index.php?prodid=$1
The above will match www.domain.com/store/0 through www.domain.com/store/9. It will not match www.domain.com/store/13, because 13 is two characters, not one.
Example 2: RewriteRule ^store/([3-6])$ /store/index.php?prodid=$1
The above will match www.domain.com/store/3 through www.domain.com/store/6; however, it will not match www.domain.com/store/8.
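The key idea here is that [0-9] matches digit characters, not numbers. This Python sketch (with `re` standing in for PCRE, patterns anchored with `$`, and a made-up helper `prodid` returning what $1 would capture) shows that 13 fails because the 1 and the 3 are two separate characters.

```python
import re

ANY_DIGIT = re.compile(r"^store/([0-9])$")    # Example 1
SOME_DIGITS = re.compile(r"^store/([3-6])$")  # Example 2


def prodid(pattern, path):
    """Return what $1 would capture, or None if the pattern does not match."""
    m = pattern.search(path)
    return m.group(1) if m else None


print(prodid(ANY_DIGIT, "store/0"), prodid(ANY_DIGIT, "store/9"), prodid(ANY_DIGIT, "store/13"))
print(prodid(SOME_DIGITS, "store/3"), prodid(SOME_DIGITS, "store/6"), prodid(SOME_DIGITS, "store/8"))
```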
Character List - This will match any single character from those you provide.
Example: RewriteRule ^store/([donkey678])$ /store/index.php?prodid=$1
An important note for you to wrap your head around here... The above would match if any single one of the characters d, o, n, k, e, y, 6, 7, 8 appears in the correct place in the request. So it would match www.domain.com/store/d, it would match www.domain.com/store/k, and it would match www.domain.com/store/8. It would not, however, match a request for www.domain.com/store/donkey678.
The important thing to remember is that when your Character List is enclosed in square brackets, the server does not see words the way we mere humans do. It simply sees a list of characters, each of which is an entity unto itself.
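One way to convince yourself that the server sees a list rather than a word: shuffle the characters and nothing changes. This Python sketch (with `re` standing in for PCRE, patterns anchored with `$`, and a made-up helper `matches`) compares [donkey678] with the same characters in reverse order.

```python
import re

LIST = re.compile(r"^store/([donkey678])$")
SHUFFLED = re.compile(r"^store/([876yeknod])$")  # same characters, different order


def matches(pattern, path):
    return pattern.search(path) is not None


for path in ["store/d", "store/k", "store/8", "store/z", "store/donkey678"]:
    print(path, matches(LIST, path), matches(SHUFFLED, path))
```

Both columns come out identical: each single listed character matches, z does not, and the full word donkey678 does not either.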
Okay, those are the basics on expressions and how to include them inside square brackets. Next we'll start covering the pre-defined characters you'll be using most often.