Short guide: Clean URLs

Hi!
I’ve been doing some research on mod_rewrite lately and learned quite a bit about it, so I’ve decided to share some of that with you. Comments and critiques welcome, haha.
Anyway, most people with PHP websites use GET parameters to decide which content to include. The problem with that is:

a) It’s not really search engine friendly: crawlers have traditionally handled URLs with query strings poorly, and some skipped them altogether
b) It’s ugly.
c) Well, nothing, but I just felt like I had to have c as well.

So, that’s why it’s better to have clean URLs. They look like static URLs, but via mod_rewrite the server still turns them into the data you need for generating dynamic content. So with mod_rewrite you can switch from URLs like:
yoursite.com/index.php?category=cooking&page=pasta&section=4
To something like:
yoursite.com/cooking/pasta/4

But the best thing is, you’ll still receive the GET information like you did before. So how do we go about doing that? First, make sure that mod_rewrite is enabled on your server and that you’re allowed to use per-directory .htaccess files. Ask your host if you’re not sure.
Next, create a file called .htaccess in the directory where your site is. We’ll assume that’s the root (/) directory. That file should contain the following text:


# this is the initialization
Options 	+FollowSymLinks
RewriteEngine 	On
RewriteBase 	/
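# these are the rewrite conditions: if the request is an existing file
# or directory, stop here and serve it untouched. (A RewriteCond only
# applies to the one RewriteRule right after it, so it gets its own rule.)
RewriteCond 	%{REQUEST_FILENAME} 	-f [OR]
RewriteCond 	%{REQUEST_FILENAME} 	-d
RewriteRule 	^ 	- [L]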
# and finally, the rewrite rules
RewriteRule 	^([a-zA-Z0-9\-]+)/?$	/index.php?category=$1 [L,QSA]
RewriteRule 	^([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/?$	/index.php?category=$1&page=$2 [L,QSA]
RewriteRule 	^([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/?$	/index.php?category=$1&page=$2&section=$3 [L,QSA]

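Let’s stop here and look at what the above does. First are a few commands that set things up for us. They tell the server to follow symlinks (mod_rewrite needs that in .htaccess files), turn on the rewrite engine and set the rewrite base to /
If your site (and its .htaccess) lives in a subdirectory, say /mysite/, set the base to /mysite/ instead. Note that the base is only applied to rewrite targets that are relative paths, so in that case you’d write the targets without the leading slash (index.php?... rather than /index.php?...).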
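
For illustration, here’s a rough sketch of the lines that would change for a subdirectory setup. It assumes the site and its .htaccess both live in /mysite/ (a made-up name), and everything not shown stays as in the file above:

```apache
# Hypothetical layout: the site and this .htaccess live in /mysite/
RewriteBase 	/mysite/
# Relative target (no leading slash): the RewriteBase is put in front of it,
# so the request ends up at /mysite/index.php?category=...
RewriteRule 	^([a-zA-Z0-9\-]+)/?$	index.php?category=$1 [L,QSA]
```

The other two rules would get the same treatment: drop the leading slash from the target.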

Next come the rewrite conditions. That’s where we tell the server that if the URL someone has requested is an actual file or directory, it shouldn’t be run through the rules below but served to the client as-is. That’s good because otherwise we’d have problems with links to real files, images etc.

OK, so now come the rewrite rules. A rewrite rule consists of a regular expression and a target: the requested path is matched against the expression and, if it matches, the server internally serves the target instead (the address in the browser doesn’t change). Now I can’t go about explaining regular expressions as a whole, that’s a pretty hefty subject, but we’ll take a glance at what’s done here. In the first rewrite rule, the ^ means the start of the path. The square brackets match any one character from a-z, A-Z, 0-9 or the dash ‘-’. The + means one or more of those characters. Then there’s a slash ‘/’ and a question mark. A question mark after the slash means ‘match either one slash or none at all’. The $ means the end of the path.
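
If it helps, here’s the first rule again with each piece of the pattern spelled out in comments. It’s the same rule as in the file above, nothing new, just annotated:

```apache
# ^                   start of the requested path (in .htaccess the leading
#                     slash is already stripped, so "cooking", not "/cooking")
# ([a-zA-Z0-9\-]+)    one or more letters, digits or dashes
# /?                  an optional trailing slash
# $                   end of the path
RewriteRule 	^([a-zA-Z0-9\-]+)/?$	/index.php?category=$1 [L,QSA]
```

So "cooking" and "cooking/" match, while "cooking/pasta" or "style.css" don’t (the second slash-separated part and the dot aren’t allowed by this pattern).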

Whatever we put in parentheses gets captured, and we can use it like a variable. After the regular expression, we refer to the first captured part as $1, the second as $2, etc.
As you can now probably see, the first rewrite rule turns, say, yoursite.com/cooking (or yoursite.com/cooking/) into yoursite.com/index.php?category=cooking. Pretty sweet, huh?

The second and third rules do the same thing, except they take effect when someone requests two or three ‘directories’. For yoursite.com/cooking/pasta, the server serves yoursite.com/index.php?category=cooking&page=pasta

In the example I’ve included the ability to read three ‘fake directories’ (so you can do yoursite.com/one/two/three). If you need more, just add another rewrite rule following the same pattern.
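
As a sketch, a fourth level could look like the rule below. The parameter name item is made up for the example, so use whatever your index.php actually expects:

```apache
# Hypothetical fourth level, e.g. yoursite.com/one/two/three/four
# "item" is an invented parameter name, not something from the rules above
RewriteRule 	^([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/?$	/index.php?category=$1&page=$2&section=$3&item=$4 [L,QSA]
```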

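The L in [L,QSA] tells the server to stop processing further rules once this one has matched. The QSA (‘query string append’) means that any query string on the requested URL is appended to the rewritten one. So yoursite.com/cooking/pasta/4/?show=something will work, and index.php receives show along with category, page and section.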
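
To make the difference concrete, here’s the third rule once more with comments describing what happens with and without the flag:

```apache
# Request:      /cooking/pasta/4/?show=something
# With QSA:     /index.php?category=cooking&page=pasta&section=4&show=something
# Without QSA:  /index.php?category=cooking&page=pasta&section=4
#               (the original query string is replaced, so "show" is lost)
RewriteRule 	^([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/([a-zA-Z0-9\-]+)/?$	/index.php?category=$1&page=$2&section=$3 [L,QSA]
```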

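One final note: relative links and images can break after this, because the browser resolves them against the ‘fake directory’ in the address bar. On yoursite.com/cooking/pasta/, for example, a relative path like images/logo.png gets requested as yoursite.com/cooking/pasta/images/logo.png, which doesn’t exist. To fix that, either use paths that start with / or put a base tag in the head of your (X)HTML document, like so: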

<base href="http://www.yoursite.com/" />

The base tag tells the browser which URL to resolve relative paths against.

If you have any questions about this, ask them here and I’ll try to answer them, although I’m no pro at this, so don’t get your hopes up too high. :)

Good luck!