Wednesday, August 27, 2008

Using URL Rewriting For More Friendly Links

As part of an application we're developing at work, I recently did some research on whether it was possible to make the links to items, categories, etc., more friendly to search engines. The links we currently have are something like this: /website/productdetails.aspx?productid=123456. Basically, we pass in a product ID to the productdetails.aspx page. However, it would be nicer to have something more like /website/item/123456/rewritten-item-description.aspx.

I came up with some code that works without being too intrusive to the rest of the code base and without requiring any changes in IIS. First, we'll add a property to each item that returns a link in the correct format, so that we don't have to duplicate that logic in the various places we show the link. To convert the item description to something that looks like a filename, I came up with the following method:

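// Requires "using System.Text.RegularExpressions;" at the top of the file.
private string CreateLink(string prefix, string id, string description)
{
    const string format = "{0}/{1}/{2}.aspx";

    // Spaces and slashes become dashes; ampersands become the word "and".
    string fixedDescription = description.Trim().Replace(" ", "-").Replace("&", "and").Replace("/", "-");

    // Strip everything that is not a word character or a dash.
    Regex nonWord = new Regex(@"[^\w\-]");
    fixedDescription = nonWord.Replace(fixedDescription, "");

    // \w also matches accented letters, so they survive the regex and are converted here.
    fixedDescription = DeAccentCharacters(fixedDescription);

    return String.Format(format, prefix, id, fixedDescription);
}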

This replaces spaces and slashes with dashes and ampersands with the word "and", strips out any remaining non-word characters, and converts accented characters to their non-accented counterparts (that part is done in DeAccentCharacters(), which I won't post, since it just uses a pre-made lookup table, and isn't that interesting). For example, the description "Salt & Pepper Shakers" becomes "Salt-and-Pepper-Shakers". It then puts the prefix and id in front of the name, basically coming up with what I showed above.
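Since DeAccentCharacters() isn't shown, here is a minimal sketch of one way to get a similar result. It is not the lookup-table version described above; it swaps in Unicode normalization, which splits an accented letter into a base letter plus a combining mark and then drops the mark. The class name AccentStripper is made up for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

static class AccentStripper
{
    // Hypothetical stand-in for DeAccentCharacters(); uses Unicode
    // normalization instead of the lookup table mentioned in the post.
    public static string DeAccentCharacters(string text)
    {
        // FormD decomposes "é" into "e" followed by a combining acute accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        StringBuilder result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep every character except the combining marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                result.Append(c);
            }
        }

        // Recompose whatever is left.
        return result.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, AccentStripper.DeAccentCharacters("Crème-brûlée") gives "Creme-brulee". Letters that have no decomposition, such as "ø" or "ß", pass through unchanged, so a lookup table is still the more thorough option if those matter.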

Now that we have links in the correct format, we need to interpret them. Note that the URL appears to contain a couple of directories ("item" and "123456") that don't actually exist. We need to intercept the request for this page before ASP.NET has a chance to complain that it doesn't exist. To do that, we add some code to the Global class (Global.cs in the App_Code directory). First, in the Application_Start() method, add this, anywhere in the method:

SetupRewriteRules();

This will set up a list of URL rewriting rules, using regular expressions. Then, in Application_BeginRequest(), add this as the first line, before anything else you may be doing:

RewriteURL();

This calls a new method that interprets the rules created in SetupRewriteRules() and rewrites the URL in the request accordingly.

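Next, add this class inside the Global class (at the end is preferable). It is declared private, so it has to be nested in another class rather than placed at the top level of the file: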

private class RewriteRule
{
    private Regex rule;
    private String rewrite;

    public RewriteRule(String ruleRegex, String rewriteText)
    {
        // Compiled once at startup, since the rule runs on every request.
        rule = new Regex(ruleRegex, RegexOptions.Compiled | RegexOptions.IgnoreCase);
        rewrite = rewriteText;
    }

    // Returns the rewritten path, or an empty string if the rule does not match.
    public String Process(String path)
    {
        Match match = rule.Match(path);

        if (match.Success)
        {
            return rule.Replace(path, rewrite);
        }

        return string.Empty;
    }
}

Next, add a static list to hold these and the SetupRewriteRules() method:

private static List<RewriteRule> rules = new List<RewriteRule>();

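private void SetupRewriteRules()
{
    // The dot before "aspx" is escaped so that it only matches a literal dot.
    // The replacement has no leading slash: Replace() swaps only the matched
    // "Item/..." part, and the slash in front of it stays in the path.
    rules.Add(new RewriteRule(@"Item/([^/]*)/(.*)\.aspx", "ProductDetails.aspx?productID=$1"));
}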

We have more rules, but I'll just show this one. Note that the first parameter is a regular expression that matches "Item/item ID/filename.aspx". The second parameter is what to replace that with. In this case, it takes the first capture group (the item ID, enclosed in the first set of parentheses), and puts it where the "$1" is in the URL. The second group, the description, is never used in the replacement; it is only in the URL for the benefit of readers and search engines.
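To make the capture groups concrete, here is a small standalone sketch that can be run outside ASP.NET. It uses a version of the Item rule with the dot before "aspx" escaped and no leading slash in the replacement; both of those details, and the class name CaptureGroupDemo, are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaptureGroupDemo
{
    // Variant of the Item rule; the escaped dot and the replacement without
    // a leading slash are assumptions made for this sketch.
    private static readonly Regex ItemRule =
        new Regex(@"Item/([^/]*)/(.*)\.aspx", RegexOptions.IgnoreCase);

    // Shows what each set of parentheses captured.
    public static string Describe(string path)
    {
        Match match = ItemRule.Match(path);
        if (!match.Success)
        {
            return "no match";
        }
        return "id=" + match.Groups[1].Value + ", description=" + match.Groups[2].Value;
    }

    // Replaces only the matched part of the path; "$1" expands to group 1.
    public static string Rewrite(string path)
    {
        return ItemRule.Replace(path, "ProductDetails.aspx?productID=$1");
    }
}
```

For the path "/website/item/123456/rewritten-item-description.aspx", Describe() reports the ID 123456 and the description rewritten-item-description, and Rewrite() returns "/website/ProductDetails.aspx?productID=123456". The "/website/" part in front is untouched because it is not part of the match.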

Finally, add the RewriteURL() method:

private void RewriteURL()
{
    foreach (RewriteRule rule in rules)
    {
        String subst = rule.Process(HttpContext.Current.Request.Path);
        if (subst.Length > 0)
        {
            HttpContext.Current.RewritePath(subst);
            break;
        }
    }
}

This code iterates through each rule, and if it finds one that matches, it calls RewritePath() using the rewritten URL. This is what actually translates the "friendly" URL into something that works with our application. The great part is that it is totally transparent to the user; they never see the rewritten URL. Postbacks still work fine as well.
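Because the loop stops at the first rule that matches, the order of the rules matters once there is more than one. The sketch below shows that first-match behavior without needing HttpContext. The Category rule, CategoryList.aspx, and the RuleOrderDemo class are all invented for illustration; they are not our real rules.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RuleOrderDemo
{
    // Hypothetical rules: the Category rule and CategoryList.aspx are made up.
    private static readonly List<KeyValuePair<Regex, string>> rules =
        new List<KeyValuePair<Regex, string>>
        {
            new KeyValuePair<Regex, string>(
                new Regex(@"Item/([^/]*)/(.*)\.aspx", RegexOptions.IgnoreCase),
                "ProductDetails.aspx?productID=$1"),
            new KeyValuePair<Regex, string>(
                new Regex(@"Category/([^/]*)/(.*)\.aspx", RegexOptions.IgnoreCase),
                "CategoryList.aspx?categoryID=$1")
        };

    // Returns the rewritten path from the first matching rule,
    // or null when no rule matches and the request should pass through.
    public static string Resolve(string path)
    {
        foreach (KeyValuePair<Regex, string> rule in rules)
        {
            if (rule.Key.IsMatch(path))
            {
                return rule.Key.Replace(path, rule.Value);
            }
        }

        return null;
    }
}
```

A request for "/website/Category/42/garden-tools.aspx" skips the Item rule and is rewritten by the Category rule, while "/website/default.aspx" matches nothing and is left alone. If two rules could match the same path, the more specific one should be added first.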

I realize there are other ways to do this, like ASP.NET MVC, but our application is pretty much already written, and I'm not wild about going back and redoing it in a totally new technology. This can be retrofitted onto the app without too much pain, and could even be turned off or on with a config file setting if needed.
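As a rough idea of what the config file switch could look like, here is a sketch of the on/off check. The setting name "EnableUrlRewriting" and the RewriteToggle class are hypothetical, and treating a missing or unreadable value as "on" is just one possible default.

```csharp
using System;

static class RewriteToggle
{
    // Hypothetical appSettings key; nothing in the application defines it yet.
    public const string SettingName = "EnableUrlRewriting";

    // Interprets the raw setting value. Rewriting stays on unless the value
    // parses cleanly as "false" (bool.TryParse ignores case and whitespace).
    public static bool IsEnabled(string rawValue)
    {
        bool enabled;
        if (bool.TryParse(rawValue, out enabled))
        {
            return enabled;
        }

        // Missing or malformed setting: assume rewriting is wanted.
        return true;
    }
}
```

In Application_BeginRequest(), the call to RewriteURL() could then be wrapped in a check such as RewriteToggle.IsEnabled(ConfigurationManager.AppSettings[RewriteToggle.SettingName]), which needs a using directive for System.Configuration. The links generated by CreateLink() would need the same switch, so that the application does not hand out friendly URLs it no longer interprets.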
