What is a good complete regular expression or some other process that would take the title:
How do you change a title to be part of the URL like Stack Overflow?
and turn it into
how-do-you-change-a-title-to-be-part-of-the-url-like-stack-overflow
that is used in the SEO-friendly URLs on Stack Overflow?
The development environment I am using is Ruby on Rails, but if there are some other platform-specific solutions (.NET, PHP, Django), I would love to see those too.
I am sure I (or another reader) will come across the same problem on a different platform down the line.
I am using custom routes, and I mainly want to know how to alter the string so that all special characters are removed, it is all lowercase, and all whitespace is replaced.
No, no, no. You are all so very wrong. Except for the diacritics-fu stuff, you're getting there, but what about Asian characters? (Shame on Ruby developers for not considering their nihonjin brethren.)
Firefox and Safari both display non-ASCII characters in the URL, and frankly they look great. It is nice to support links like 'http://somewhere.com/news/read/お前たちはアホじゃないかい'.
So here's some PHP code that'll do it, but I just wrote it and haven't stress tested it.
Example:
Outputs: コリン-and-トーマス-and-アーノルド
The '-and-' is because &'s get changed to '-and-'.
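For readers on the question's Rails stack, the same idea can be sketched in Ruby. This is not a port of the PHP function above, only a minimal illustration under my own assumptions: the name `unicode_slug` is hypothetical, and the rule "keep every Unicode letter, mark, and digit, turn everything else into a hyphen" is my reading of the behaviour described.

```ruby
# Hypothetical helper: builds a slug that keeps non-ASCII letters intact.
def unicode_slug(title)
  slug = title.strip.downcase
  # Spell out ampersands, as described above.
  slug = slug.gsub('&', '-and-')
  # Replace every run of characters that are not letters, combining marks,
  # or digits (in any script) with a single hyphen.
  slug = slug.gsub(/[^\p{L}\p{M}\p{N}]+/, '-')
  # Trim hyphens left over at either end.
  slug.gsub(/\A-+|-+\z/, '')
end

puts unicode_slug('コリン & トーマス & アーノルド')
puts unicode_slug('How do you change a title to be part of the URL like Stack Overflow?')
```

Because the character classes are Unicode properties rather than `a-z`, Japanese, Cyrillic, or accented Latin titles pass through unchanged apart from lowercasing.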
You can also use this JavaScript function for in-form generation of the slugs (this one is based on/copied from Django):
Assuming that your model class has a title attribute, you can simply override the to_param method within the model, like this:
This Railscast episode has all the details. You can also ensure that the title only contains valid characters using this:
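As a rough, Rails-free sketch of both ideas: the `Article` struct below is a hypothetical stand-in for an ActiveRecord model (in a real app `id` and `title` would come from the database), and the rule in `valid_title?` (ASCII letters, digits, and spaces only) is an assumption of mine, not necessarily the check used in the original answer.

```ruby
# Hypothetical stand-in for a Rails model with `id` and `title` attributes.
Article = Struct.new(:id, :title) do
  # Rails calls to_param when it builds a URL for a record.
  # Prefixing the slug with the id keeps lookups by id working.
  def to_param
    slug = title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
    "#{id}-#{slug}"
  end

  # Assumed rule: only ASCII letters, digits, and spaces are allowed.
  def valid_title?
    title.match?(/\A[A-Za-z0-9 ]+\z/)
  end
end

article = Article.new(42, 'How do you change a title')
puts article.to_param
puts article.to_param.to_i  # String#to_i reads only the leading digits
```

The id prefix is the design choice that matters here: since `String#to_i` stops at the first non-digit, a parameter such as `42-some-title` still converts back to the record id, so the slug part is purely cosmetic.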
I am not familiar with Ruby on Rails, but the following is (untested) PHP code. You can probably translate this very quickly to Ruby on Rails if you find it useful.
I hope this helps.
I know this is a very old question, but since most browsers now support Unicode URLs, I found a great solution in XRegExp that converts everything except letters (in all languages) to '-'.
That can be done in several programming languages.
The pattern is
\\p{^L}+
and then you just need to use it to replace all non-letters with '-'. Working example in Node.js with the xregexp module.
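Ruby's built-in regex engine understands the same negated-property syntax, so no extra library is needed there. A minimal sketch (the method name `letters_only_slug` is hypothetical):

```ruby
# Replace every run of non-letter characters with a single hyphen.
# \p{^L} matches any character that is NOT a Unicode letter.
def letters_only_slug(title)
  title.gsub(/\p{^L}+/, '-')
end

puts letters_only_slug('Hello, wörld 2024!')
```

Note that this pattern on its own also replaces digits, does not lowercase the result, and can leave a hyphen at either end, so in practice it would probably be combined with `downcase` and a trimming step.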
I liked the way this is done without using regular expressions, so I ported it to PHP. I just added a function called is_between to check characters:
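For Ruby readers, the same regex-free idea might be sketched as follows. This is an illustration rather than a translation of the PHP port: `is_between` mirrors the helper named above, while `slug_without_regex` and the choice to keep only ASCII letters and digits are assumptions of mine.

```ruby
# True when the single character `char` lies in the inclusive range low..high.
def is_between(char, low, high)
  char >= low && char <= high
end

# Hypothetical regex-free slug builder: walks the title one character at a
# time, keeps ASCII letters and digits, and emits one hyphen per skipped run.
def slug_without_regex(title)
  slug = +''
  pending_hyphen = false
  title.each_char do |char|
    if is_between(char, 'a', 'z') || is_between(char, 'A', 'Z') || is_between(char, '0', '9')
      # Only emit the hyphen between two kept characters, never at the edges.
      slug << '-' if pending_hyphen && !slug.empty?
      slug << char.downcase
      pending_hyphen = false
    else
      pending_hyphen = true
    end
  end
  slug
end

puts slug_without_regex('How do you change a title to be part of the URL like Stack Overflow?')
```

Deferring the hyphen until the next kept character is what avoids both doubled hyphens and a trailing hyphen, without a separate clean-up pass.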