We need to generate a unique URL from the title of a book, where the title can contain any character. How can we search and replace all the 'invalid' characters so that a valid, neat-looking URL is generated?
For instance:
"The Great Book of PHP"
www.mysite.com/book/12345/the-great-book-of-php
"The Greatest !@#$ Book of PHP"
www.mysite.com/book/12345/the-greatest-book-of-php
"Funny title "
www.mysite.com/book/12345/funny-title
You can use a simple regular expression for this purpose:
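The answer's original snippet is not preserved here, so the following is only a minimal sketch of the idea. The helper name book_url and the ASCII-only character class are assumptions; the URL layout is taken from the examples in the question.

```php
<?php
// Hypothetical helper: build a book URL from an ID and a free-form title.
function book_url(int $id, string $title): string
{
    // Replace every run of characters other than ASCII letters and digits
    // with a single hyphen.
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $title);
    // Drop hyphens left at either end, then lowercase the result.
    $slug = strtolower(trim($slug, '-'));
    return "www.mysite.com/book/{$id}/{$slug}";
}

echo book_url(12345, 'The Greatest !@#$ Book of PHP'), "\n";
// prints www.mysite.com/book/12345/the-greatest-book-of-php
```

Because the numeric ID is part of the URL, two books whose titles reduce to the same slug still get distinct URLs.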
Use a regex replace to remove all non word characters. For example:
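The example that followed is missing, so this is a sketch of one way to do it rather than the answer's own code. The function name strip_non_word is hypothetical, and the pattern assumes that whitespace and hyphens should survive the first pass so they can become word separators.

```php
<?php
// Hypothetical helper: delete non-word characters, then hyphenate what is left.
function strip_non_word(string $title): string
{
    // Remove everything except word characters (letters, digits, underscore),
    // whitespace and hyphens.
    $clean = preg_replace('/[^\w\s-]/', '', $title);
    // Collapse each run of whitespace and/or hyphens into one hyphen.
    $clean = preg_replace('/[\s-]+/', '-', $clean);
    // Trim hyphens from both ends and lowercase.
    return strtolower(trim($clean, '-'));
}

echo strip_non_word('The Greatest !@#$ Book of PHP'), "\n";
// prints the-greatest-book-of-php
```

Unlike replacing invalid characters with a hyphen, deleting them joins the surrounding letters, so "Don't" becomes "dont" rather than "don-t".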
Sanitizing special characters is not an easy task, IMHO. Take a look at WordPress's awesome sanitize_title function, and also look at its source.
Update: Sorry guys, I should downvote every answer that doesn't deal with accented characters. Do you understand what "the title can contain any character" means?
Update 2: Go, guys, go! Please downvote me as many times as you can!
Note: and please don't be surprised when you meet a special character. Just eliminate it with str_replace!
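To illustrate the point about accented characters, here is a rough sketch that transliterates before slugging. The function name slug_with_accents and the character map are hypothetical, and the map is deliberately incomplete. The sketch assumes both the source file and the input are UTF-8. It is not how WordPress does it.

```php
<?php
// Hypothetical helper: map some accented letters to ASCII, then build the slug.
function slug_with_accents(string $title): string
{
    // Illustrative, incomplete map; extend it for the languages you expect.
    $map = [
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
        'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'ö' => 'o',
        'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
        'ç' => 'c', 'ñ' => 'n',
        'É' => 'E', 'È' => 'E', 'À' => 'A', 'Ç' => 'C', 'Ñ' => 'N',
    ];
    // strtr with an array replaces whole byte sequences, so it is safe
    // for multi-byte UTF-8 keys.
    $ascii = strtr($title, $map);
    // Anything still outside a-z / 0-9 (including unmapped letters)
    // becomes a hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($ascii));
    return trim($slug, '-');
}

echo slug_with_accents('Crème Brûlée'), "\n";
// prints creme-brulee
```

A hand-written map only covers the characters you list. A transliteration facility such as iconv with //TRANSLIT or the intl extension's Transliterator class covers far more, though iconv's output can vary between platforms.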
This code comes from CodeIgniter's url helper. It should do the trick.
If “invalid” means non-alphanumeric, you can do this:
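The snippet itself was lost, so what follows is a reconstruction sketched from the description that accompanies it, not necessarily the exact original. The wrapper function slugify is an assumption added so the example can be run; $str is the variable the answer refers to.

```php
<?php
// Hypothetical wrapper around the three steps: lowercase, replace, trim.
function slugify(string $str): string
{
    // 1. Lowercase the input.
    // 2. Replace each run of non-alphanumeric characters with one hyphen.
    // 3. Strip leading and trailing hyphens.
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($str)), '-');
}

echo slugify('The Greatest !@#$ Book of PHP'), "\n";
// prints the-greatest-book-of-php
```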
This will turn $str into lowercase, replace any sequence of one or more non-alphanumeric characters by one hyphen, and then remove leading and trailing hyphens.
If you want to allow only letters, digits and underscore (usual word characters) you can do:
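The original code is not preserved here. This sketch follows the three steps the answer describes, with a hypothetical function name word_slug and trim used for the last step as an assumption.

```php
<?php
// Hypothetical helper implementing the three \W-based steps.
function word_slug(string $str): string
{
    // Step 1: replace each non-word character (\W) with a hyphen.
    $str = preg_replace('/\W/', '-', $str);
    // Step 2: collapse consecutive hyphens into a single one.
    $str = preg_replace('/-+/', '-', $str);
    // Step 3: delete leading and trailing hyphens.
    return trim($str, '-');
}

echo word_slug('The Greatest !@#$ Book of PHP'), "\n";
// prints The-Greatest-Book-of-PHP
```

Since \W treats the underscore as a word character, underscores are kept. The steps do not change case, so wrap the result in strtolower if you want lowercase URLs.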
It first replaces any non-word character (\W) with a -. Next it replaces any consecutive - with a single -. Next it deletes any leading or trailing -.