Possible Duplicate:
Regular Expression Sanitize (PHP)
I am facing an issue with URLs. I want to convert titles that could contain anything into strings stripped of all special characters, so that they contain only letters and numbers, and of course I would like to replace spaces with hyphens.
How would this be done? I've heard a lot about regular expressions (regex) being used...
Here, check out this function:
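A minimal sketch of such a function, assuming a hypothetical name `slugify` and an entity-based way of stripping accents (one possible technique, not necessarily the one originally posted here):

```php
<?php
// Hypothetical helper; the name and the list of entity suffixes are assumptions.
function slugify(string $title): string
{
    // Encode accented letters as named HTML entities, e.g. "ù" becomes "&ugrave;".
    $s = htmlentities($title, ENT_COMPAT, 'UTF-8');
    // Keep only the base letter of each accent entity: "&ugrave;" becomes "u".
    $s = preg_replace('/&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron);/i', '$1', $s);
    // Turn any remaining entities (such as "&amp;") back into plain characters.
    $s = html_entity_decode($s, ENT_COMPAT, 'UTF-8');
    // Replace every run of characters that are not ASCII letters or digits with one hyphen.
    $s = preg_replace('/[^a-z0-9]+/i', '-', $s);
    // Remove leading and trailing hyphens and lowercase the result.
    return strtolower(trim($s, '-'));
}

echo slugify('Viaggi Economy Perù'), PHP_EOL; // viaggi-economy-peru
```

The source file has to be saved as UTF-8 for the accented literal to be handled correctly.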
Update
The solution below has a "SEO friendlier" version:
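What follows is only a sketch of the idea, not the original code: the function name `seoFriendlySlug`, the accent table and the sample `$dict` entry are all assumptions made for illustration. It maps accented letters to their ASCII base instead of dropping them, then replaces known misspellings word by word.

```php
<?php
// Hypothetical sketch; names and table contents are illustrative assumptions.
function seoFriendlySlug(string $title, array $dict = []): string
{
    // Map accented letters to ASCII so "Perù" becomes "Peru" rather than "Per".
    // Deliberately small subset; extend it for the languages you handle.
    $accents = [
        'à' => 'a', 'á' => 'a', 'è' => 'e', 'é' => 'e',
        'ì' => 'i', 'í' => 'i', 'ò' => 'o', 'ó' => 'o',
        'ù' => 'u', 'ú' => 'u',
    ];
    $title = strtolower(strtr($title, $accents));
    // Drop everything except lowercase letters, digits, spaces and hyphens.
    $title = preg_replace('/[^a-z0-9 -]/', '', $title);
    // Split into words on runs of spaces or hyphens.
    $words = preg_split('/[ -]+/', $title, -1, PREG_SPLIT_NO_EMPTY);
    // Replace known misspellings; $dict maps a wrong spelling to the right one.
    foreach ($words as $i => $word) {
        if (isset($dict[$word])) {
            $words[$i] = $dict[$word];
        }
    }
    return implode('-', $words);
}

$dict = ['economi' => 'economy']; // hand-tailored, hypothetical entry
echo seoFriendlySlug('Viaggi Economi Perù', $dict), PHP_EOL; // viaggi-economy-peru
```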
The rationale for the above functions (which I find way inefficient - the one below is better) is that a service that shall not be named apparently ran spelling checks and keyword recognition on the URLs.
After spending a long time on what I took to be a customer's paranoia, I found out they were not imagining things after all: their SEO experts [I am definitely not one] reported that, say, converting "Viaggi Economy Perù" to viaggi-economy-peru "behaved better" than viaggi-economy-per (the previous "cleaning" removed UTF8 characters; Bogotà became bogot, Medellìn became medelln and so on).
There were also some common misspellings that seemed to influence the results, and the only explanation that made sense to me is that our URLs were being unpacked, the words singled out, and used to drive God knows what ranking algorithms. And those algorithms apparently had been fed with UTF8-cleaned strings, so that "Perù" became "Peru" instead of "Per". "Per" did not match and sort of took it in the neck.
In order to both keep UTF8 characters and replace some misspellings, the faster function below became the more accurate (?) function above. $dict needs to be hand tailored, of course.
Previous answer
A simple approach:
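As a rough sketch, assuming the title is already in a variable called `$str` (the sample value is made up), two regular expressions are enough for plain ASCII input:

```php
<?php
$str = 'Never gonna   give you up!'; // hypothetical sample title

// Remove everything except ASCII letters, digits, spaces and hyphens.
// The hyphen goes last in the class so it is not read as a range like A-Z.
$str = preg_replace('/[^A-Za-z0-9 -]/', '', $str);

// Replace each run of whitespace with a single hyphen.
$str = preg_replace('/\s+/', '-', $str);

echo $str, PHP_EOL; // Never-gonna-give-you-up
```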
Note that you might have to first urldecode() the URL, since %20 and + are both actually spaces. I mean, if you have "Never%20gonna%20give%20you%20up" you want it to become Never-gonna-give-you-up, not Never20gonna20give20you20up. You might not need it, but I thought I'd mention the possibility.
So the finished function along with test cases:
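A sketch of what such a function and its test cases might look like, assuming a hypothetical name `titleToSlug` and made-up inputs. It is ASCII-only: it leaves out any UTF-8 transliteration, so accented letters are simply dropped.

```php
<?php
// Hypothetical helper; not the original function from this answer.
function titleToSlug(string $str): string
{
    // Turn %20 and + back into spaces before filtering.
    $str = urldecode($str);
    // Remove everything except ASCII letters, digits, spaces and hyphens.
    $str = preg_replace('/[^A-Za-z0-9 -]/', '', $str);
    // Collapse runs of spaces and hyphens into one hyphen, then trim the ends.
    $str = trim(preg_replace('/[ -]+/', '-', $str), '-');
    // Lowercasing is a matter of taste; uncomment to enable it.
    // $str = strtolower($str);
    return $str;
}

// Made-up test cases: input => expected slug.
$cases = [
    'Never%20gonna%20give%20you%20up' => 'Never-gonna-give-you-up',
    'Never+gonna+give+you+up'         => 'Never-gonna-give-you-up',
    "I'm not the man I was"           => 'Im-not-the-man-I-was',
];

foreach ($cases as $input => $expected) {
    $actual = titleToSlug($input);
    echo ($actual === $expected ? 'ok  ' : 'FAIL'), ' ', $input, ' => ', $actual, PHP_EOL;
}
```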
To handle UTF-8 I used a cleanString implementation found here. It could be simplified and wrapped inside the function here for performance.
The function above also implements converting to lowercase, but that's a matter of taste. The code to do so has been commented out.
Easy peasy:
Usage:
Will output:
abcdef-g
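A minimal sketch of a function that gives the output above, assuming a hypothetical name `clean` and a made-up input string:

```php
<?php
// Hypothetical helper; the name and the sample input are assumptions.
function clean(string $string): string
{
    // Replace all spaces with hyphens.
    $string = str_replace(' ', '-', $string);
    // Remove everything that is not an ASCII letter, a digit or a hyphen.
    $string = preg_replace('/[^A-Za-z0-9-]/', '', $string);
    // Collapse repeated hyphens into a single one.
    return preg_replace('/-+/', '-', $string);
}

echo clean('a|"bc!@de^&$f g'), PHP_EOL; // abcdef-g
```

The last step is optional; it only matters when the input has consecutive spaces or already contains hyphens.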
Edit: