I use these lines of code to remove all punctuation marks, symbols, etc., as listed in the array:
$pattern_page = array("+",",",".","-","'","\"","&","!","?",":",";","#","~","=","/","$","£","^","(",")","_","<",">");
$pg_url = str_replace($pattern_page, ' ', strtolower($pg_url));
but I want to make it simpler: it looks silly to list everything I want to remove in the array, and there might be other special characters I want to remove.
I thought of using the regular expression below,
$pg_url = preg_replace("/\W+/", " ", $pg_url);
but it doesn't remove the underscore (_).
What is the best way to remove all of these? Can a regular expression do that?
Use character classes: a pattern built on a negated POSIX class would remove anything that's not considered a "character" by the currently set locale. If it's punctuation you seek to eliminate, the class would be [:punct:].
\W means "any non-word character" and is the opposite of \w, which includes underscores (_). That is why \W+ leaves them in place. Depending on how greedy you'd like to be, you could instead use a negated character class that lists only the characters to keep, which will replace anything that isn't a letter, number or space.
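As a rough sketch of the approaches described above (the exact patterns from the answers are not preserved here, so the ones below are assumptions chosen to match the descriptions, and the sample input is made up):

```php
<?php
// Hypothetical sample input; $pg_url is the variable name used in the question.
$pg_url = strtolower("Some_Page-Title!2");

// Whitelist: replace each run of characters that is not an ASCII letter,
// digit or space with a single space.
$whitelisted = preg_replace('/[^a-zA-Z0-9 ]+/', ' ', $pg_url);

// POSIX class: [:punct:] is only valid inside a bracket expression,
// hence the doubled brackets. For ASCII input it also matches "_".
$no_punct = preg_replace('/[[:punct:]]+/', ' ', $pg_url);

// \W skips underscores, so one option is to add "_" to the class explicitly.
$no_nonword = preg_replace('/[\W_]+/', ' ', $pg_url);

echo $whitelisted, "\n"; // some page title 2
echo $no_punct, "\n";    // some page title 2
echo $no_nonword, "\n";  // some page title 2
```

Note that the whitelist pattern keeps existing spaces, so punctuation next to a space (for example ", ") leaves two spaces in a row; dropping the space from the class would collapse those into one.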