When I run a phrase that contains double quotes through this function, it's replacing the quotes with quot.
I want to completely remove them (also single quotes). How can I alter the function to do that?
function string_sanitize($s) {
    $result = preg_replace("/[^a-zA-Z0-9]+/", "", $s);
    return $result;
}
Update:
Example 1: This is 'the' first example
returns: Thisis030the039firstexample
Errors: Warning: preg_match_all() [function.preg-match-all]: Unknown modifier '0' in C
Example 2: This is my "second" example
returns: Thisismyquotsecondquotexample
Errors: Invalid express in Xpath
I would not call that function string_sanitize(), as it is misleading. You could call it strip_non_alphanumeric().
Your current function will strip anything that isn't an upper or lowercase letter or a number.
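To make that behaviour concrete, here is a minimal sketch that runs the question's regex on a raw string and on an HTML-encoded one. The sample strings come from the question's second example; the assumption that the input arrives HTML-encoded is inferred from the reported output.

```php
<?php
// Same regex as in the question, under the more descriptive name suggested above.
function strip_non_alphanumeric($s) {
    return preg_replace("/[^a-zA-Z0-9]+/", "", $s);
}

// Raw quotes are removed outright.
echo strip_non_alphanumeric('This is my "second" example'), "\n";
// Thisismysecondexample

// HTML-encoded quotes lose only "&" and ";", so the letters "quot" survive.
echo strip_non_alphanumeric('This is my &quot;second&quot; example'), "\n";
// Thisismyquotsecondquotexample
```

The second call reproduces the output reported in the question, which suggests the string is already entity-encoded by the time it reaches the function.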
You can strip just ' and " with...
I think your preg_replace call should be like this:
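A minimal sketch of such a call, assuming the input is HTML-encoded and that decoding it with html_entity_decode() before stripping is what is intended. The function name string_sanitize_decoded is hypothetical, as is the choice of UTF-8 as the encoding.

```php
<?php
// Hypothetical rewrite of the question's function: decode entities, then strip.
function string_sanitize_decoded($s) {
    // ENT_QUOTES makes html_entity_decode() handle &#039; as well as &quot;.
    $decoded = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
    // Same character class as the original function.
    return preg_replace("/[^a-zA-Z0-9]+/", "", $decoded);
}
```

Once decoded, the quotes are ordinary characters outside [a-zA-Z0-9], so the existing regex removes them without leaving entity fragments behind.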
Please see html_entity_decode reference for more details.
To be sure of removing all kinds of quotes (including those whose left and right forms differ), I think it must be something like:
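One way this could look, as a sketch only: a character class listing the straight quotes plus common typographic ones. The helper name strip_quotes_all and the exact set of code points are assumptions, and the /u modifier requires the input to be valid UTF-8.

```php
<?php
// Hypothetical helper: removes straight and common typographic quotes only.
// Assumes $s is valid UTF-8 (required by the /u modifier).
function strip_quotes_all($s) {
    // ' and " plus U+2018, U+2019, U+201C, U+201D, U+00AB, U+00BB
    return preg_replace('/[\'"\x{2018}\x{2019}\x{201C}\x{201D}\x{00AB}\x{00BB}]/u', '', $s);
}

// "\u{...}" escapes need PHP 7 or later.
echo strip_quotes_all("He said \u{201C}hi\u{201D} and 'bye'"), "\n";
// He said hi and bye
```

Unlike the original function, this leaves spaces and punctuation in place and touches only the quote characters listed.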
Your function uses a regular expression to remove any character other than [a-zA-Z0-9], so it certainly removes any " or '.
EDIT: Well, from Hamish's answer I realize your string is an HTML string, which explains why " (&quot;) is transformed to "quot". You may consider replacing &quot; with preg_replace, or running htmlspecialchars_decode first.
It looks like your original string had the HTML entity for " (&quot;), so when you attempt to sanitize it, you're simply removing the & and ;, leaving the rest of the string: quot.
---EDIT---
Probably the easiest way to remove non-alphanumeric characters would be to decode the HTML entities with html_entity_decode, then run the result through the regular expression. Since, in this case, you won't get anything that needs to be re-encoded, you don't need to then call htmlentities, but it's worth remembering that you had HTML data and you now have raw, unencoded data.
Eg:
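A minimal sketch of that decode-then-strip sequence, run once with ENT_COMPAT and once with ENT_QUOTES to show what the flag changes. The sample string and variable names are illustrative, and UTF-8 is assumed as the encoding.

```php
<?php
$encoded = 'This is &#039;the&#039; &quot;first&quot; example';

// ENT_COMPAT decodes &quot; but leaves &#039; alone, so "039" survives the strip.
$compat = preg_replace("/[^a-zA-Z0-9]+/", "", html_entity_decode($encoded, ENT_COMPAT, 'UTF-8'));

// ENT_QUOTES decodes both kinds of quote, so nothing is left behind.
$quotes = preg_replace("/[^a-zA-Z0-9]+/", "", html_entity_decode($encoded, ENT_QUOTES, 'UTF-8'));

echo $compat, "\n"; // Thisis039the039firstexample
echo $quotes, "\n"; // Thisisthefirstexample
```

Passing the flag explicitly avoids depending on the default, which differs between PHP versions.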
Note that ENT_QUOTES flags the function to "...convert both double and single quotes."
Easy way for both single and double quotes : ) And still leaves something similar to look at.
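A sketch of what such an approach might look like, using str_replace on the two quote characters only. The helper name replace_quotes is hypothetical, and reading "something similar to look at" as substituting a look-alike character such as a backtick is an assumption.

```php
<?php
// Hypothetical helper: touches only ' and ", leaving everything else as-is.
// $replacement defaults to '' (remove); pass a look-alike such as '`' to keep a visual marker.
function replace_quotes($s, $replacement = '') {
    return str_replace(array("'", '"'), $replacement, $s);
}

echo replace_quotes('This is my "second" example'), "\n";
// This is my second example
echo replace_quotes("This is 'the' first example", '`'), "\n";
// This is `the` first example
```

This only helps when the quotes are literal characters; entity-encoded input would still need decoding first.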