I've been using this:
str2 = str1.replace(/[^\w]/gi, '');
It works fine, but falls foul of JSLint for having an insecure '^'
as outlined in the posts here and here.
The consensus is that it is better to use your regex
to specify what is allowed rather than what is not. No one ever demonstrates how to do this, however. I've even got Flanagan and Crockford in front of me here, but to my shame I'm still not sure what to do.
So... how do you set str2 to only allow the \w characters found in str1, using a positive test rather than a negative one?
Try \W (capital W) instead.
\w matches a word character, while \W matches any character that is not a word character, so /\W/ is equivalent to /[^\w]/. It also looks a bit nicer in the expression.
Here's a RegEx cheatsheet; it comes in handy while you're coding!
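To see that the two forms behave the same, here is a minimal sketch. The sample string and the variable names viaNegatedClass and viaShorthand are made up for illustration; the i flag is dropped because \w already covers both cases.

```javascript
// \W is shorthand for [^\w], i.e. anything outside [A-Za-z0-9_].
const sample = 'Hello, World! 123_abc';

const viaNegatedClass = sample.replace(/[^\w]/g, '');
const viaShorthand = sample.replace(/\W/g, '');

console.log(viaShorthand);                     // HelloWorld123_abc
console.log(viaNegatedClass === viaShorthand); // true
```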
Your example is too simple to demonstrate the point of not using ^ in a regex.
A better example would be cleaning up HTML in a form submission, where you want to allow HTML tags but don't want people to inject an XSS (Cross-Site Scripting) attack. In this case, if you use a blacklist approach, you cannot reliably remove all attack code, since the attacker can alter the syntax to avoid your filter, or craft the input so that the filtered output turns back into the attack code. The correct approach is to use a whitelist that lists all the allowed tags, plus the allowed attributes. This example may not be directly related to regex, since regex should not be used to parse HTML, but it demonstrates the point about whitelist versus blacklist filtering.
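A rough sketch of that difference, not a real sanitizer: blacklistFilter, ALLOWED_TAGS and isAllowedTag are hypothetical names, and the whitelist half assumes the tag names come from a proper HTML parser rather than from a regex.

```javascript
// Blacklist approach: strip every literal "<script>" tag in a single pass.
function blacklistFilter(input) {
  return input.replace(/<script>/gi, '');
}

// Nested input reassembles the tag once the inner copies are removed.
const attack = '<scr<script>ipt>alert(1)</scr<script>ipt>';
console.log(blacklistFilter(attack)); // <script>alert(1)</script>

// Whitelist approach: accept only tag names on an explicit allow-list.
// ALLOWED_TAGS and isAllowedTag are hypothetical; a real sanitizer would
// take tag names from an HTML parser and would also check attributes.
const ALLOWED_TAGS = new Set(['b', 'i', 'em', 'strong', 'p']);

function isAllowedTag(tagName) {
  return ALLOWED_TAGS.has(tagName.toLowerCase());
}

console.log(isAllowedTag('STRONG')); // true
console.log(isAllowedTag('script')); // false
```

The blacklist output is exactly the tag it was supposed to remove, whereas the allow-list never has to anticipate what an attacker might send.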
It depends on what you want to do.
You can either allow only the \w charset and throw an error when the string contains characters outside it, by doing something like this:
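str1 = 'blah blah string';
// ^ and $ anchor the test to the whole string; the spaces in this sample
// are not \w characters, so it takes the else branch
if (/^\w*$/.test(str1))
{
    // do something
}
else
{
    // alert and/or throw error
}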
Or you can accept whatever str1 contains and filter out the characters that you don't want, which is what you are currently doing. Example:
str1='blah blah some string';
str1=str1.replace(/\W/gi,'');
Note: the above is a shorter version of what you are already doing, which was: str2 = str1.replace(/[^\w]/gi, '');
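If the goal is specifically a positive test, one option is to match the runs of \w characters and join them instead of deleting everything else. This is only a sketch: keepWordChars is a made-up helper name, and whether JSLint prefers this form is not something I have checked.

```javascript
// Positive approach: collect the allowed characters instead of removing
// the disallowed ones. keepWordChars is a hypothetical helper name.
function keepWordChars(input) {
  const matches = input.match(/\w+/g); // null when nothing matches
  return matches ? matches.join('') : '';
}

console.log(keepWordChars('blah blah some string!')); // blahblahsomestring
console.log(keepWordChars('!!!') === '');             // true
```

The result is the same as replace(/\W/g, ''), but the pattern only names what is allowed.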