I am building a very basic profanity filter that I only want to apply to some fields in my application (fullName, userDescription) on the server side.
Does anyone have experience with a profanity filter in production? I only want it to match whole words:
'ass hello' <- match
'asster' <- NOT match
Below is my current code, but for some reason it alternates between returning true and false on successive calls.
var badWords = [ 'ass', 'whore', 'slut' ]
, check = new RegExp(badWords.join('|'), 'gi');
function filterString(string) {
return check.test(string);
}
filterString('ass'); // Returns true / false in succession.
How can I fix this "in succession" bug?
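When the regex has the global (g) flag, the test method sets the regex's lastIndex property to the position just after the current match, so that further invocations will find further occurrences (if there are any). When no further match is found, test returns false and resets lastIndex to 0.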
check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again
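The trace above can be reproduced with a small self-contained script that calls test on the same string four times and records lastIndex after each call. It uses the question's word list and 'gi' flags; the loop and the results array are only for illustration.

```javascript
// Reproduces the alternating result: with the g flag, test() resumes
// searching from check.lastIndex instead of from the start of the string.
var badWords = ['ass', 'whore', 'slut'];
var check = new RegExp(badWords.join('|'), 'gi');

var results = [];
for (var i = 0; i < 4; i++) {
  var matched = check.test('ass');
  // Record the return value together with lastIndex after the call.
  results.push([matched, check.lastIndex]);
}

console.log(JSON.stringify(results));
// [[true,3],[false,0],[true,3],[false,0]]
```

The second call starts searching at index 3, which is already the end of 'ass', so it fails and lastIndex falls back to 0. That is why the results alternate.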
So, you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:
function filterString(string) {
check.lastIndex = 0;
return check.test(string);
}
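Another option, sketched here, is to drop the g flag altogether. A regex without g (or y) ignores lastIndex, so test always searches from the start of the string. A yes/no check does not need global matching anyway. This version still matches substrings ('asster' would match); it only removes the alternating results.

```javascript
// Sketch: without the g (global) flag, test() ignores lastIndex and always
// searches from the start of the string, so repeated calls are consistent.
var badWords = ['ass', 'whore', 'slut'];
var check = new RegExp(badWords.join('|'), 'i'); // 'i' only, no 'g'

function filterString(string) {
  return check.test(string);
}

console.log(filterString('ass'), filterString('ass'), filterString('ass'));
// true true true
```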
Btw, to match only full words (like "ass", but not "asster"), you should wrap your matches in word boundaries like WTK suggested, i.e.
var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');
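Here is a sketch that combines the whole-word pattern with the question's goal of checking only some fields. It assumes the record is a plain object with fullName and userDescription string properties. The helpers escapeRegExp and containsProfanity and the FILTERED_FIELDS list are hypothetical names for illustration. escapeRegExp is included so that a word list entry containing regex metacharacters is matched literally.

```javascript
// Sketch: whole-word matching applied only to selected fields.
// escapeRegExp, containsProfanity and FILTERED_FIELDS are hypothetical names.
var badWords = ['ass', 'whore', 'slut'];

// Escape regex metacharacters so an entry like "a.s" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// \b(?:...)\b applies the word boundaries to every alternative.
// No g flag, so test() keeps no lastIndex state between calls.
var check = new RegExp('\\b(?:' + badWords.map(escapeRegExp).join('|') + ')\\b', 'i');

// Only these fields of a record are filtered (assumed record shape).
var FILTERED_FIELDS = ['fullName', 'userDescription'];

function containsProfanity(record) {
  return FILTERED_FIELDS.some(function (field) {
    return typeof record[field] === 'string' && check.test(record[field]);
  });
}

console.log(check.test('ass hello')); // true
console.log(check.test('asster'));    // false
console.log(containsProfanity({ fullName: 'Jane', userDescription: 'asster fan' }));  // false
console.log(containsProfanity({ fullName: 'Jane', userDescription: 'what an ass' })); // true
```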
You are matching via a substring comparison. Your regex needs to be modified to match whole words instead.
How about this, with a fixed regex:
check = new RegExp('(^|\\b)(' + badWords.join('|') + ')($|\\b)', 'i'); // no g flag, so no lastIndex state
check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true
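I'm using \b here to match a word boundary, and ^ / $ to match the start or end of the whole string (strictly, \b alone already matches there when the adjacent character is a word character). The alternatives must be wrapped in a group so the boundaries apply to every word, not just the first and last, and inside a string literal the backslash must be doubled ('\\b'), because a bare '\b' is a backspace character.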
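The '\b' pitfall can be checked directly. This sketch builds the pattern once with single backslashes and no grouping (broken) and once with doubled backslashes and a group (fixed). Then it runs both on inputs where they disagree.

```javascript
// In a JS string literal, '\b' is the backspace character (U+0008),
// not the regex word boundary; the backslash must be doubled.
console.log('\b'.length, '\b' === '\u0008'); // 1 true
console.log('\\b'.length);                   // 2

var badWords = ['ass', 'whore', 'slut'];
// broken: boundaries are backspaces, and without a group they only
// attach to the first ("ass") and last ("slut") alternatives.
var broken = new RegExp('(^|\b)' + badWords.join('|') + '($|\b)', 'i');
// fixed: real \b assertions, and the group makes them wrap every word.
var fixed = new RegExp('(^|\\b)(' + badWords.join('|') + ')($|\\b)', 'i');

console.log(broken.test('slut is massive')); // false
console.log(fixed.test('slut is massive'));  // true
console.log(broken.test('whores'));          // true
console.log(fixed.test('whores'));           // false
```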