JavaScript profanity match, NOT replace

Posted 2019-03-04 18:03

Question:

I am building a very basic profanity filter that I only want to apply to some fields in my application (fullName, userDescription), on the server side.

Does anyone have experience with a profanity filter in production? I only want it to behave like this:

'ass hello' <- match
'asster' <- NOT match

Below is my current code, but for some reason it returns true and false alternately on successive calls.

var badWords = [ 'ass', 'whore', 'slut' ]
  , check = new RegExp(badWords.join('|'), 'gi');

function filterString(string) {
  return check.test(string);
}

filterString('ass'); // Returns true, then false, on successive calls.

How can I fix this "in succession" bug?

Answer 1:

When the regex has the g flag, the test method sets the regex's lastIndex property to the position just after the match, so that further invocations will search for further occurrences (if there are any).

check.lastIndex // 0 (init)
filterString('ass'); // true
check.lastIndex // 3
filterString('ass'); // false
check.lastIndex // now 0 again
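A small runnable sketch that reproduces the sequence above. `badWords` and `check` mirror the question's names; the `log` array is only a hypothetical helper for recording each value.

```javascript
// Same setup as the question, with the constructor spelled RegExp.
const badWords = ['ass', 'whore', 'slut'];
const check = new RegExp(badWords.join('|'), 'gi');

const log = [];
log.push(check.lastIndex);   // 0: fresh regex
log.push(check.test('ass')); // true: match found at index 0..3
log.push(check.lastIndex);   // 3: the next search starts here
log.push(check.test('ass')); // false: nothing left after index 3
log.push(check.lastIndex);   // 0: a failed search resets lastIndex
console.log(log);            // log holds [0, true, 3, false, 0]
```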

So you will need to reset it manually in your filterString function if you don't recreate the RegExp each time:

function filterString(string) {
    check.lastIndex = 0;
    return check.test(string);
}
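An alternative to the manual reset, offered as a suggestion rather than part of the original answer: `test()` only needs to know whether any match exists, so the `g` flag can simply be dropped. Without `g` (or `y`), `test()` ignores `lastIndex` and always searches from the start of the string, so repeated calls give the same answer.

```javascript
const badWords = ['ass', 'whore', 'slut'];
// Only the 'i' flag: without 'g' (or 'y'), test() ignores lastIndex,
// so there is no state carried over between calls.
const check = new RegExp(badWords.join('|'), 'i');

function filterString(string) {
  return check.test(string);
}

const results = [filterString('ass'), filterString('ass'), filterString('ass')];
console.log(results); // [true, true, true]
```

Dropping `g` is also why this version needs no `check.lastIndex = 0;` line.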

By the way, to match only whole words (like "ass" but not "asster"), wrap the alternation in word boundaries, as WTK suggested:

var check = new RegExp("\\b(?:"+badWords.join('|')+")\\b", 'gi');
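A self-contained sketch combining both fixes from this answer: resetting `lastIndex`, and wrapping the alternation in `\b(?:...)\b`. The `escapeRegExp` helper is a hypothetical addition, not from the original answers; it only matters if a list entry ever contains regex metacharacters such as `.` or `*`.

```javascript
const badWords = ['ass', 'whore', 'slut'];

// Hypothetical helper: escape regex metacharacters so each entry is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// (?: ... ) groups the alternation so \b applies to every word, not just the first and last.
const check = new RegExp(
  '\\b(?:' + badWords.map(escapeRegExp).join('|') + ')\\b',
  'gi'
);

function filterString(string) {
  check.lastIndex = 0; // clear state left over from the previous call ('g' flag)
  return check.test(string);
}

console.log(filterString('ass hello')); // true
console.log(filterString('ass hello')); // true again, thanks to the reset
console.log(filterString('asster'));    // false
```

By default `\b` treats only `[A-Za-z0-9_]` as word characters, so this works for English words like these; entries with accented letters or leading/trailing punctuation would need a different boundary check.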


Answer 2:

You are matching via a substring comparison. Your regex needs to be modified to match whole words instead.
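A minimal sketch contrasting the two behaviours with regex literals: the plain alternation finds a listed word anywhere in the input, while the same alternation wrapped in word boundaries only accepts standalone words. The test strings are illustrative.

```javascript
// Plain alternation: matches "ass" anywhere, including inside "asster".
const substringCheck = /ass|whore|slut/i;
// Same words wrapped in word boundaries: only whole words match.
const wholeWordCheck = /\b(?:ass|whore|slut)\b/i;

const inputs = ['ass hello', 'asster', 'massive', 'hello slut'];
const results = inputs.map((input) => [
  input,
  substringCheck.test(input), // substring match
  wholeWordCheck.test(input), // whole-word match
]);
console.log(results);
```

Only 'asster' and 'massive' differ: both contain "ass" as a substring but not as a separate word.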



Answer 3:

How about a fixed regexp:

check = new RegExp('(^|\\b)(?:'+badWords.join('|')+')($|\\b)', 'i');

check.test('ass') // true
check.test('suckass') // false
check.test('mass of whore') // true
check.test('massive') // false
check.test('slut is massive') // true

I'm using \b here to match a word boundary, with ^ and $ covering the start and end of the whole string.
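When a pattern like this is built from strings, two details are easy to get wrong, and this small sketch checks both. First, `'\b'` inside a JavaScript string literal is a backspace character (U+0008), so the backslash has to be doubled for `RegExp` to see a word boundary. Second, `|` has the lowest precedence in a regex, so without a `(?:...)` group the boundary checks attach only to the first and last word. The word 'whoreish' is just an illustrative input.

```javascript
// '\b' in a string literal is the backspace character, not a regex word boundary.
console.log('\b'.length, '\b'.charCodeAt(0)); // 1 8
console.log('\\b'.length);                   // 2: backslash + 'b', which RegExp reads as \b

const badWords = ['ass', 'whore', 'slut'];

// Ungrouped: parsed as  (?:^|\b)ass  |  whore  |  slut(?:$|\b)
const ungrouped = new RegExp('(?:^|\\b)' + badWords.join('|') + '(?:$|\\b)', 'i');
// Grouped: the boundary checks apply to every alternative.
const grouped = new RegExp('(?:^|\\b)(?:' + badWords.join('|') + ')(?:$|\\b)', 'i');

console.log(ungrouped.test('whoreish')); // true: the middle word has no boundary check
console.log(grouped.test('whoreish'));   // false
```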