For a website (running PHP) that takes input from kids, we need to filter out any naughty or bad words they use when they enter comments.
The comment field is free text, so users can enter whatever they want. The solution I can think of is to keep a word list like:
BLACKLIST: bad,bad,word,woord,craap,craaaap (we can fill this with all the blacklisted words).
Then, when the form is submitted, we check the comment against the list, and if any of the words are present we refuse to save it.
But the problem with this method is that kids can get around it by adding letters to a word so that it slips past the filter, e.g. shiiiiit.
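A minimal PHP sketch of the blacklist check described above, with one extra normalization step: each word is lower-cased and runs of a repeated letter are collapsed, so shiiiiit and craaaap reduce to shit and crap before the lookup. The function names and the two-word list are hypothetical.

```php
<?php
// Hypothetical blacklist; entries are normalized the same way as the input.
const BLACKLIST = ['crap', 'shit'];

// Lower-case the word and collapse runs of the same character to one,
// so "SHIIIIIT" becomes "shit" and "craaaap" becomes "crap".
function normalizeWord(string $word): string
{
    return preg_replace('/(.)\1+/u', '$1', strtolower($word));
}

// True if any word in the comment matches a blacklisted word after normalization.
function containsBlacklistedWord(string $comment, array $blacklist = BLACKLIST): bool
{
    $banned = array_flip(array_map('normalizeWord', $blacklist));
    // Split the comment into runs of letters and digits.
    preg_match_all('/[\p{L}\p{N}]+/u', $comment, $matches);
    foreach ($matches[0] ?? [] as $word) {
        if (isset($banned[normalizeWord($word)])) {
            return true;
        }
    }
    return false;
}
```

The collapsing also merges legitimate double letters on both sides of the comparison, so a blacklisted word such as "ass" would reduce to "as" and block that ordinary word too; and it does nothing against substitutions like sh1t or s h i t.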
What do you think is the best way to build a filter for these words?
You're never going to be able to filter every permutation. Perhaps the most feasible solution is to filter the obvious, and implement a "Report Abuse" mechanism so someone can manually look over (and reject) suspect comments.
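A rough sketch of the "Report Abuse" idea: each user can report a comment once, and a comment with enough distinct reports is hidden until a moderator reviews it. The class, its method names and the default threshold of 3 are hypothetical; a real site would keep the reports in a database table rather than in memory.

```php
<?php
// Hypothetical in-memory "Report Abuse" tracker (PHP 8+). Names and the
// default threshold of 3 are illustrative assumptions.
final class AbuseReports
{
    /** @var array<int, array<string, true>> comment id => set of reporter ids */
    private array $reports = [];

    public function __construct(private int $hideThreshold = 3) {}

    // Record a report; repeated reports from the same user count once.
    public function report(int $commentId, string $reporterId): void
    {
        $this->reports[$commentId][$reporterId] = true;
    }

    // Number of distinct users who reported the comment.
    public function reportCount(int $commentId): int
    {
        return count($this->reports[$commentId] ?? []);
    }

    // A comment with enough reports is hidden until a moderator reviews it.
    public function isHiddenPendingReview(int $commentId): bool
    {
        return $this->reportCount($commentId) >= $this->hideThreshold;
    }

    // Reported comment ids, most-reported first, for the moderator's queue.
    public function moderationQueue(): array
    {
        $counts = array_map('count', $this->reports);
        arsort($counts);
        return array_keys($counts);
    }

    // Called once a moderator has kept or deleted the comment.
    public function resolve(int $commentId): void
    {
        unset($this->reports[$commentId]);
    }
}
```

Counting distinct reporters rather than raw clicks keeps one angry user from hiding a comment alone.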
So you are going to ban shit, shït, shıt, śhit, and śhiŧ?
Blacklisting is not a viable solution in the Unicode age. Yet banning € outright seems excessive.
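To make the point concrete: a common first defence is to decompose the text (Unicode NFD) and strip the combining marks. This sketch assumes PHP's intl extension is installed, since it uses the Normalizer class; foldAccents is a hypothetical helper name. It turns shït and śhit into shit, but shıt (dotless ı, U+0131) and śhiŧ (ŧ, U+0167) have no decomposition, so they still slip through.

```php
<?php
// Requires the intl extension (Normalizer class).
// Decompose to NFD, then delete non-spacing combining marks (\p{Mn}),
// e.g. "ï" becomes "i" + U+0308, and the U+0308 is removed.
function foldAccents(string $s): string
{
    $decomposed = Normalizer::normalize($s, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $s; // invalid UTF-8: leave the input unchanged
    }
    return preg_replace('/\p{Mn}+/u', '', $decomposed);
}

foreach (['shit', 'shït', 'shıt', 'śhit', 'śhiŧ'] as $variant) {
    printf("%s -> %s\n", $variant, foldAccents($variant));
}
```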
If you have time, it is worth reading about the Scunthorpe problem, where innocent text gets blocked because it happens to contain a banned string.
Jeff Atwood also has a post on the futility of obscenity filters.
Thanks to too much php I've found some links which might be a solution for your case:
- http://wiki.cdyne.com/wiki/index.php?title=Profanity_Filter
- http://www.webpurify.com/
Use uClassify to train a classifier on bad comments; once the system is trained well enough, you can flag offending comments for moderation.
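uClassify is a hosted service and its API is not reproduced here. As a local stand-in for the same train-then-flag idea, this is a tiny multinomial naive Bayes classifier with add-one smoothing; the class name, the 'bad'/'ok' labels and the sample training comments are all hypothetical. Flagged comments would go to a moderator rather than being rejected automatically.

```php
<?php
// Hypothetical local stand-in for a trained classifier such as uClassify
// (this is NOT uClassify's API): multinomial naive Bayes with add-one
// smoothing and two labels, 'bad' and 'ok'.
final class CommentClassifier
{
    private const LABELS = ['bad', 'ok'];
    private array $wordCounts = ['bad' => [], 'ok' => []];
    private array $totalWords = ['bad' => 0, 'ok' => 0];
    private array $docCounts = ['bad' => 0, 'ok' => 0];
    private array $vocabulary = [];

    // Lower-case and split into runs of letters/digits.
    private static function tokenize(string $text): array
    {
        preg_match_all('/[\p{L}\p{N}]+/u', strtolower($text), $m);
        return $m[0] ?? [];
    }

    public function train(string $text, string $label): void
    {
        if (!in_array($label, self::LABELS, true)) {
            throw new InvalidArgumentException("Unknown label: $label");
        }
        $this->docCounts[$label]++;
        foreach (self::tokenize($text) as $w) {
            $this->wordCounts[$label][$w] = ($this->wordCounts[$label][$w] ?? 0) + 1;
            $this->totalWords[$label]++;
            $this->vocabulary[$w] = true;
        }
    }

    // log P(label) + sum of log P(word | label); words never seen in
    // training are skipped because they carry no evidence either way.
    private function logScore(array $tokens, string $label): float
    {
        $score = log(($this->docCounts[$label] + 1) / (array_sum($this->docCounts) + 2));
        $vocabSize = count($this->vocabulary);
        foreach ($tokens as $w) {
            if (!isset($this->vocabulary[$w])) {
                continue;
            }
            $count = $this->wordCounts[$label][$w] ?? 0;
            $score += log(($count + 1) / ($this->totalWords[$label] + $vocabSize));
        }
        return $score;
    }

    // True if the comment looks more like the 'bad' examples than the 'ok' ones.
    public function shouldFlag(string $text): bool
    {
        $tokens = self::tokenize($text);
        return $this->logScore($tokens, 'bad') > $this->logScore($tokens, 'ok');
    }
}
```

With only a handful of examples the model is easily fooled; it needs a large, regularly updated set of moderated comments before its flags are useful.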
There is also always the risk of filtering a word like "bass", which of course contains one of the words that is not permitted. For now, some good moderators seem like the best solution to this problem.
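The "bass" case is easy to reproduce. This sketch compares a naive substring check with a whole-word check, where the banned string may not be preceded or followed by a letter or digit. The one-entry list and the function names are hypothetical.

```php
<?php
// Hypothetical one-entry list; "ass" is the word hidden inside "bass".
const BANNED = ['ass'];

// Naive check: flags the comment if a banned string appears anywhere,
// even inside another word.
function substringMatch(string $comment): bool
{
    $lower = strtolower($comment);
    foreach (BANNED as $word) {
        if (str_contains($lower, $word)) {
            return true;
        }
    }
    return false;
}

// Whole-word check: the banned string must not touch another letter or digit
// on either side. Matching is case-insensitive (i) and Unicode-aware (u).
function wholeWordMatch(string $comment): bool
{
    foreach (BANNED as $word) {
        $pattern = '/(?<![\p{L}\p{N}])' . preg_quote($word, '/') . '(?![\p{L}\p{N}])/iu';
        if (preg_match($pattern, $comment) === 1) {
            return true;
        }
    }
    return false;
}
```

Whole-word matching stops flagging bass and class, but it also lets through banned words glued onto other words, so it trades false positives for false negatives rather than removing them.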