Detecting specific words in a textarea submission

2019-03-05 03:04发布

I have a new feature on my site, where users can submit any text (I stopped all HTML entries) via a textarea. The main problem I still have though is that they could type "http://somewhere.com" which is something I want to stop. I also want to blacklist specific words. This is what I had before:

if (strpos($entry, "http://" or ".com" or ".net" or "www." or ".org" or ".co.uk" or "https://") !== true) {
            die ('Entries cannot contain links!');

However that didn't work, as it stopped users from submitting any text at all. So my question is simple, how can I do it?

3条回答
再贱就再见
2楼-- · 2019-03-05 03:12

This is a job for Regular Expressions.

What you need to do it something like this:

// A list of words you don't allow
$disallowedWords = array(
  'these',
  'words',
  'are',
  'not',
  'allowed'
);
// Search for disallowed words.
// The Regex used here should e.g. match 'are', but not match 'care' or 'stare'
foreach ($disallowedWords as $word) {
  if (preg_match("/\s+$word\s+/i", $entry)) {
    die("The word '$word' is not allowed...");
  }
}

// This variable should contain a regex that will match URLs
// there are thousands out there, take your pick. I have just
// used an arbitrary one I found with Google
$urlRegex = '(http|https|ftp)\://([a-zA-Z0-9\.\-]+(\:[a-zA-Z0-9\.&%\$\-]+)*@)*((25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])|localhost|([a-zA-Z0-9\-]+\.)*[a-zA-Z0-9\-]+\.(com|edu|gov|int|mil|net|org|biz|arpa|info|name|pro|aero|coop|museum|[a-zA-Z]{2}))(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\?\'\\\+&%\$#\=~_\-]+))*';

// Search for URLs
if (preg_match($urlRegex, $entry)) {
  die("URLs are not allowed...");
}
查看更多
时光不老,我们不散
3楼-- · 2019-03-05 03:15

You must use strpos more the once. With your way you evaluate the or statement with returns true / false and pass it to strpos.

This way it should work:

if (strpos($entry, "http://") !== false || strpos($entry, "https://") !== false || strpos($entry, ".com") !== false)
查看更多
别忘想泡老子
4楼-- · 2019-03-05 03:31

A simple way to do this is to put all the words not allowed into an array and loop through them to check each one.

$banned = array('http://', '.com', '.net', 'www.', '.org'); // Add more
foreach ($banned as $word):
    if (strpos($entry, $word) !== false) die('Contains banned word');
endforeach;

The problem with this is if you get too carried away and start banning the word 'com' or something, there are other words and phrases that could be perfectly legal that contains the letters 'com' in that way that would cause a false positive. You could use regular expressions to search for strings that look like URLs, but then you can easily just break them up like I did above. There is no effective way to completely stop people from posting links into a comment. If you don't want them there, you'll ultimately just have to use moderation. Community moderation works very well, look at Stack Overflow for instance.

查看更多
登录 后发表回答