Registration spammer detection with Akismet

Posted 2019-05-10 23:41

Question:

I have a large list of users that registered through a website without any spam filter active during registration.

I would like to identify which registered users are likely spammers. I'm trying to use Akismet for this, but so far it reports every user as not spam, probably because Akismet is designed for comments, and there is no comment content at registration.

I send Akismet the username and email address. For the URL I use the email domain. For the comment I use: "Hi, I'm $username from $domain registered on $date with email $email and website $url".

As noted, however, this always reports users as legitimate, even when a user looks like a spammer.

If you're interested in the full code:

<?php

// bring php process to this dir
chdir(dirname(__FILE__));


// include Joomla Framework
require('../bootstrap-joomla.php');

// akismet class
require('akismet.class.php');

/**
 * Retrieves users not yet validated
 */
function getUsers($limit = 10) {
  // the unused $userid parameter was removed so that getUsers() can be called without arguments
  global $database;
  $database->setQuery("SELECT * FROM jos_users WHERE akismet_validated = 0 LIMIT " . intval($limit));
  return $database->loadObjectList();
}

/**
 * sets the validation results for the user
 */
function saveValidationResult($userid, $spammer) {
  global $database;
  // cast both values to int so nothing unescaped reaches the query
  $database->setQuery("UPDATE jos_users SET akismet_validated = 1, akismet_spammer = " . intval($spammer) . " WHERE id = " . intval($userid) . " LIMIT 1");
  return $database->query();
}

// get non validated users
$Users = getUsers();

// validate each user
foreach($Users as $User) {
  list($user, $domain) = explode('@', $User->email);

  $name = $User->username;
  $email = $User->email;
  // prefix a scheme so the author URL and the link in the comment are absolute
  $url = 'http://' . $domain;
  $comment = "Hello, I am $name, registered on $User->registerDate from <a href=\"$url\">$url</a>.\r\n";


  $akismet = new Akismet('http://www.fijiwebdesign.com/', 'c511157d1d98');
  $akismet->setCommentAuthor($name);
  $akismet->setCommentAuthorEmail($email);
  $akismet->setCommentAuthorURL($url);
  $akismet->setCommentContent($comment);
  //$akismet->setPermalink('http://www.fijiwebddesign.com/');


  echo "$User->id, $User->username : ";
  if($akismet->isCommentSpam()) {
    saveValidationResult($User->id, true);
    echo "Spammer";
  } else {
    saveValidationResult($User->id, false);
    echo "Not Spammer";
  }

  echo "\r\n";
}

Answer 1:

It's best to think of Akismet as a giant Bayesian spam filter with some other heuristics. It works on the contents of a post, the timing of a post, and most importantly, how often it has seen similar content that was reported as spam. The string you're feeding it is essentially unique to your site, so nobody else will have trained Akismet to treat it as spam. Even if you did somehow mark that string as spam, you'd end up with a whole bunch of false positives, because you're feeding every user account through the same template.

If you believe that you may have illegitimate users on your site, and they have not participated, simply delete the registration. If they are legitimate, they can simply re-register.

If the users are participating, simply look at their contributions. Their spamminess should be obvious.
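
For the "have not participated" case, here is a minimal sketch of how one might pick out candidates for deletion. It treats "never logged in" as a stand-in for "has not participated", which is my assumption, not something the question states. It also assumes the user rows carry Joomla's lastvisitDate column and that it stays at '0000-00-00 00:00:00' (or empty) until the first login. findNeverLoggedIn() is a hypothetical helper name, and the sample rows are made up.

```php
<?php
/**
 * Hypothetical helper: returns the users that have never logged in.
 * Assumes each user object has a lastvisitDate property that is empty or
 * '0000-00-00 00:00:00' until the first login.
 */
function findNeverLoggedIn(array $users): array
{
    return array_values(array_filter($users, function ($user) {
        $last = isset($user->lastvisitDate) ? $user->lastvisitDate : '';
        return $last === '' || $last === '0000-00-00 00:00:00';
    }));
}

// made-up rows shaped like the objects loadObjectList() returns
$sampleUsers = [
    (object) ['id' => 1, 'username' => 'alice',   'lastvisitDate' => '2009-03-01 10:00:00'],
    (object) ['id' => 2, 'username' => 'visitor', 'lastvisitDate' => '0000-00-00 00:00:00'],
];

foreach (findNeverLoggedIn($sampleUsers) as $candidate) {
    echo $candidate->id, ' ', $candidate->username, "\n";
}
```

Review the list by hand before deleting anything, since a legitimate user may simply not have logged in yet.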



Answer 2:

bucabay, use the contact form on Akismet.com to get in touch with us. We'll see if there's something we can do to help improve your results.

You can use Akismet to check signup registrations if it's done right. Accuracy isn't yet at the point where it's something we officially recommend, but we're working on improving it and you're welcome to experiment.
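
As a rough illustration of what "done right" might look like, the sketch below builds the POST fields for Akismet's comment-check call with comment_type set to "signup" and no invented comment text. This is my reading of the Akismet API field names, not something confirmed in this thread. buildSignupCheckParams() is a hypothetical helper and is not part of akismet.class.php. The API also expects the registrant's IP address and user agent; whether those were recorded at registration is an assumption, so they are optional here.

```php
<?php
/**
 * Hypothetical helper: builds the form fields for an Akismet comment-check
 * request that describes a signup rather than a comment.
 */
function buildSignupCheckParams(string $siteUrl, string $username, string $email, ?string $ip = null, ?string $userAgent = null): array
{
    $params = [
        'blog'                 => $siteUrl,   // the site the key is registered for
        'comment_type'         => 'signup',   // tell Akismet this is a registration
        'comment_author'       => $username,
        'comment_author_email' => $email,
    ];
    // only send what was actually recorded; do not fabricate values
    if ($ip !== null) {
        $params['user_ip'] = $ip;
    }
    if ($userAgent !== null) {
        $params['user_agent'] = $userAgent;
    }
    return $params;
}

// example values are placeholders
$params = buildSignupCheckParams('http://www.example.com/', 'someuser', 'someuser@example.org', '203.0.113.7');
echo http_build_query($params), "\n";
```

The resulting string would be the body of the POST request. Without the original IP address the check has much less to go on, so expect weaker results for old registrations.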

Captchas have their own set of problems. The major commercial spambots break them.



Answer 3:

You are reinventing a wheel that has already been built successfully many times. Just use reCAPTCHA or one of the methods from the question "Practical non-image based CAPTCHA approaches?"