We are currently running a competition, and it is going very well. Unfortunately the cheaters are back in business, running scripts that automatically vote for their own entries. We have already caught some of them by inspecting the database by hand: for example, 5-star ratings from the same browser arriving exactly every 70 minutes. As the user base grows, it gets harder and harder to identify them this way.
What we have done so far:
- We store the IP address and the browser, and block that combination for one hour after a vote. Cookies won't help against these guys.
- We also use a CAPTCHA, which has been broken.
Does anyone know how we could find patterns in our database with a PHP script or how we could block them more efficiently?
Any help would be much appreciated...
The only thing that comes to mind is using a CAPTCHA: either an elaborate one with pictures and noise, like the reCAPTCHA service, or a very simple and unobtrusive one like "What is seven plus three?" or (if you're located in the US) "What is the last name of our President?", simple common-sense questions everybody can answer. If you change them often enough, this could even be more effective than a classic image-based CAPTCHA.
Sorry for the double post, but I wasn't allowed to post two URLs in the same post...
If you're looking at building your own tracking, maybe this link might provide some inspiration: https://panopticlick.eff.org/ Turns out that a lot of browsers can be uniquely identified, even without any form of tracking cookies. I'm guessing a vote-bot might give a very specific fingerprint?
Direct feedback elimination
This is more of a general strategy that can be combined with many of the other methods. Don't let the spammer know if he succeeds.
You can hide the current results altogether, show only percentages without the absolute number of votes, or delay the display of the votes.
Vote flagging
Also a general strategy. If you have reason to believe that a vote comes from a spammer, accept it as usual but mark it as invalid, and delete the invalid votes at the end.
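A minimal sketch of how flagging could look, assuming votes live in a simple in-memory list and that some other heuristic supplies the `looks_like_spam` decision. The names `Vote`, `record_vote` and `final_tally` are made up for illustration. The point is that the voter gets the same response either way, so this combines with the "no direct feedback" idea.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    entry_id: int
    ip: str
    suspicious: bool = False  # set by whatever heuristic you trust


def record_vote(votes, entry_id, ip, looks_like_spam):
    """Always store the vote; only tag it, so the client sees no difference."""
    votes.append(Vote(entry_id, ip, suspicious=looks_like_spam))
    return "Thanks for voting!"  # identical response for everyone


def final_tally(votes):
    """Count only the votes that were never flagged."""
    tally = {}
    for vote in votes:
        if not vote.suspicious:
            tally[vote.entry_id] = tally.get(vote.entry_id, 0) + 1
    return tally
```

Keeping the flagged rows until the end also leaves you the raw data to re-check if a heuristic turns out to be too strict.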
Captcha
Use a CAPTCHA. If your CAPTCHA has been broken, use a better one.
IP checking
Limit the number of votes an IP address can cast in a timespan.
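As a rough sketch, a per-IP limit over a sliding time window could look like this. `VoteLimiter` and its parameters are hypothetical, and the in-memory dictionary stands in for whatever database table you actually use.

```python
import time
from collections import defaultdict, deque


class VoteLimiter:
    """Allow at most `max_votes` per IP within the last `window_seconds`."""

    def __init__(self, max_votes=1, window_seconds=3600):
        self.max_votes = max_votes
        self.window = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of accepted votes

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        stamps = self._history[ip]
        # Forget votes that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.max_votes:
            return False
        stamps.append(now)
        return True
```

Usage would be something like `limiter = VoteLimiter(max_votes=1, window_seconds=3600)` and then `limiter.allow(client_ip)` before storing a vote. The `now` argument only exists to make the class easy to test.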
Referrer checking
If you assume that one user maps to one IP address, you can limit the number of votes from that IP address. However, this assumption usually only holds for private households.
Email Confirmation
Use email confirmation and only allow one vote per email address. Check your database manually to see if they are using throwaway email addresses.
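A small sketch of the one-vote-per-address bookkeeping, assuming a plain `set` as storage. `normalize_email` and `register_vote` are illustrative names. Stripping a `+tag` suffix is for the reason given in the note that follows. Lower-casing the whole address is my own assumption about how most providers behave, not something every mail server guarantees.

```python
def normalize_email(address):
    """Reduce an address to a comparison key: lower-case, '+tag' removed."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local = local.split("+", 1)[0]  # drop everything from the first '+'
    return f"{local}@{domain}"


def register_vote(seen, address):
    """Return True if this (normalized) address has not voted before."""
    key = normalize_email(address)
    if key in seen:
        return False
    seen.add(key)
    return True
```

Store the normalized key next to the original address rather than replacing it, since the confirmation mail still has to go to what the user typed.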
Note that you can add +foo to the username in an email address (on providers that support it, such as Gmail): username@example.com and username+foo@example.com will both deliver the mail to the same account, so remember that when checking whether somebody has already voted.
HTML Form Randomization
Randomize the order of the choices. It might take them a while to notice.
HTTPS
One method of vote faking is to capture the HTTP request from a real browser like Firefox and mimic it with a script. This is not as easy when you use encryption.
Proxy checking
If the spammer votes via proxy, you can check for the X-Forwarded-For header.
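A hedged sketch of that check, assuming the request headers arrive as an ordinary dictionary. `forwarded_chain` and `vote_origin` are made-up helper names. Keep in mind that the client can set this header to anything, and anonymous proxies simply omit it, so treat it as a hint and not as proof.

```python
def forwarded_chain(headers):
    """Return the addresses listed in X-Forwarded-For, client first."""
    # HTTP header names are case-insensitive, so compare in lower case.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("x-forwarded-for", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


def vote_origin(remote_addr, headers):
    """Summarize where a vote claims to come from."""
    chain = forwarded_chain(headers)
    return {
        "via_proxy": bool(chain),
        "claimed_client": chain[0] if chain else remote_addr,
        "peer": remote_addr,  # the address that actually connected to us
    }
```

You could then rate-limit on `claimed_client` as well as on `peer`, or just flag votes where `via_proxy` is true for later review.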
Cache checking
Try to see whether the client loads all the uncached resources. Many spambots don't do this. I have never tried this myself; I just know that voting sites usually don't check it.
An example would be embedding
<img src="a.gif" />
in your HTML, with a.gif being some 1x1 pixel image. Then you have to set the HTTP header Cache-Control "no-cache, must-revalidate" on the response to the request GET /a.gif. You can set HTTP headers in Apache with your .htaccess file. (thanks Jacco)
[Edit 2010-09-22]
Evercookie
A CAPTCHA is always good, though it might be "disturbing" for some users.
reCAPTCHA is a fairly widely used service.
You could add a honeypot field like in Django. Most likely, this will not protect you from cheaters who deliberately want to manipulate your competition, but at least you will have fewer 'drive-by' spammers to take care of on top of that.
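The server side of a honeypot is tiny. In this sketch the field name `website` is purely hypothetical, and the form is assumed to arrive as a dictionary of strings. The field would be hidden from humans with CSS, so only scripts that blindly fill in every input give it a value.

```python
HONEYPOT_FIELD = "website"  # hypothetical name; pick something bots like to fill


def is_probably_bot(form):
    """A human never sees the hidden field, so any value in it is a red flag."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A submission where `is_probably_bot(form)` is true can be silently flagged instead of rejected, so the bot gets no hint about why it failed.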
What about some post hoc stochastic analysis, like time series analysis: looking for periodicity in events with a particular (ip, browser, vote) combination? You could then assign to each such group of events a probability that it belongs to one person, and either discard all groups beyond some probability level or use some kind of weighting to lower their weight according to that probability.
Look into R; it contains A LOT of useful analysis packages.
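To make the idea concrete, here is a deliberately naive sketch in Python rather than R. It is not a real time series analysis: it only groups votes by (ip, browser, vote) and flags groups whose gaps between votes are almost constant, measured by the coefficient of variation. The function name, the tuple layout and the thresholds `min_events` and `max_cv` are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def periodic_groups(votes, min_events=5, max_cv=0.05):
    """Find (ip, browser, rating) groups that vote at near-constant intervals.

    votes: iterable of (ip, browser, rating, unix_timestamp) tuples.
    Returns {group_key: (mean_gap_seconds, coefficient_of_variation)}.
    """
    by_group = defaultdict(list)
    for ip, browser, rating, timestamp in votes:
        by_group[(ip, browser, rating)].append(timestamp)

    suspects = {}
    for key, stamps in by_group.items():
        if len(stamps) < min_events:
            continue  # too few votes to say anything about regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        avg = mean(gaps)
        # Identical timestamps (avg == 0) count as perfectly regular.
        cv = pstdev(gaps) / avg if avg else 0.0
        if cv <= max_cv:
            suspects[key] = (avg, cv)
    return suspects
```

A script voting every 70 minutes produces gaps of about 4200 seconds with almost no spread, so its group ends up in the result, while a human's irregular visits do not. A bot that adds random jitter to its schedule would slip past this check, which is where proper statistical tooling would come in.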