Short story: I have a web application that offers a huge incentive for participation. As such, we're being targeted heavily by scripters and bots. Based on the IP addresses the submissions are coming from (1000+ and growing, with no pattern whatsoever), I'm inclined to believe the submissions are being generated by a bot network. Even worse, whoever controls the automated submissions is actively pursuing this, to the point that every time we make a change, they catch up within a few hours.
Some of the measures we've tried already:
- Captcha, both third party and home-grown, with varying degrees of readability
- An anti-request forgery token sent via cookie and hidden form field that is compared upon submit
- A hidden empty honeypot field that causes the submission to fail silently if the field contains data
- A hidden honeypot field that contains data by default and results in a silent fail if a piece of javascript does not run to clear the field's value
- Limiting submissions by IP address over a certain time period
- Blocking email domains known to be used by the automated scripts
- Blocking hosts based on simultaneous connections or connections per minute at the firewall
- Blocking the most flagrant IP addresses at the firewall
- Using an external address verification service to verify incoming addresses
Even with all of these measures in place, the submissions have not only continued, but seem to be increasing in frequency, on the order of 100,000+ per day.
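For reference, here is a minimal sketch of how the token, honeypot, and per-IP checks listed above fit together on the server side. It is not our actual code: the field names (`csrf_token`, `website`, `js_check`), the window length, and the per-IP cap are made up for illustration, and the in-memory store assumes a single process.

```python
import hmac
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # assumed window length
MAX_PER_WINDOW = 3    # assumed per-IP cap within one window

_recent = defaultdict(deque)  # ip -> timestamps of counted submissions


def allow_ip(ip, now=None):
    """Sliding-window limit: at most MAX_PER_WINDOW submissions per IP per window."""
    now = time.time() if now is None else now
    stamps = _recent[ip]
    # Drop timestamps that have fallen out of the window.
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    if len(stamps) >= MAX_PER_WINDOW:
        return False
    stamps.append(now)
    return True


def form_looks_human(form, cookie_token):
    """Return True only if the token matches and both honeypots are empty."""
    submitted = form.get("csrf_token", "")
    # Constant-time comparison of the hidden-field token against the cookie token.
    token_ok = bool(cookie_token) and hmac.compare_digest(
        submitted.encode(), cookie_token.encode()
    )
    # "website" (hypothetical name) must stay empty; "js_check" (hypothetical
    # name) is pre-filled in the markup and cleared by JavaScript on load.
    honeypots_empty = not form.get("website") and not form.get("js_check")
    return token_ok and honeypots_empty
```

A submission that fails either check is dropped silently, so the client gets the same response as a successful entry.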
The bogus entries are now using completely valid first and last names, and apparently have resorted to using some sort of directory listing to ensure that the addresses they use (which appear totally random and not at all consistent, btw) are actually valid US postal addresses. Additionally, I have logged the incoming form values to a debug log and verified that they are actually submitting valid captcha codes, which indicates they have OCR good enough to decipher the images (the code itself is never sent to the client, only a GUID representing a code that is stored elsewhere on the back end).
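To make the captcha flow concrete, this is roughly the shape of it: the client only ever sees a GUID, and the expected code lives on the server. The sketch below is an illustration, not our production code. The function names, the in-memory dictionary, the five-minute lifetime, and the single-use behavior are all assumptions added for the example.

```python
import hmac
import time
import uuid

CAPTCHA_TTL_SECONDS = 300  # assumed lifetime of a challenge

_challenges = {}  # guid -> (expected_code, created_at); stand-in for a real store


def issue_challenge(code, now=None):
    """Store the expected code server-side and return only an opaque GUID."""
    now = time.time() if now is None else now
    guid = str(uuid.uuid4())
    _challenges[guid] = (code, now)
    return guid


def verify_challenge(guid, answer, now=None):
    """Check an answer; each GUID can be tried once and expires after the TTL."""
    now = time.time() if now is None else now
    entry = _challenges.pop(guid, None)  # pop makes the challenge single-use
    if entry is None:
        return False
    code, created_at = entry
    if now - created_at > CAPTCHA_TTL_SECONDS:
        return False
    # Case-insensitive, constant-time comparison of the typed answer.
    return hmac.compare_digest(answer.strip().upper().encode(), code.upper().encode())
```

Even with single-use GUIDs and expiry, this only stops replays. It does nothing against a client that actually solves each image, which is what the debug log suggests is happening.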
In fact, the only way we can even tell the entries are bogus is by the pattern of email addresses and domains they are submitting. We've tried blocking the most active domains from entering, but the spammers just create or find new domains from which they can generate disposable email addresses and keep on going.
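The domain pattern can be pulled out of the submission log with something like the sketch below. It is only an illustration: the function names, the allow-list of mainstream providers, and the threshold are assumptions, and it shows why this approach is reactive, since a domain has to accumulate entries before it stands out.

```python
from collections import Counter


def domain_of(email):
    """Return the lower-cased part after the last '@'."""
    return email.rsplit("@", 1)[-1].strip().lower()


def suspicious_domains(emails, known_good, min_count):
    """List domains outside known_good that appear at least min_count times."""
    counts = Counter(domain_of(e) for e in emails if "@" in e)
    return sorted(
        domain
        for domain, n in counts.items()
        if n >= min_count and domain not in known_good
    )


# Example with made-up addresses on reserved example domains.
sample = [
    "a1@throwaway.example", "a2@throwaway.example", "a3@Throwaway.example",
    "jane@gmail.com", "joe@gmail.com", "sam@gmail.com",
    "one-off@smallisp.example",
]
flagged = suspicious_domains(sample, known_good={"gmail.com"}, min_count=3)
```

Here `flagged` contains only `throwaway.example`; the single entry from `smallisp.example` stays below the threshold.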
I'm pretty exhausted at this point, but I'm sure there's got to be something I haven't tried. Does anyone here have any bright ideas?