Practical non-image based CAPTCHA approaches?

2018-12-31 14:45发布

It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!

We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:

http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs

The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!

However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.

I have written a traditional CAPTCHA control for ASP.NET which we can re-use.

CaptchaImage

However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.

I've seen things like..

  • ASCII text captcha: \/\/(_)\/\/
  • math puzzles: what is 7 minus 3 times 2?
  • trivia questions: what tastes better, a toad or a popsicle?

Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.

Ideas?

30条回答
人间绝色
2楼-- · 2018-12-31 15:02

Simple text sounds great. Bribe the community to do the work! If you believe, as I do, that SO rep points measure a user's commitment to helping the site succeed, it is completely reasonable to offer reputation points to help protect the site from spammers.

Offer +10 reputation for each contribution of a simple question and a set of correct answers. The question should suitably far away (edit distance) from all existing questions, and the reputation (and the question) should gradually disappear if people can't answer it. Let's say if the failure rate on correct answers is more than 20%, then the submitter loses one reputation point per incorrect answer, up to a maximum of 15. So if you submit a bad question, you get +10 now but eventually you will net -5. Or maybe it makes sense to ask a sample of users to vote on whether the captcha questionis a good one.

Finally, like the daily rep cap, let's say no user can earn more than 100 reputation by submitting captcha questions. This is a reasonable restriction on the weight given to such contributions, and it also may help prevent spammers from seeding questions into the system. For example, you could choose questions not with equal probability but with a probability proportional to the submitter's reputation. Jon Skeet, please don't submit any questions :-)

查看更多
像晚风撩人
3楼-- · 2018-12-31 15:02

I've been using the following simple technique, it's not foolproof. If someone really wants to bypass this, it's easy to look at the source (i.e. not suitable for the Google CAPTCHA) but it should fool most bots.

Add 2 or more form fields like this:

<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />

Then use CSS to hide them:

.hideme {
    display: none;
}

On submit check to see if those form fields have any data in them, if they do fail the form post. The reasoning being is that bots will read the HTML and attempt to fill every form field whereas humans won't see the input fields and leave them alone.

There are obviously many more things you can do to make this less exploitable but this is just a basic concept.

查看更多
浪荡孟婆
4楼-- · 2018-12-31 15:04

Make an AJAX query for a cryptographic nonce to the server. The server sends back a JSON response containing the nonce, and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in JavaScript, copy the value into a hidden field. When the user POSTs the form, they now send the cookie back with the nonce value. Calculate the SHA1 hash of the nonce from the cookie, compare to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.

This technique requires that the spammer sits down and figures out what's going on, and once they do, they still have to fire off multiple requests and maintain cookie state to get a comment through. Plus they only ever see the Set-Cookie header if they parse and execute the JavaScript in the first place and make the AJAX request. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam. Which means that moderation queues are still a good idea.

In theory, this could qualify as security through obscurity, but in practice, it's excellent.

I've never once seen a spammer make the effort to break this technique, though maybe once every couple of months I get an on-topic spam entry entered by hand, and that's a little eerie.

查看更多
人气声优
5楼-- · 2018-12-31 15:05

I saw this once on a friend's site. He is selling it for 20 bucks. It's ASCII art!

http://thephppro.com/products/captcha/

  .oooooo.         oooooooo 
 d8P'  `Y8b       dP""""""" 
888      888     d88888b.   
888      888 V       `Y88b '
888      888           ]88  
`88b    d88'     o.   .88P  
 `Y8bood8P'      `8bd88P'   
查看更多
笑指拈花
6楼-- · 2018-12-31 15:06

I have to admit that I have no experience fighting spambots and don't really know how sophisticated they are. That said, I don't see anything in the jQuery article that couldn't be accomplished purely on the server.

To rephrase the summary from the jQuery article:

  1. When generating the contact form on the server ...
  2. Grab the current time.
  3. Combine that timestamp, plus a secret word, and generate a 32 character 'hash' and store it as a cookie on the visitor's browser.
  4. Store the hash or 'token' timestamp in a hidden form tag.
  5. When the form is posted back, the value of the timestamp will be compared to the 32 character 'token' stored in the cookie.
  6. If the information doesn't match, or is missing, or if the timestamp is too old, stop execution of the request ...

Another option, if you want to use the traditional image CAPTCHA without the overhead of generating them on every request is to pre-generate them offline. Then you just need to randomly choose one to display with each form.

查看更多
登录 后发表回答