Practical non-image based CAPTCHA approaches?

2018-12-31 14:45发布

It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!

We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:

http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs

The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!

However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.

I have written a traditional CAPTCHA control for ASP.NET which we can re-use.

CaptchaImage

However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.

I've seen things like..

  • ASCII text captcha: \/\/(_)\/\/
  • math puzzles: what is 7 minus 3 times 2?
  • trivia questions: what tastes better, a toad or a popsicle?

Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.

Ideas?

30条回答
看淡一切
2楼-- · 2018-12-31 15:16

Avoid the worst CAPTCHAs of all time.

Trivia is OK, but you'll have to write each of them :-(

Someone would have to write them.

You could do trivia questions in the same way ReCaptcha does printed words. It offers two words, one of which it knows the answer to, another which it doesn't - after enough answers on the second, it now knows the answer to that too. Ask two trivia questions:

A woman needs a man like a fish needs a?

Orange orange orange. Type green.

Of course, this may need to be coupled with other techniques, such as timers or computed secrets. Questions would need to be rotated/retired, so to keep the supply of questions up you could ad-hoc add:

Enter your obvious question:

You don't even need an answer; other humans will figure that out for you. You may have to allow flagging questions as "too hard", like this one: "asdf ejflf asl;jf ei;fil;asfas".

Now, to slow someone who's running a StackOverflow gaming bot, you'd rotate the questions by IP address - so the same IP address doesn't get the same question until all the questions are exhausted. This slows building a dictionary of known questions, forcing the human owner of the bots to answer all of your trivia questions.

查看更多
冷夜・残月
3楼-- · 2018-12-31 15:18

A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:

<input type="hidden" name="antispam" value="lalalala" />

I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:

var antiSpam = function() {
        if (document.getElementById("antiSpam")) {
                a = document.getElementById("antiSpam");
                if (isNaN(a.value) == true) {
                        a.value = 0;
                } else {
                        a.value = parseInt(a.value) + 1;
                }
        }
        setTimeout("antiSpam()", 1000);
}

antiSpam();

Then when the form is submitted, If the antispam value is still "lalalala", then I mark it as spam. If the antispam value is an integer, I check to see if it is above something like 10 (seconds). If it's below 10, I mark it as spam, if it's 10 or more, I let it through.

If AntiSpam = A Integer
    If AntiSpam >= 10
        Comment = Approved
    Else
        Comment = Spam
Else
    Comment = Spam

The theory being that:

  • A spam bot will not support JavaScript and will submit what it sees
  • If the bot does support JavaScript it will submit the form instantly
  • The commenter has at least read some of the page before posting

The downside to this method is that it requires JavaScript, and if you don't have JavaScript enabled, your comment will be marked as spam, however, I do review comments marked as spam, so this is not a problem.

Response to comments

@MrAnalogy: The server side approach sounds quite a good idea and is exactly the same as doing it in JavaScript. Good Call.

@AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.

查看更多
弹指情弦暗扣
4楼-- · 2018-12-31 15:19

What if you used a combination of the captcha ideas you had (choose any of them - or select one of them randomly):

  • ASCII text captcha: //(_)//
  • math puzzles: what is 7 minus 3 times 2?
  • trivia questions: what tastes better, a toad or a popsicle?

with the addition of placing the exact same captcha in a css hidden section of the page - the honeypot idea. That way, you'd have one place where you'd expect the correct answer and another where the answer should be unchanged.

查看更多
残风、尘缘若梦
5楼-- · 2018-12-31 15:20

1) Human solvers

All mentioned here solutions are circumvented by human solvers approach. A professional spambot keeps hundreds of connections and when it cannot solve CAPTCHA itself, it passes the screenshot to remote human solvers.

I frequently read that human solvers of CAPTCHAs break the laws. Well, this is written by those who do not know how this (spamming) industry works.
Human solvers do not directly interact with sites which CAPTCHAs they solve. They even do not know from which sites CAPTCHAs were taken and sent them. I am aware about dozens (if not hundreds) companies or/and websites offering human solvers services but not a single one for direct interaction with boards being broken.
The latter do not infringe any law, so CAPTCHA solving is completely legal (and officialy registered) business companies. They do not have criminal intentions and might, for example, have been used for remote testing, investigations, concept proofing, prototypong, etc.

2) Context-based Spam

AI (Artificial Intelligent) bots determine contexts and maintain context sensitive dialogues at different times from different IP addresses (of different countries). Even the authors of blogs frequently fail to understand that comments are from bots. I shall not go into many details but, for example, bots can webscrape human dialogues, stores them in database and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even humans.

The most voted answer telling:

  • *"The theory being that:
    • A spam bot will not support JavaScript and will submit what it sees
    • If the bot does support JavaScript it will submit the form instantly
    • The commenter has at least read some of the page before posting"*

as well honeypot answer and most answers in this thread are just plain wrong.
I daresay they are victim-doomed approaches

Most spambots work through local and remote javascript-aware (patched and managed) browsers from different IPs (of different countries) and they are quite clever to circumvent honey traps and honey pots.

The different problem is that even blog owners cannot frequently detect that comments are from bot since they are really from human dialogs and comments harvested from other web boards (forums, blog comments, etc)

3) Conceptually New Approach

Sorry, I removed this part as precipitated one

查看更多
梦寄多情
6楼-- · 2018-12-31 15:22

Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".

Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.

E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh

In this case "stack" would be the CAPTCHA. There are obviously numerous variations on this idea.

Edit: Example variations to address some of the potential problems identified with this idea:

  • using randomly coloured letters instead of bold/italic.
  • using every second red letter for the CAPTCHA (reduces the possibility of bots identifying differently formatted letters to guess the CAPTCHA)
查看更多
泛滥B
7楼-- · 2018-12-31 15:23

Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.

Sufficiently simple, and it will be not difficult to code around it. I see two threats here:

  1. random spambots and the human spambots that might back them up; and
  2. bots created to game Stack Overflow

With simple arithmetics, you might beat off threat #1, but not threat #2.

查看更多
登录 后发表回答