Has reCaptcha been cracked / hacked / OCR'd /

Posted 2019-01-03 11:37

Have any programming methods been used to defeat reCAPTCHA?

I'm interested in seeing evidence and potentially demonstrations that reCAPTCHA in particular has been made obsolete by completely automated, humanless methods.

To clarify, I'm not looking for reCAPTCHA-cheating solutions that involve humans in any way, whether teams tasked with filling out CAPTCHAs, porn-seekers, or Mechanical Turk.

I'm also not looking for alternatives to reCAPTCHA, like picking the type of animal, or background fields or JavaScript trickery.

14 answers
ら.Afraid
#2 · 2019-01-03 12:14
  • "In fact, it [reCAPTCHA] became pretty useless on 4 January [2011] when spammers apparently got their collective hands on a piece of software that circumvents reCAPTCHA and allows for a fully automated registration process. The bots have been busy, very busy indeed, ever since" [ 1 ]

Two to three years ago, text-typing CAPTCHAs crossed the line at which they lost the battle: further complications only make them relatively easier for machines (since computing power keeps increasing while human ability does not) and more repugnant, if not completely impossible, for humans. This contradicts the original paradigm of CAPTCHA as a test to ensure that the response is not generated by a computer.

Update:
Note that reCAPTCHA is owned by Google Inc., but Google does not use it for its own services.
Here is a link to a web page with the CAPTCHA that Google itself uses internally, for example for Gmail registration:

[Image: the CAPTCHA Google uses for its own services]

Note that Google's reCAPTCHA always has 2 words.
Here is a link to an image of the reCAPTCHA that Google offers for use by others.

And reCAPTCHA's screenshot:

[Image: reCAPTCHA screenshot]

I leave it to the reader to draw the obvious conclusions.

Cited: [ 1 ]
vBulletin forums hit by reCAPTCHA cracking spam bot | PC Pro blog
Posted on January 12th, 2011 by Davey Winder

甜甜的少女心
#3 · 2019-01-03 12:16

I notice that almost all the answers here relate to the ineffectiveness of the concept of CAPTCHA in principle. While I very much agree with them (in fact, I gave a talk at OWASP a few months ago explaining just that), the question is very specific, so I will provide a demonstration.
But first, I will reiterate: demonstration aside, re-read the other comments, since it is true that CAPTCHA is pointless and unhelpful, irrespective of implementation.

But really, check out CAPTCHA Killer. You can upload a CAPTCHA image, and it will automatically, if not immediately, provide the OCR'd answer. It also provides an API (REST, I think, but maybe also SOAP). I personally tried numerous reCAPTCHA images, and they were actually among the easiest (or at least the quickest) to break.

UPDATE: CAPTCHA Killer's website is now taken down, apparently under legal pressure. See http://captcha.org/ for a complete overview of the topic.

And yeah, OCR is not the best way to break a CAPTCHA-protected site; there are many better ways.

smile是对你的礼貌
#4 · 2019-01-03 12:16

The easiest way to defeat Captchas is Amazon Mechanical Turk. There's a guy named Kermit Welda who pays people a nickel each to register Hotmail, AOL and Gmail accounts. That's 6,000 fake email accounts at 5 cents = $300 a day. The cost of doing business is pretty cheap when you have other people do the dirty work for you. No wonder our server's spam filters want to reject anything from Hotmail.
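The arithmetic behind that figure can be checked with a short sketch. The function and constant names are illustrative only, and amounts are kept in whole cents until the final division:

```python
# Figures taken from the answer above: 6,000 accounts a day at 5 cents each.
PAY_PER_ACCOUNT_CENTS = 5
ACCOUNTS_PER_DAY = 6_000


def daily_cost_dollars(accounts: int, cents_each: int) -> float:
    """Return the total daily payout in dollars."""
    return accounts * cents_each / 100


print(daily_cost_dollars(ACCOUNTS_PER_DAY, PAY_PER_ACCOUNT_CENTS))  # 300.0
```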

看我几分像从前
#5 · 2019-01-03 12:19

You might be interested in this detailed report on how 4chan defeated reCAPTCHA, and used it to manipulate Time.com's annual TIME 100 Poll results.

Hacking Recaptcha (aka ‘The Penis Flood’)

The next tactic used was to see if they could find a flaw in the reCAPTCHA implementation. One thing they discovered about reCAPTCHA was that it always presents two words to a user for decoding: one word is a control word known by the reCAPTCHA system, while the other is an unknown word (reCAPTCHA uses the humans to help correct OCR errors). Wikipedia describes the process: “Scanned text is subjected to analysis by two different optical character recognition programs; in cases where the programs disagree, the questionable word is converted into a CAPTCHA. The word is displayed along with a control word already known and is labeled by the human. Those words that are consistently given a single label by human judges are recycled as control words”.

What Anonymous realized was that if they always labeled the unknown scanned text with the same word, and if they did this thousands and thousands of times, eventually a large percentage of the unknown words would be mislabeled with their word. All they had to do was look at the two words in the captcha, enter the proper label for the ‘easy’ one (presumably that would be the one that the two optical scanners would agree upon) and enter the word “penis” for the hard one. If they did this often enough, then soon a significant percentage of the images would be labeled as ‘penis’ and the ability to autovote would be restored. (One side effect, not lost on Anonymous, was the notion that for years to come there would be a number of digital books with the word ‘penis’ randomly inserted throughout the text.)

Update: I asked Ben Maurer, chief engineer of reCAPTCHA, about this ‘penis flood’ attack. Ben says that they’ve anticipated this type of attack and they have numerous protections that will keep the penises from penetrating the reCAPTCHA barrier.
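To make the two-word mechanism concrete, here is a minimal toy model. It is not reCAPTCHA's actual code: the class name, the image ids, and the naive "promote after N identical votes" rule are all assumptions made for illustration, and the real service reportedly had protections that this sketch leaves out. It only shows why a response is judged on the control word alone, and why repeated identical labels could in principle poison the unknown word.

```python
from collections import Counter, defaultdict


class TwoWordChallengePool:
    """Toy model of a control-word + unknown-word challenge (hypothetical)."""

    def __init__(self, promote_after: int = 3):
        # Assumed rule: an unknown word becomes a control word once any
        # single label has been submitted this many times.
        self.promote_after = promote_after
        self.control = {}                   # image id -> known answer
        self.votes = defaultdict(Counter)   # unknown image id -> label tallies

    def add_control(self, image_id: str, answer: str) -> None:
        self.control[image_id] = answer.strip().lower()

    def submit(self, control_id: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        # Only the control word can be checked; the unknown word has no answer yet.
        if self.control[control_id] != control_guess.strip().lower():
            return False
        # The label for the unknown word is recorded as a vote.
        self.votes[unknown_id][unknown_guess.strip().lower()] += 1
        label, count = self.votes[unknown_id].most_common(1)[0]
        if count >= self.promote_after:
            self.control[unknown_id] = label
        return True


pool = TwoWordChallengePool(promote_after=3)
pool.add_control("img-a", "morning")

# Three responses that answer the control word correctly but always give
# the same arbitrary label for the unknown word.
results = [pool.submit("img-a", "morning", "img-x", "sameword") for _ in range(3)]
print(results)                 # [True, True, True]
print(pool.control["img-x"])   # sameword
```

In this toy version the flooded label wins simply because it reaches the threshold first. A real system could, for example, require agreement across many independent users before promoting a word.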

Optimizing reCAPTCHA

As appealing as the notion of sprinkling the word ‘penis’ into texts was, the Anonymous team knew that the clock was ticking, and if they were going to restore the Message they didn’t have time to wait for the autovoters to come back online. They were going to have to vote manually, many, many times. And so they needed to be able to enter captchas as fast as they could. They developed a set of guidelines that allowed them to quickly decide which reCAPTCHA words they could skip. For example:

You will be given 2 words: 1 real, 1 fake.

For [REAL FAKE] or [FAKE REAL], you can just type in REAL and it should be accepted.

If it’s [LOOKSREAL LOOKSREAL] or [LOOKSFAKE LOOKSFAKE], it’s usually quicker to just type in both words. Don’t waste precious time deciding which one of them is real.

Use both the appearance and the type of word to identify a fake word. Don’t rely on just one of them.

The whole ruleset is here: fake captcha.

闹够了就滚
#6 · 2019-01-03 12:19

The weakness of CAPTCHA systems is that people set up rooms full of people in China whose only job is to look at a CAPTCHA image and type in the result, which plugs into the automated system that's actually doing the spamming.

Not much you can do about that really.

It's also far cheaper than trying to do image recognition, OCR, etc. on the actual image: a human-typed response may cost under $0.01.

看我几分像从前
#7 · 2019-01-03 12:19

There are lots of methods that are used to crack reCAPTCHA. While it's hard to use neural-network-enabled programs to solve them automatically, it's possible to grab the image and have Amazon's Mechanical Turk or some equivalent service solve them.

http://codemagician.wordpress.com/2010/01/22/solving-recaptcha/
