Does e-mail obfuscation really make automatic harvesting harder?

Published 2019-01-31 19:58


做个烂人


Question:

Many users and forum programs, in an attempt to make automatic e-mail address harvesting harder, conceal addresses via obfuscation: @ is replaced with "at" and . is replaced with "dot", so

 team@stackoverflow.com

now becomes

team at stackoverflow dot com

I'm not an expert in regular expressions and I'm really curious - does such obfuscation really make automatic harvesting harder? Is it really much harder to automatically identify such obfuscated addresses?

Answer 1:

Definitely!

A while ago I read an article showing how effective the various methods are, and how they compare to one another. Reversing an already reversed string seems to be fairly decent protection at the moment.

The following code sample:

<style type="text/css">
   span.codedirection { unicode-bidi:bidi-override; direction: rtl; }
</style>

<p><span class="codedirection">moc.etalllit@7raboofnavlis</span></p>

This renders the address in normal, readable order.

That said, it is almost an arms race. But as long as you're ahead of the curve, harvesting your address will take more effort than harvesting ordinary, unobfuscated ones.
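The reversal only defeats a harvester that reads the markup as-is. As a minimal sketch of that arms race (the regex and the `find_emails` name are illustrative assumptions, not taken from the article), a harvester that also scans the reversed markup recovers the address with one extra pass:

```python
import re

# Illustrative address pattern (an assumption); real harvesters use looser ones.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(markup: str) -> set[str]:
    """Hypothetical harvester step: scan the markup forwards and backwards."""
    found = set(EMAIL_RE.findall(markup))
    # Reading the whole document backwards undoes the bidi-override trick.
    found.update(EMAIL_RE.findall(markup[::-1]))
    return found


html = '<p><span class="codedirection">moc.etalllit@7raboofnavlis</span></p>'
print(find_emails(html))  # {'silvanfoobar7@tilllate.com'}
```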

Answer 2:

Obfuscation techniques fall into the same category as CAPTCHAs: they are not reliable and tend to hurt regular users more than bots.

JavaScript obfuscation seems to be praised, but it is no silver bullet: today it is not that hard to automate a browser for email sniffing. If it can be displayed in a browser, it can be harvested. You could even imagine a bot that takes screenshots of a browser window and uses OCR to extract addresses, beating your million-dollar obfuscation technique.

Depending on where and why you want to obfuscate email addresses, these techniques could be useful:

Restrict email visibility: you may hide email addresses on your website/forum from anonymous users or from new users (with little or no activity or posts to date), or even hide them completely and replace email contact between members with a built-in private messaging feature.
Use a dedicated, spam-filtered email address: you will get spammed, but the spam will be limited to this particular address. This is a good trade-off when you need to expose the address to any user.
Use a contact form: while bots are pretty good at filling in forms, it turns out that they are too good at it. Hidden-field techniques can filter out most of the spam coming through your contact form.
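The hidden-field technique works because many bots fill in every input they find. A minimal server-side sketch, assuming a hypothetical form with an extra input named `website` that is hidden from humans with CSS (the field name and the `is_probably_spam` helper are illustrative, not from any particular framework):

```python
def is_probably_spam(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Flag a submission whose hidden honeypot field was filled in.

    Humans never see the field (it is assumed to be hidden with CSS),
    so a non-empty value suggests an automated form filler.
    """
    return bool(form.get(honeypot_field, "").strip())


human = {"name": "Ann", "message": "Hello", "website": ""}
bot = {"name": "x", "message": "cheap pills", "website": "http://spam.example"}
print(is_probably_spam(human), is_probably_spam(bot))  # False True
```

Rejected submissions can simply be dropped silently, so the bot gets no signal that the trap exists.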

Answer 3:

When I see this type of obfuscation I also immediately think of regular expressions. It's a piece of cake to harvest emails "obfuscated" in this manner.
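To make that concrete, here is a minimal sketch of such a regex in Python (the patterns and the `deobfuscate` name are illustrative assumptions; real harvesters handle many more variants). It turns "name at host dot tld", also written with [at]/(dot), back into a normal address:

```python
import re

# " at " / " dot " as separate words, or wrapped in [ ] or ( ) (illustrative).
AT = r"(?:\s+at\s+|\s*[\[(]\s*at\s*[\])]\s*)"
DOT = r"(?:\s+dot\s+|\s*[\[(]\s*dot\s*[\])]\s*)"
OBFUSCATED_RE = re.compile(rf"([\w.+-]+){AT}([\w-]+(?:{DOT}[\w-]+)+)", re.IGNORECASE)


def deobfuscate(text: str) -> list[str]:
    """Return addresses written with spelled-out 'at' and 'dot' tokens."""
    addresses = []
    for user, host in OBFUSCATED_RE.findall(text):
        labels = re.split(DOT, host, flags=re.IGNORECASE)
        addresses.append(f"{user}@{'.'.join(labels)}")
    return addresses


text = "Write to team at stackoverflow dot com or jobs [at] example (dot) co (dot) uk."
print(deobfuscate(text))  # ['team@stackoverflow.com', 'jobs@example.co.uk']
```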

I once came up with an idea to publish my email address like this:

You can mail me here:

var myEmail = new System.Text.StringBuilder()
    .Append("myname")
    .Append("@")
    .Append("domain")
    .Append(".")
    .Append("com")
    .ToString();

Anyone who can't work it out has failed my basic intelligence test.

Answer 4:

It will be difficult for the spammers as well as your users to identify the email address.

There is a nice Wikipedia article on e-mail obfuscation, or address munging.

One common way of hiding an email address from bots and spammers is to create an image containing it. Facebook does this, for instance. Using images for email addresses is inherently bad for accessibility, because screen readers cannot read them. But even setting that aside, several free character recognition programs do a pretty good job of decoding such email images.

From here

Answer 5:

I'm not sure whether it really helps with spam, but I've learned to love Escape Encode Obfuscation for mailto: tags/emails. An example tag:

<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%74%65%61%6D%40%73%74%61%63%6B%6F%76%65%72%66%6C%6F%77%2E%63%6F%6D">&#116;&#101;&#97;&#109;&#64;&#115;&#116;&#97;&#99;&#107;&#111;&#118;&#101;&#114;&#102;&#108;&#111;&#119;&#46;&#99;&#111;&#109;</a>

The link sends mail to team@stackoverflow.com.
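Markup like this is easy to generate, and just as easy to reverse with the standard library. A minimal sketch (the function names are hypothetical) that writes the visible text as decimal character references and the address part of the href as percent-escapes, then decodes both the way a harvester could:

```python
import html
from urllib.parse import unquote


def entity_encode(text: str) -> str:
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte, e.g. '@' -> '%40'."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


def obfuscated_link(address: str) -> str:
    # The scheme stays literal so browsers still treat the href as a mailto: URL.
    href = "mailto:" + percent_encode(address)
    return f'<a href="{href}">{entity_encode(address)}</a>'


link = obfuscated_link("team@stackoverflow.com")
print(link)

# A harvester undoes both layers with two standard-library calls.
print(unquote(html.unescape(link)))
```

The encoded text matches the link text in the tag above character for character; the second `print` shows why this only stops harvesters that never decode the markup.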

Answer 6:

It's analogous to putting a "protected by ADT" sticker on your front door.

Will that prevent a talented burglar from entering your house? Of course not.

Will it make the house next door with an unlocked door and an iPod in the window a more compelling target? Pretty likely.

A simple unobfuscated email scraper is going to get TONS of emails as it is. Maybe a very simple regex to pick up very common obfuscation methods is worth the effort. Past that, you're spending a lot of time trying to decipher an increasingly small percentage of emails.

All that to say, having some clever obfuscation is probably worth it.

For the record, my email address has been on my public resume in plain text for years now, because I use Gmail, which has a spam filter that works.

Answer 7:

I was wondering why nobody has mentioned A List Apart's (ALA's) solution so far.

Roel Van Gils wrote an article about Graceful Email Obfuscation in 2007.

Graceful Email Obfuscation is simply a JavaScript Email Obfuscation technique with a contact form fallback.

Email addresses are obfuscated by converting them into a URL pointing to a contact form and applying a ROT13 transform:
mailto:mail@example.com → contact/mail+example+com → contact/znvy+rknzcyr+pbz
Via JavaScript, contact/znvy+rknzcyr+pbz is converted back to mailto:mail@example.com.
If JavaScript is unavailable, the browser opens contact/znvy+rknzcyr+pbz as a fallback. The contact form knows where to send the email because of the URL.

http://www.alistapart.com/articles/gracefulemailobfuscation/
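The transform itself is simple enough to sketch with Python's built-in `rot13` text codec. This is only an illustration of the steps listed above, assuming the `contact/` prefix and `+` separator from the example; it is not the article's actual JavaScript, and the function names are hypothetical:

```python
import codecs


def to_contact_path(address: str) -> str:
    """mail@example.com -> contact/znvy+rknzcyr+pbz"""
    plain = address.replace("@", "+").replace(".", "+")
    return "contact/" + codecs.encode(plain, "rot13")


def to_mailto(path: str) -> str:
    """contact/znvy+rknzcyr+pbz -> mailto:mail@example.com

    Assumes the local part contains no '.' or '+', since both become '+'.
    """
    user, *domain = codecs.decode(path.removeprefix("contact/"), "rot13").split("+")
    return "mailto:" + user + "@" + ".".join(domain)


path = to_contact_path("mail@example.com")
print(path)             # contact/znvy+rknzcyr+pbz
print(to_mailto(path))  # mailto:mail@example.com
```

The server-side contact form would apply the same decoding to the URL to find the recipient; `str.removeprefix` needs Python 3.9 or later.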

Answer 8:

It does make it harder, but there are so many really smart scrapers that it probably doesn't help much, since the big spammers use high-quality spam tools.

Answer 9:

How do you fight spammers? Make email addresses less recognizable to something without a brain (i.e., a computer).

Non-English speakers are your friends: if your user base is a non-English-speaking community, switch to obfuscating in other languages: team_małpa_stackoverflow_kropka_com or team_Affenschwanz_stackoverflow_Punkt_com are perfectly recognizable email addresses for Polish- and German-speaking communities, respectively. Some email harvesters know Polish or German, but chances are most harvesters will understand only English.

If you must stick with English, then switch to descriptive phrases, like: “In order to send us a message, write team in your address field, then put the symbol AT, then write the name of our site!”
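Generating these localized spellings is a one-line substitution. A minimal sketch (the `localize` name and the underscore separator are assumptions based on the examples above):

```python
def localize(address: str, at_word: str, dot_word: str, sep: str = "_") -> str:
    """Spell out '@' and '.' with words from the community's language."""
    # Replace '@' first; the chosen words are assumed not to contain '.'.
    return address.replace("@", f"{sep}{at_word}{sep}").replace(".", f"{sep}{dot_word}{sep}")


print(localize("team@stackoverflow.com", "małpa", "kropka"))
# team_małpa_stackoverflow_kropka_com
print(localize("team@stackoverflow.com", "Affenschwanz", "Punkt"))
# team_Affenschwanz_stackoverflow_Punkt_com
```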

Answer 10:

To provide a literal answer, yes, harvesting obfuscated addresses is harder than harvesting standardized addresses. The real question is whether the extra effort will be put in by harvesters and if the (major? minor?) barrier to the harvesters is worth the possible problems for your users.

If you are going to scramble addresses or otherwise transpose them away from the standard form, you should avoid being consistent in how you do so – at least on the same site.

For example, if every email address on a large community site is reversed in the markup and rendered properly with CSS, or token-replaced (@ becomes 'at'), or any other predictable method, the harvesters will just write a thin adapter for your site.

Think of it this way: if it only takes you one line of code to "scramble" them sitewide, it will only take the harvester one line of code to "unscramble" them for your site. Roughly speaking.
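One way to follow that advice is to pick a transform per address rather than one rule per site. A minimal sketch, assuming three illustrative transforms (reversal for the CSS `direction: rtl` trick, spelled-out tokens, and decimal character references); the function names are hypothetical and the list is far from exhaustive:

```python
import random


# Three illustrative transforms (assumptions); a real site could use others.
def reversed_for_css(address: str) -> str:
    # Must be wrapped in a span styled with unicode-bidi: bidi-override; direction: rtl.
    return address[::-1]


def spelled_out(address: str) -> str:
    return address.replace("@", " at ").replace(".", " dot ")


def char_refs(address: str) -> str:
    return "".join(f"&#{ord(ch)};" for ch in address)


TRANSFORMS = [reversed_for_css, spelled_out, char_refs]


def obfuscate_varied(address: str, rng: random.Random) -> str:
    """Hypothetical helper: choose a transform per address, not one site-wide rule."""
    return rng.choice(TRANSFORMS)(address)


rng = random.Random()
for addr in ["team@stackoverflow.com", "jobs@example.com", "info@example.org"]:
    print(obfuscate_varied(addr, rng))
```

A determined harvester can still try every decoder it knows; varying the method only means a single thin adapter no longer covers the whole site.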

In my opinion, spam has become such a problem and so many DBs have been turned over that we're beyond hiding our addresses. Instead, consider looking at Defensio, Akismet, etc., to help classify and block spam.

Answer 11:

I have a solution, or rather a theory. The problem is that bots parse the page, so they can get the text, even if it's inserted into the page in some sophisticated way through JavaScript.

So, just use a CSS3 pseudo-element! It won't be a link, but your email address will be visible and will never be actual text in the page. Something like this:

.email::after{ content:'myemail@gmail.com'; }

Again, it's a theory; I have no idea how far these evil people will go to get it, but I think this would be pretty safe (unless they parse the CSS files, which I don't think they do).
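Whether harvesters actually read stylesheets is unknown, but the address is still plain text inside the CSS, so the same regex a harvester runs on HTML would find it as soon as the stylesheet is fetched. A minimal sketch of that risk (the regex and the `harvest` name are illustrative assumptions):

```python
import re

# Illustrative address pattern (an assumption).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(*documents: str) -> set[str]:
    """Hypothetical harvester: run one regex over every fetched document."""
    return {match for doc in documents for match in EMAIL_RE.findall(doc)}


page = '<span class="email"></span>'
stylesheet = ".email::after{ content:'myemail@gmail.com'; }"
print(harvest(page))              # set()
print(harvest(page, stylesheet))  # {'myemail@gmail.com'}
```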

Answer 12:

It does make it harder to a degree, but the simple methods people still use today ([dot] and [at]) are obsolete: spammers can capture them easily with a simple regex.

Using something as simple as an image would help, and the intended human reader can read it without any effort to 'decrypt' the encoded address.

Contact email:

If you are still paranoid about spam bots equipped with character recognition, then something like this would be effective.

It uses an optical illusion to its advantage: the human mind completes the letters, which computer vision cannot easily do. Applying a CAPTCHA-like overlay can also help, but I doubt you need to go that far.