Why are people using regexp for email and other co

Posted 2020-02-10 03:40

There are a number of email regexp questions popping up here, and I'm honestly baffled why people are using these insanely obtuse matching expressions rather than a very simple parser. Such a parser would split the email into name and domain tokens and then validate each against its allowed characters: those valid for the name (there's no further check that can be done on this portion) and those valid for the domain. I suppose you could also check the domain against all the world's TLDs, and then against another level of second-level domains for countries that have them (e.g., co.uk).
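Here is a minimal Python sketch of the split-and-validate approach described above. The character sets are an assumption: they roughly approximate the RFC 5322 "dot-atom" local part and letter/digit/hyphen domain labels. They deliberately ignore quoted local parts, comments, and internationalized addresses. All function names are hypothetical.

```python
import string
from typing import Optional, Tuple

# Assumed character sets (approximations, not the full RFC 5322 grammar):
# dot-atom characters for the name, letters/digits/hyphens/dots for the domain.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "!#$%&'*+/=?^_`{|}~-.")
DOMAIN_CHARS = frozenset(string.ascii_letters + string.digits + "-.")


def split_address(address: str) -> Optional[Tuple[str, str]]:
    """Split on the last '@' into (name, domain); None if either part is missing."""
    name, sep, domain = address.rpartition("@")
    if not sep or not name or not domain:
        return None
    return name, domain


def valid_name(name: str) -> bool:
    """Allowed characters only, with no leading, trailing, or doubled dots."""
    if not set(name) <= NAME_CHARS:
        return False
    return not (name.startswith(".") or name.endswith(".") or ".." in name)


def valid_domain(domain: str) -> bool:
    """At least two labels, each 1-63 chars, none starting or ending with '-'."""
    if not set(domain) <= DOMAIN_CHARS:
        return False
    labels = domain.split(".")
    return len(labels) >= 2 and all(
        0 < len(label) <= 63 and not label.startswith("-") and not label.endswith("-")
        for label in labels
    )


def is_plausible_email(address: str) -> bool:
    """Split the address, then validate each token separately."""
    parts = split_address(address)
    if parts is None:
        return False
    name, domain = parts
    return valid_name(name) and valid_domain(domain)
```

Splitting first keeps each check small and separately testable. For example, `is_plausible_email("adam@example.com")` is true, while `"adam@localhost"` is rejected because this sketch requires at least two domain labels.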

The real problem is that the TLDs and SLDs keep changing (contrary to popular belief). If you plan on doing all this high-level checking, you therefore have to update the regexp whenever the root name servers send down a change.

Why not have a module that simply validates domains by pulling from a database or flat file, and that optionally checks DNS for matching records?
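A flat-file version of such a domain module could look like the following Python sketch. The file format is an assumption, loosely modelled on IANA's plain-text TLD list: one TLD per line, with `#` comment lines. The function names are hypothetical. The optional DNS check is left out because MX lookups need a third-party resolver library.

```python
from pathlib import Path
from typing import FrozenSet, Iterable


def parse_tlds(lines: Iterable[str]) -> FrozenSet[str]:
    """Build a lowercase TLD set, skipping blank lines and '#' comments."""
    tlds = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            tlds.add(entry.lower())
    return frozenset(tlds)


def load_tlds(path: str) -> FrozenSet[str]:
    """Read a flat file with one TLD per line; reload it when the list changes."""
    return parse_tlds(Path(path).read_text(encoding="utf-8").splitlines())


def has_known_tld(domain: str, tlds: FrozenSet[str]) -> bool:
    """True if the last label of `domain` appears in the loaded TLD set."""
    last_label = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return last_label in tlds
```

With this design, a root-zone change means replacing the data file rather than editing a pattern. A second file of second-level domains could be checked the same way.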

I'm being serious here: why is everyone so keen on inventing the perfect regexp for this? It doesn't seem to be a suitable solution to the problem...

Convince me that it's not only possible to do in regexp (and satisfy everyone) but that it's a better solution than a custom parser/validator.

-Adam

12 answers
Summer. 凉城
Post #2 · 2020-02-10 04:24

Once you've mastered the basics, the temptation to use RegExp is very strong. In fact, RegExp seems so powerful that people naturally want to start using it everywhere. I really suspect there's a lot of psychology involved here, as demonstrated by Randall's xkcd comic (and yes, it is useful).

I once gave an introductory presentation on RegExp, and the most important slide warned against its overuse. It was the only slide that used bold font. I believe this should be done more often.

Everybody stand back!

Anthone
Post #3 · 2020-02-10 04:28

One factor: the set of people who understand how to write a regular expression is much larger than the set of people who understand the formal constraints on regular languages. The same goes for non-regular "regular expressions".
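One concrete example of such a formal constraint: RFC 5322 allows comments in parentheses inside an address, and those comments may nest (e.g. `john(a (b))@example.com`). Balanced nesting of unbounded depth is the textbook non-regular language, so no true regular expression can check it. A depth counter can. The Python sketch below ignores quoted strings and backslash escapes, which the real grammar would also need to handle. The function name is hypothetical.

```python
def comments_balanced(address: str) -> bool:
    """Return True if the parentheses in `address` are properly nested.

    Simplification: quoted strings and backslash escapes are not handled.
    """
    depth = 0
    for ch in address:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0  # every '(' was closed
```

The single integer counter is exactly the unbounded memory that a finite automaton, and therefore a true regular expression, lacks.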

贪生不怕死
Post #4 · 2020-02-10 04:33

People use regexes for email addresses, HTML, XML, etc. because:

  1. It looks like they should work and they often do work for the obvious cases.
  2. They "know" regular expressions. When all you have is a hammer, every problem looks like a nail.
  3. Writing a parser is harder (or seems harder) than writing a regular expression. In particular, writing a parser is harder than writing a regex that handles the obvious cases in #1.
  4. They don't understand the full complexity of the task.
  5. They don't understand the limitations of regular expressions.
  6. They start with a regex that handles the obvious cases and then try to extend it to handle others. They get locked into one approach.
  7. They aren't aware that there's (probably) a library available to do the work for them.
我只想做你的唯一
Post #5 · 2020-02-10 04:36

We're just looking for a fast way to check whether the email address is valid, so that we can warn users they've made a mistake or stop people from easily entering junk. Going off to query the mail server is slow and unreliable. The only real way to be sure is to send a confirmation email, but the goal here is only to give the user a fast response before the confirmation process takes place. That's why strict compliance isn't so important. Anyway, it's a challenge, and it's fun.
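The kind of fast, deliberately loose check described here might look like this Python sketch. The pattern and function name are illustrative assumptions, not taken from the post. It only catches obvious typos and accepts many invalid addresses on purpose. The confirmation email remains the real test.

```python
import re

# Loose pattern: non-empty text without '@' or whitespace, a single '@',
# then a domain part containing at least one dot.
QUICK_CHECK = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(text: str) -> bool:
    """Fast sanity check to run before sending a confirmation email."""
    return QUICK_CHECK.fullmatch(text.strip()) is not None
```

`fullmatch` anchors the pattern to the whole input, so the anchors `^` and `$` are unnecessary.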

祖国的老花朵
Post #6 · 2020-02-10 04:36

Regexps are much faster to use, of course, and they only validate what's specified in the RFC. Write a custom parser? What? It takes 10 seconds to use a regexp.

爱情/是我丢掉的垃圾
Post #7 · 2020-02-10 04:39

People write regular expressions because most developers like to solve a simple problem in the most "cool" and "efficient" way (which means it should be as unreadable as possible).

In Java, there are libraries that check whether a String represents an email address without you having to know anything about regular expressions. Such libraries should be available for other languages as well.

As Jamie Zawinski said in 1997: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
