How to use regex negative lookahead

Posted 2019-07-23 04:58

Question:

I'm trying to extract email addresses from a file using egrep -o -e, but I'm having trouble with addresses at the end of a line.

Here is my regex:

egrep -o -e "[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+" ~/myfile.txt

I realize this will not catch every variation of an email address, but if the address is at the end of a line this is what I get:

user@_12345@myemail.com\ul

So I figured I'd try a negative lookahead, but I have no idea how to use it properly. I've read a few things online, but I'm confused about how it works.

This is what I've tried:

egrep -o -e "(?!\\[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+)" ~/myfile.txt

Bash fails with event not found: \\[._a

Any suggestions?

Answer 1:

What does the dot stand for?

"[._a-zA-Z0-9]+@[._a-zA-Z0-9]+.[._a-zA-Z0-9]+"
                              ^
                             here

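It matches the at-sign: an unescaped dot matches any single character. If you remove it, or escape it as \. so that it matches only a literal dot, your original regex will work with no lookahead.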

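Moreover, ! is a special character in interactive bash (history expansion). Outside quotes you can backslash it to use it literally, but inside double quotes bash keeps the backslash, so the regex would receive \! instead of !.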
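A minimal sketch of the dot fix described above, assuming GNU or BSD grep (grep -E is equivalent to egrep). The dot is escaped as \. and the whole pattern is single-quoted so bash passes it through untouched. The function name extract_emails and the sample line are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each email-like token on its own line.
# \. matches only a literal dot, so a match can no longer run on into a
# following "@" or "\" the way the unescaped "." allowed.
# Reads the files given as arguments, or stdin when there are none.
extract_emails() {
  grep -E -o -e '[._a-zA-Z0-9]+@[._a-zA-Z0-9]+\.[._a-zA-Z0-9]+' "$@"
}

# Hypothetical sample line with an RTF-style "\ul" right after the address.
printf '%s\n' 'contact user@myemail.com\ul' | extract_emails
# prints: user@myemail.com
```

With the original unescaped dot, the same input prints user@myemail.com\ul instead, because . is free to match the backslash. On a real file the call would be extract_emails ~/myfile.txt.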

Answer 2:

Bash is interpreting the ! as a history expansion command. Use single quotes rather than double quotes to prevent this.
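A small sketch of why single quotes are the safer choice (the variable names are hypothetical). Inside double quotes, bash keeps a backslash written before !, so escaping it that way hands grep a stray backslash; single quotes pass every character through unchanged and also suppress history expansion. History expansion is only on by default in interactive shells, and set +H switches it off there.

```shell
#!/usr/bin/env bash
# Double quotes: the backslash before ! is kept as a literal character,
# so the pattern grep would receive is (?\!foo), not (?!foo).
dq="(?\!foo)"
# Single quotes: no history expansion and no backslash processing.
sq='(?!foo)'

printf 'double-quoted: %s\n' "$dq"   # double-quoted: (?\!foo)
printf 'single-quoted: %s\n' "$sq"   # single-quoted: (?!foo)
```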

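Note, however, that egrep (grep -E) uses POSIX extended regular expressions, which do not support lookahead at all. GNU grep can handle it with -P (Perl-compatible regular expressions) when it is built with PCRE support; otherwise you need a more powerful regex tool such as perl or ack.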