Regex: string does not contain substring

Posted 2019-04-19 18:22

Question:

I am trying to match a string that does not contain a given substring.

My string always starts with "http://www.domain.com/".

The substring I want to exclude from matches is ".a/", which comes directly after that prefix (a folder name in the URL path).

There will always be more characters in the string after the substring I want to exclude.

For example:

"http://www.domain.com/.a/test.jpg" should not be matched

But "http://www.domain.com/test.jpg" should be

Answer 1:

Use a negative lookahead assertion:

^http://www\.domain\.com/(?!\.a/).*$


The part (?!\.a/) is a zero-width assertion: it succeeds only if the text at that position is not .a/, and it consumes no characters itself.
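
As a rough illustration of how the lookahead behaves, here is a minimal Python sketch using the standard re module. The function name is_allowed is made up for this example; the pattern is the one from the answer above.

```python
import re

# Pattern from the answer above. The lookahead (?!\.a/) is checked right after
# the domain prefix and consumes no characters.
PATTERN = re.compile(r"^http://www\.domain\.com/(?!\.a/).*$")


def is_allowed(url: str) -> bool:
    """Return True if url starts with the domain and its path does not begin with '.a/'.

    is_allowed is a hypothetical helper name, not part of the original answer.
    """
    return PATTERN.match(url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.domain.com/.a/test.jpg",
        "http://www.domain.com/test.jpg",
    ):
        print(url, is_allowed(url))
```

Because the assertion sits immediately after the prefix, it only rejects .a/ as the first path segment; a path such as x/.a/test.jpg would still match.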



Answer 2:

My advice in such cases is not to construct overly complicated regexes with negative lookahead assertions and the like.
Keep it simple and stupid!
Do two matches: one for the positives, then filter out the negatives afterwards (or the other way around). Most of the time the regexes become easier, if not trivial, and your program gets clearer.
For example, to extract all lines containing foo but not foobar, I use:

grep foo | grep -v foobar
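
The same two-pass idea can be sketched outside the shell. The Python below is only an illustration; filter_lines, KEEP and DROP are names invented for this example.

```python
import re

KEEP = re.compile(r"foo")     # positive pass: line must contain "foo"
DROP = re.compile(r"foobar")  # negative pass: line must not contain "foobar"


def filter_lines(lines):
    """Keep lines that match KEEP, then discard those that also match DROP."""
    kept = [line for line in lines if KEEP.search(line)]
    return [line for line in kept if not DROP.search(line)]


if __name__ == "__main__":
    sample = ["foo", "foobar", "a foo here", "bar", "foo and foobar"]
    print(filter_lines(sample))
```

As with the grep pipeline, a line that contains both foo and foobar is dropped by the second pass.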


Answer 3:

I would try with

^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$

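This matches your domain followed either by any character other than a ., or by a . that is not followed by an a. As written it also rejects paths such as .ab/ and requires at least one character after the domain; to exclude only .a/, you can add a third alternative, \.a[^\/], to the group.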
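
To see where the lookahead-free pattern accepts and rejects, here is a small Python sketch. ALTERNATION is the regex from this answer (slashes need no escaping in Python); EXTENDED is a hypothetical variant that adds a third branch, \.a[^/], and is an assumption of this example rather than part of the answer.

```python
import re

# Regex from this answer, without the escaped slashes.
ALTERNATION = re.compile(r"^http://www\.domain\.com/([^.]|\.[^a]).*$")

# Hypothetical variant: a third branch lets ".a" through unless "/" follows it.
EXTENDED = re.compile(r"^http://www\.domain\.com/([^.]|\.[^a]|\.a[^/]).*$")


def check(pattern, url):
    """Return True if the whole url matches the given compiled pattern."""
    return pattern.match(url) is not None


if __name__ == "__main__":
    base = "http://www.domain.com/"
    for path in ("test.jpg", ".a/test.jpg", ".ab/test.jpg", ".b/hello", ""):
        url = base + path
        print(url, check(ALTERNATION, url), check(EXTENDED, url))
```

Both patterns need at least one character after the domain, so the bare prefix is rejected by each; that is fine for the question as asked, since more characters always follow.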



Answer 4:

If you don't want to use lookahead, you can use two plain regexes: accept the string if it matches your domain but does not match the domain followed by .a/

<?php

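function foo($s) {

    // Dots are escaped so they match a literal "." rather than any character.
    $regexDomain = '{^http://www\.domain\.com/}';
    $regexDomainBadPath = '{^http://www\.domain\.com/\.a/}';

    // Accept only if the domain prefix matches and the path does not start with ".a/".
    return preg_match($regexDomain, $s) && !preg_match($regexDomainBadPath, $s);
}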

var_dump(foo('http://www.domain.com/'));
var_dump(foo('http://www.otherdomain.com/'));

var_dump(foo('http://www.domain.com/hello'));
var_dump(foo('http://www.domain.com/hello.html'));
var_dump(foo('http://www.domain.com/.a'));
var_dump(foo('http://www.domain.com/.a/hello'));
var_dump(foo('http://www.domain.com/.b/hello'));
var_dump(foo('http://www.domain.com/da/hello'));

?>

Note that http://www.domain.com/.a will pass the test, because the .a is not followed by a /.