I am trying to match a string that does not contain a certain substring.
My string always starts with "http://www.domain.com/".
The substring I want to exclude from matches is ".a/", which comes after that prefix (it is a folder name on the domain).
There will be more characters in the string after the substring I want to exclude.
For example:
"http://www.domain.com/.a/test.jpg" should not be matched
But "http://www.domain.com/test.jpg" should be
Use a negative lookahead assertion:
^http://www\.domain\.com/(?!\.a/).*$
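The part (?!\.a/) asserts that the text immediately after the domain is not .a/. It is zero-width: it consumes no characters, so the following .* still matches the rest of the path.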
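A minimal Python sketch of the lookahead pattern checked against the question's two example URLs. The name `is_allowed` is hypothetical. Note that the lookahead only tests the position right after the domain, so a .a/ deeper in the path is not excluded.

```python
import re

# Negative lookahead: after the domain, the next three characters must not be ".a/".
PATTERN = re.compile(r"^http://www\.domain\.com/(?!\.a/).*$")

def is_allowed(url: str) -> bool:
    """Return True if url is on the domain and its path does not start with .a/."""
    return PATTERN.match(url) is not None

for url in [
    "http://www.domain.com/.a/test.jpg",  # should not match
    "http://www.domain.com/test.jpg",     # should match
]:
    print(url, is_allowed(url))
```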
My advice in such cases is not to construct overly complicated regexes with negative lookahead assertions or similar constructs.
Keep it simple, stupid!
Do two matches: one that selects the positives, and a second that filters out the negatives (or the other way around). Most of the time the regexes become easier, if not trivial, and your program gets clearer.
For example, to extract all lines with foo, but not foobar, I use:
grep foo | grep -v foobar
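The same two-pass idea in Python, as a sketch that mirrors the grep pipeline: first keep lines that contain foo, then drop those that contain foobar. The helper name `grep_then_exclude` is hypothetical.

```python
import re

def grep_then_exclude(lines, include, exclude):
    """Keep lines matching `include`, then drop those matching `exclude`.

    Like `grep foo | grep -v foobar`, both tests are plain searches anywhere
    in the line, so no lookahead is needed.
    """
    selected = [line for line in lines if re.search(include, line)]
    return [line for line in selected if not re.search(exclude, line)]

lines = ["foo", "foobar", "bar", "a foo here", "foo and foobar"]
print(grep_then_exclude(lines, "foo", "foobar"))  # ['foo', 'a foo here']
```

As with grep -v, a line containing both foo and foobar is dropped.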
I would try:
^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$
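You want to match your domain, followed by anything that does not start with a ".", or by a "." that is not followed by an "a".
As written, this also rejects paths that merely start with .a, such as .ab/, not only .a/; if that matters, you can extend it to check for the / as well.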
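A Python sketch comparing the character-class pattern with one possible extension that also checks the /. The extended pattern is my own construction, not from the answer: the path after the domain may be empty, start with anything but ".", be just ".", start with "." plus a non-"a", be ".a", or start with ".a" plus a non-"/". The helper `matches` is hypothetical. Python does not need the slashes escaped.

```python
import re

# Character-class version from the answer, without lookahead.
SIMPLE = re.compile(r"^http://www\.domain\.com/([^.]|\.[^a]).*$")

# Hypothetical extension that rejects only a leading ".a/".
EXTENDED = re.compile(r"^http://www\.domain\.com/($|[^.]|\.($|[^a]|a($|[^/]))).*$")

def matches(pattern, url):
    """Return True if the compiled pattern matches url from the start."""
    return pattern.match(url) is not None

for url in [
    "http://www.domain.com/test.jpg",
    "http://www.domain.com/.a/test.jpg",
    "http://www.domain.com/.ab/test.jpg",
    "http://www.domain.com/",
]:
    print(url, matches(SIMPLE, url), matches(EXTENDED, url))
```

The last two URLs show the difference. The simple pattern rejects .ab/, and it also rejects the bare domain because it requires at least one character after the /. The extended pattern accepts both.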
If you don't want to use lookahead and prefer simple regexes, you can check that the string matches your domain but does not match your domain followed by .a/:
<?php
function foo($s) {
    // Escape the dots so they match a literal "." rather than any character.
    $regexDomain = '{^http://www\.domain\.com/}';
    $regexDomainBadPath = '{^http://www\.domain\.com/\.a/}';
    // Accept only if the string is on the domain AND its path does not start with .a/
    return preg_match($regexDomain, $s) && !preg_match($regexDomainBadPath, $s);
}
var_dump(foo('http://www.domain.com/'));            // bool(true)
var_dump(foo('http://www.otherdomain.com/'));       // bool(false)
var_dump(foo('http://www.domain.com/hello'));       // bool(true)
var_dump(foo('http://www.domain.com/hello.html'));  // bool(true)
var_dump(foo('http://www.domain.com/.a'));          // bool(true)
var_dump(foo('http://www.domain.com/.a/hello'));    // bool(false)
var_dump(foo('http://www.domain.com/.b/hello'));    // bool(true)
var_dump(foo('http://www.domain.com/da/hello'));    // bool(true)
?>
Note that http://www.domain.com/.a will pass the test, because the .a is not followed by a /.