RegEx to exclude match if a certain word is present

I have the keyword "cum", which our firewall uses to block adult sites. The problem is that this works a little too well, because it also blocks any URL containing the word "document".

The firewall will take regex strings, and I tried this:

^.*(?!document)cum.*$

But it still matches "document". I have a feeling I should be using a pipe (|), but I don't get it.

What I want is to match anywhere

*cum*

is found in the URL (or domain-name), but NOT if the word is document or documents.

Possible? As I understand it, a word boundary doesn't work here because the word cum won't necessarily be separated by white-space when it's in a URL, and definitely not if it's in a domain-name.

Here's another way to put it:

Allow "examplesearchdocuments.com"
Allow "examplemydocuments.com"
Allow "documentexample.com"
Allow "example.com/somedocuments"
Don't allow "funnycumsiteexample.com"
Don't allow "cumallovereverythingexample.com"
Don't allow "exampleseemycum.com"

Here "cum" is the bad word to match. Sorry if any of these examples are real sites; I don't know how else to convey this.

Tags: regex url filtering firewall

2 Answers

姐就是有狂的资本

Reply #2 · 2019-07-14 05:55

My first suggestion would also be to use \bcum\b, as the others suggested, but that doesn't match e.g. "cumming".

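You're almost right with the negative lookaround (?!) syntax. The catch is that in your attempt (?!document) is tested at the spot where "cum" begins, so the text it sees there is "cument...", never "document", and the check always passes.

For a negative lookbehind you need the <, as in (?<!...)
For a negative lookahead you don't need it, as in (?!...)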
See: http://www.regular-expressions.info/lookaround.html for more

^.*(?<!do)cum(?!ent).*$

or, to spell out the plural as well:

^.*(?<!do)cum(?!ents?).*$

You can check it at http://fiddle.re/3pyj by clicking Java for the examples you provided.
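For anyone who wants to try this outside the fiddle, here is a minimal Python sketch. Python's re module is only my choice for illustration; your firewall's regex engine may behave differently, and it has to support lookbehind for this to work at all. The sketch runs the pattern from the question and the first pattern above against the sample URLs from the question. The names ORIGINAL, FIXED, SAMPLES and is_blocked are made up for this example.

```python
import re

# Pattern from the question and the first pattern from this answer.
ORIGINAL = re.compile(r"^.*(?!document)cum.*$")
FIXED = re.compile(r"^.*(?<!do)cum(?!ent).*$")

# Sample URLs from the question, mapped to whether they should be blocked.
SAMPLES = {
    "examplesearchdocuments.com": False,
    "examplemydocuments.com": False,
    "documentexample.com": False,
    "example.com/somedocuments": False,
    "funnycumsiteexample.com": True,
    "cumallovereverythingexample.com": True,
    "exampleseemycum.com": True,
}


def is_blocked(pattern, url):
    """Return True if the pattern matches the URL (hypothetical helper)."""
    return pattern.search(url) is not None


for url, should_block in SAMPLES.items():
    print(
        f"{url:35} want={should_block!s:5} "
        f"original={is_blocked(ORIGINAL, url)!s:5} "
        f"fixed={is_blocked(FIXED, url)}"
    )
```

With Python's engine, the original pattern flags all seven URLs, while the fixed one flags only the last three. One caveat: the two lookarounds are tested independently, so "cum" followed by "ent" is skipped even when it is not part of "document" (a made-up host like cumentertainment.example would pass).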

My suggestion would be ^.*\bcum.*$ to match at a word boundary, i.e. the start of a word, then "cum", then anything after.
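To see what the word-boundary version does and does not catch, here is a small Python sketch (again, Python's re is just an assumption for illustration, not your firewall's engine). The URL example.com/cum-videos is a made-up test case, and WORD_START and matches_word_start are hypothetical names.

```python
import re

# Word-boundary pattern from the suggestion above.
WORD_START = re.compile(r"^.*\bcum.*$")


def matches_word_start(url):
    """Return True if 'cum' appears at the start of a word in the URL."""
    return WORD_START.search(url) is not None


for url in (
    "cumallovereverythingexample.com",  # 'cum' at the very start
    "example.com/cum-videos",           # made-up URL: 'cum' right after '/'
    "funnycumsiteexample.com",          # 'cum' glued to the preceding letters
    "examplemydocuments.com",           # 'cum' inside 'documents'
):
    print(f"{url:35} blocked={matches_word_start(url)}")
```

In Python this flags the first two and lets the last two through, so it keeps "document" safe but only catches "cum" when a non-word character (or the start of the string) comes right before it.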


We Are One

Reply #3 · 2019-07-14 06:03

Per the comments, I was wrong.

If you use a lookbehind inside your lookahead, you can match "cum" only if it is not within the word "document".

cum(?!(?<=docum)ent)

Here is some reading on lookaround http://www.regular-expressions.info/lookaround.html

Here it is against a large number of tests.

http://www.rubular.com/r/b5iZrn6Cjz
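If you would rather check it locally, here is a rough Python sketch of the same test (Python's re is my assumption here; Rubular uses Ruby, and your firewall may use something else again). It runs the pattern above against the sample URLs from the question. NESTED, SAMPLES and is_blocked are names invented for this example, and cumentertainment.example is a made-up host.

```python
import re

# Pattern from this answer: a lookbehind nested inside a negative lookahead.
NESTED = re.compile(r"cum(?!(?<=docum)ent)")

# Sample URLs from the question, mapped to whether they should be blocked.
SAMPLES = {
    "examplesearchdocuments.com": False,
    "examplemydocuments.com": False,
    "documentexample.com": False,
    "example.com/somedocuments": False,
    "funnycumsiteexample.com": True,
    "cumallovereverythingexample.com": True,
    "exampleseemycum.com": True,
}


def is_blocked(url):
    """Return True if 'cum' occurs anywhere outside the word 'document'."""
    return NESTED.search(url) is not None


for url, should_block in SAMPLES.items():
    print(f"{url:35} want={should_block!s:5} got={is_blocked(url)}")

# Made-up host: 'cum' followed by 'ent' but not preceded by 'do'.
print(is_blocked("cumentertainment.example"))
```

Because the lookbehind and the "ent" check have to succeed together, "cum" is skipped only when both sides of "document" are present, so the made-up host on the last line is still flagged.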

