How to inverse match/blacklist with regex?

Posted 2019-07-24 18:08

I've seen this question: Regular expression to match a line that doesn't contain a word?

But I can't get it to work. I have a shell script and I'm using

string1.*string2.*string3

To search for 3 words in a file, in that order. But I want to change it so that if badword5 is anywhere in between those words in that file, there is no regex match with grep.

So this should match:

./testing/test.txt:   let prep = "select string1, dog from cat",
          " where apple = 1",
          " and string2 = 2",
          " and grass = 8",
          " and string3 = ?"

But this should not:

   ./testing/test.txt:   let prep = "select string1, dog from cat",
          " where apple = 1",
          " and string2 = 2",
          " and grass = 8",
          " and badword5 = 4", 
          " and string3 = ?"

I unsuccessfully tried:

string1((?!badword5)|.)*string2((?!badword5)|.)*string3

The entire script:

find . -name "$file_to_check" 2>/dev/null | while read -r FILE
do
   tr '\n' ' ' <"$FILE" | if grep -q "string1.*string2.*string3"; then echo "$FILE" ; fi
done >> "$grep_out"

Tags: regex shell unix
2 Answers
劫难
Reply #2 · 2019-07-24 18:58

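You can use grep -v to skip a line that contains badword5. Note that tr has already joined the whole file into a single line, so this rejects any file that contains badword5 anywhere, not only between the three strings: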

tr '\n' ' ' < "$FILE" | grep -v 'badword5' | if grep -q "string1.*string2.*string3"; then echo "$FILE" ; fi
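If badword5 should block a match only when it falls between the three strings, one option is a negative lookahead that is checked before every character consumed. The attempt in the question, ((?!badword5)|.)*, does not enforce this: the "." branch of the alternation can still consume any character, and grep without -P does not treat (?!...) as a lookahead at all. Below is a minimal sketch, assuming GNU grep built with Perl-compatible regex support (-P). The function name has_clean_sequence is hypothetical, not part of the original script.

```shell
# Hypothetical helper: succeeds when string1, string2 and string3 appear
# in that order in file $1 with no badword5 anywhere between them.
# Assumption: grep is GNU grep with -P (PCRE) support.
has_clean_sequence() {
    # Join all lines first so the pattern can span line breaks.
    # (?:(?!badword5).)* consumes one character at a time, but only at
    # positions where "badword5" does not start.
    tr '\n' ' ' < "$1" |
        grep -qP 'string1(?:(?!badword5).)*string2(?:(?!badword5).)*string3'
}
```

Inside the loop from the question this could be called as: if has_clean_sequence "$FILE"; then echo "$FILE"; fi. A badword5 that occurs before string1 or after string3 does not prevent the match.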
我想做一个坏孩纸
Reply #3 · 2019-07-24 19:04

"To search for 3 words in a file, in that order. But I want to change it so that if badword5 is anywhere in between those words in that file, there is no regex match with grep."

Indeed, and the search pattern stretches across multiple lines.
Let's drop grep for the moment and try something different:

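#!/bin/bash

find . -name "$file_to_check" 2>/dev/null | while read -r FILE
do
    # SCORE counts how many of the wanted words have been seen in order
    SCORE=0
    # Put every space-separated word on its own line and scan them in turn
    tr ' ' '\n' <"$FILE" | while read -r WORD
    do
        # Patterns are exact: a token such as word1, (with a comma) will not match
        case $WORD in
        "word1"    ) [ "$SCORE" = 0 ] && SCORE=1               ;;
        "word2"    ) [ "$SCORE" = 1 ] && SCORE=2               ;;
        "word3"    ) [ "$SCORE" = 2 ] && echo "$FILE" && break ;;
        "badword5" ) SCORE=0                                 ;;
        esac
    done
done >grep_out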

The case lines do the following:

"word1" ) [ $SCORE = 0 ] && SCORE=1 ;;
when word1 is found and SCORE is 0, set SCORE to 1
when word2 is found and SCORE is 1, set SCORE to 2
when word3 is found and SCORE is 2, print the filename and break out of the inner loop
when badword5 is found, reset SCORE to 0, so the sequence has to start again from word1
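The same state machine can be wrapped in a function and run against the sample from the question. This is only a sketch under two assumptions that are not in the answer above: the function name has_ordered_words is hypothetical, and the case patterns are substring globs (for example *string1*) so that tokens carrying punctuation, such as string1, with its trailing comma, still count.

```shell
# Hypothetical helper: succeeds when string1, string2 and string3 are seen
# in that order in file $1 with no badword5 between them.
has_ordered_words() {
    # Split on any whitespace, one token per line, then scan the tokens.
    tr -s '[:space:]' '\n' < "$1" | (
        score=0
        # "|| [ -n ... ]" also handles a last token without a newline.
        while read -r word || [ -n "$word" ]; do
            case $word in
                *badword5*) score=0 ;;                        # start over
                *string1*)  [ "$score" -eq 0 ] && score=1 ;;
                *string2*)  [ "$score" -eq 1 ] && score=2 ;;
                *string3*)  [ "$score" -eq 2 ] && exit 0 ;;   # full sequence seen
            esac
        done
        exit 1                                                # never completed
    )
}
```

The explicit subshell keeps exit from ending the calling script, and its exit status becomes the function's return value, so the function can be used directly in an if statement inside the find loop.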