Grep regular expression for digits in character st

Posted 2019-07-19 15:40

Question:

I need a way to find words made up of any combination of characters and digits, with exactly 4 digits and at least one character.

EXAMPLE:

a1a1a1a1        // Match
1234            // NO match (no characters)
a1a1a1a1a1      // NO match
ab2b2           // NO match
cd12            // NO match
z9989           // Match
1ab26a9         // Match
1ab1c1          // NO match
12345           // NO match
24              // NO match
a2b2c2d2        // Match
ab11cd22dd33    // NO match

Answer 1:

To match a digit in grep you can use [0-9]. To match anything but a digit, you can use [^0-9]. Since there can be any number of non-digit characters, or none at all, you add a "*" (any number of the preceding item). So what you want is, logically:

(anything not a digit or nothing)* (any single digit) (anything not a digit or nothing)* ....

until you have 4 "any single digit" groups, i.e. [^0-9]*[0-9]...

I find that with long grep patterns, especially ones with long strings of special characters that need escaping, it's best to build up slowly so you're sure you understand what's going on. For example,

#this will highlight your matches, and make it easier to understand
alias grep='grep --color=auto'
echo 'a1b2' | grep '[0-9]' 

will show you how it's matching. You can then extend the pattern once you understand each part.
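Carried through to four digits, that build-up might look like the sketch below. The helper name `digit_groups` is invented for illustration, and the `^`/`$` anchors are an addition that the description above leaves implicit; without them grep also accepts lines that contain more than four digits.

```shell
# Hypothetical helper: repeat the step "non-digits (maybe none), then one digit" N times.
digit_groups() {
  pattern=''
  i=0
  while [ "$i" -lt "$1" ]; do
    pattern="${pattern}[^0-9]*[0-9]"
    i=$((i + 1))
  done
  printf '%s' "$pattern"
}

# Grow the pattern one group at a time to see what each step looks like.
for n in 1 2 3 4; do
  printf '%s group(s): %s\n' "$n" "$(digit_groups "$n")"
done

# Full pattern: four groups, then trailing non-digits, anchored at both ends.
four="^$(digit_groups 4)[^0-9]*\$"
printf '%s\n' a1a1a1a1 a1a1a1a1a1 ab2b2 | grep "$four"
```

The last command prints only a1a1a1a1. Note that this pattern would still accept 1234, which has no letter, so the "at least one character" rule needs a separate check.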



Answer 2:

I'm not sure about all the other input you might take (i.e. is ax12ax12ax12ax12 valid?), but this will work based on what you posted:

%> grep -P "^(?:\w\d){4}$" fileWithInput


Answer 3:

If you don't mind using a little shell as well, you could do something like this:

echo "a1a1a1a1" |grep -o '[0-9]'|wc -l

which would display the number of digits found in the string. If you like, you could then test for a given number of matches:

max_match=4
[ "$(echo "a1da4a3aaa4a4" | grep -o '[0-9]'|wc -l)" -le $max_match ] || echo "too many digits."
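The one-liner above only checks an upper bound on the digit count. A sketch that checks for exactly four digits plus at least one other character could look like this; `count_digits` and `is_wanted` are hypothetical names, "character" is assumed to mean any non-digit, and `grep -o` is assumed to be available (GNU and BSD grep both have it).

```shell
# Hypothetical helper: count the digits in a word (grep -o prints one match per line).
count_digits() {
  printf '%s\n' "$1" | grep -o '[0-9]' | wc -l | tr -d ' '
}

# Hypothetical helper: exactly four digits and at least one non-digit.
is_wanted() {
  [ "$(count_digits "$1")" -eq 4 ] && printf '%s\n' "$1" | grep -q '[^0-9]'
}

for w in a1a1a1a1 1234 cd12 z9989; do
  if is_wanted "$w"; then
    echo "$w: match"
  else
    echo "$w: no match"
  fi
done
```

The `tr -d ' '` is only there because some `wc` implementations pad their output with spaces.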


Answer 4:

Assuming you only need ASCII, and you can only access the (fairly primitive) regexp constructs of grep, the following should be pretty close:

grep '^[a-zA-Z]*[0-9][a-zA-Z]*[0-9][a-zA-Z]*[0-9][a-zA-Z]*[0-9][a-zA-Z]*$' | grep '[a-zA-Z]'


Answer 5:

You might try

[^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*

But this will match 1234. Why doesn't that meet your criteria?
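1234 gets through because every `[^0-9]*` is allowed to match nothing. The pattern is also unanchored, so grep accepts lines with more than four digits as well. One possible tightening, assuming "character" means any non-digit: `-x` makes the pattern cover the whole line, and a second grep requires at least one non-digit. The name `exactly_four` is made up for this sketch.

```shell
p='[^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*[0-9][^0-9]*'

# Unanchored: the five-digit word and the all-digit word both get through.
printf '%s\n' a1a1a1a1a1 1234 z9989 | grep "$p"

# -x: the pattern must match the whole line; second grep: at least one non-digit.
exactly_four() {
  grep -x "$p" | grep '[^0-9]'
}
printf '%s\n' a1a1a1a1a1 1234 z9989 | exactly_four
```

The first command prints all three words; the second prints only z9989.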



Answer 6:

The regex for that is:

([A-Za-z]\d){4}
  • [A-Za-z] - a character class matching one letter
  • \d - one digit
  • the () wrap them into a group, expressing the format "a letter followed by a digit"
  • {4} - the group must repeat exactly 4 times
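As written, this pattern needs a Perl-compatible engine for `\d`, and it only accepts words that strictly alternate one letter and one digit. The sketch below spells it with `[0-9]` so that `grep -E` can run it (the `^`/`$` anchors are an added assumption) and feeds it the four words the question marks as "Match".

```shell
# Same idea with [0-9] instead of \d, anchored to the whole line.
alternating='^([A-Za-z][0-9]){4}$'

# All four of these words are marked "Match" in the question.
printf '%s\n' a1a1a1a1 z9989 1ab26a9 a2b2c2d2 | grep -E "$alternating"
```

Only a1a1a1a1 and a2b2c2d2 come out. z9989 and 1ab26a9 are dropped because their digits are not each preceded by exactly one letter, so the pattern is narrower than what the question asks for.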


Answer 7:

With grep:

grep -iE '^([a-z]*[0-9]){4}[a-z]*$' | grep -vE '^[0-9]{4}$'

Do it in one pattern with Perl:

perl -ne 'print if /^(?!\d{4}$)([^\W\d_]*\d){4}[^\W\d_]*$/'

The funky [^\W\d_] character class is a cosmopolitan way to spell [A-Za-z]: it catches all letters rather than only the English ones.
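To check the grep version against the question, the pipeline can be wrapped in a function and fed the twelve sample words. The wrapper name `pick_words` is hypothetical; the two grep commands are the ones from this answer.

```shell
# Hypothetical wrapper around the two-grep pipeline above.
pick_words() {
  grep -iE '^([a-z]*[0-9]){4}[a-z]*$' | grep -vE '^[0-9]{4}$'
}

# The sample words from the question, one per line.
pick_words <<'EOF'
a1a1a1a1
1234
a1a1a1a1a1
ab2b2
cd12
z9989
1ab26a9
1ab1c1
12345
24
a2b2c2d2
ab11cd22dd33
EOF
```

This prints a1a1a1a1, z9989, 1ab26a9 and a2b2c2d2, which are exactly the four words marked "Match" in the question.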



Answer 8:

You can use a plain shell script; no complicated regex is needed.

var=a1a1a1a1
alldigits=${var//[^0-9]/}
allletters=${var//[0-9]/}
case "${#alldigits}" in
   4)
    if [ "${#allletters}" -gt 0 ];then
        echo "ok: 4 digits and letters: $var"
    else
        echo "Invalid: all numbers and exactly 4: $var"
    fi
    ;;
   *) echo "Invalid: $var";;
esac
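The `${var//pattern/}` expansion used above is a bash feature and is not available in a strict POSIX sh. A portable variation of the same counting idea, swapping the expansion for `tr`, might look like this; the function name `classify` and its output wording are made up for the sketch.

```shell
# Hypothetical helper: classify one word without any regex.
classify() {
  digits=$(printf '%s' "$1" | tr -cd '0-9')   # keep only the digits
  others=$(printf '%s' "$1" | tr -d '0-9')    # drop the digits
  if [ "${#digits}" -eq 4 ] && [ "${#others}" -gt 0 ]; then
    echo "ok: $1"
  else
    echo "invalid: $1"
  fi
}

for w in a1a1a1a1 1234 ab2b2 1ab26a9; do
  classify "$w"
done
```

Like the original, this treats anything that is not a digit as a "letter", so punctuation would count too.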


Answer 9:

Thanks for your answers. In the end I wrote a script and it works perfectly: ./P ab2b2 cd12 z9989 1ab26a9 1ab1c1 1234 24 a2b2c2d2

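#!/bin/bash
# Write the arguments to a scratch file, one word per line (overwritten on each run).
echo "$@" | tr -s " " "\n" > sorting
while read -r tostr
do
  # l: total length of the word; temp: length after deleting lowercase letters.
  l=$(echo "$tostr" | tr -d "\n" | wc -c)
  temp=$(echo "$tostr" | tr -d a-z | tr -d "\n" | wc -c)

  # Keep words with 4 non-letter characters and more than 4 characters overall.
  if [ $temp -eq 4 ]; then
    if [ $l -gt 4 ]; then
      printf "%s " "$tostr"
    fi
  fi
done < sorting
echo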


Tags: regex shell grep