Extract substring in Bash

2018-12-31 20:52发布

Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.

So to emphasize the point, I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.

I am very interested in the number of different ways that this can be accomplished.

20条回答
皆成旧梦
2楼-- · 2018-12-31 21:46

Ok, here goes pure Parameter Substitution with an empty string. Caveat is that I have defined someletters and moreletters as only characters. If they are alphanumeric, this will not work as it is.

filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345
查看更多
路过你的时光
3楼-- · 2018-12-31 21:47

Without any sub-processes you can:

shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}

A very small variant of this will also work in ksh93.

查看更多
像晚风撩人
4楼-- · 2018-12-31 21:51

I'm surprised this pure bash solution didn't come up:

a="someletters_12345_moreleters.ext"
IFS="_"
set $a
echo $2
# prints 12345

You probably want to reset IFS to what value it was before, or unset IFS afterwards!

查看更多
栀子花@的思念
5楼-- · 2018-12-31 21:51

I love sed's capability to deal with regex groups:

> var="someletters_12345_moreletters.ext"
> digits=$( echo $var | sed "s/.*_\([0-9]\+\).*/\1/p" -n )
> echo $digits
12345

A slightly more general option would be not to assume that you have an underscore _ marking the start of your digits sequence, hence for instance stripping off all non-numbers you get before your sequence: s/[^0-9]\+\([0-9]\+\).*/\1/p.


> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion matched with replacement.  The replacement may contain the special  character  &  to
    refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.

More on this, in case you're not too confident with regexps:

  • s is for _s_ubstitute
  • [0-9]+ matches 1+ digits
  • \1 links to the group n.1 of the regex output (group 0 is the whole match, group 1 is the match within parentheses in this case)
  • p flag is for _p_rinting

All escapes \ are there to make sed's regexp processing work.

查看更多
泪湿衣
6楼-- · 2018-12-31 21:52

Generic solution where the number can be anywhere in the filename, using the first of such sequences:

number=$(echo $filename | egrep -o '[[:digit:]]{5}' | head -n1)

Another solution to extract exactly a part of a variable:

number=${filename:offset:length}

If your filename always have the format stuff_digits_... you can use awk:

number=$(echo $filename | awk -F _ '{ print $2 }')

Yet another solution to remove everything except digits, use

number=$(echo $filename | tr -cd '[[:digit:]]')
查看更多
回忆,回不去的记忆
7楼-- · 2018-12-31 21:52

Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20 "ST" 
while read -r; do;
> x=$REPLY
> done < test1.txt
echo $x
ST
查看更多
登录 后发表回答