Question:
Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.
To emphasize the point: I have a filename with x number of characters, then a five-digit sequence surrounded by a single underscore on either side, then another set of x number of characters. I want to take the 5-digit number and put it into a variable.
I am very interested in the number of different ways that this can be accomplished.
Answer 1:
Use cut:
echo 'someletters_12345_moreleters.ext' | cut -d'_' -f 2
More generic:
INPUT='someletters_12345_moreleters.ext'
SUBSTRING=$(echo "$INPUT" | cut -d'_' -f 2)
echo "$SUBSTRING"
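A minimal sketch of the same cut call in bash, using a here-string instead of echo and quoting the variable. It also shows the limitation: field 2 is only the number as long as the prefix itself contains no underscore (the second filename is a made-up example).

```shell
#!/usr/bin/env bash
input='someletters_12345_moreleters.ext'
# -d sets the field delimiter, -f picks field 2 (fields are numbered from 1);
# the here-string feeds "$input" to cut without an echo pipeline.
substring=$(cut -d'_' -f2 <<<"$input")
printf '%s\n' "$substring"      # 12345

# Hypothetical name with an underscore inside the prefix: field 2 is now "letters".
other='some_letters_12345_moreleters.ext'
other_field=$(cut -d'_' -f2 <<<"$other")
printf '%s\n' "$other_field"    # letters
```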
Answer 2:
If x is constant, the following parameter expansion performs substring extraction:
b=${a:12:5}
where 12 is the offset (zero-based) and 5 is the length
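A worked check of those numbers on the sample filename: the fixed prefix someletters_ is 12 characters long, so the digits start at zero-based offset 12.

```shell
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'
prefix='someletters_'
echo "${#prefix}"    # 12: length of the fixed prefix
b=${a:12:5}          # 5 characters starting at zero-based offset 12
echo "$b"            # 12345
```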
If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:
tmp=${a#*_} # remove prefix ending in "_"
b=${tmp%_*} # remove suffix starting with "_"
If there are other underscores, it's probably feasible anyway, albeit more tricky. If anyone knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash, with no process spawning involved, hence very fast.
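A sketch of the two-step strip on the sample name, plus a variant of my own (not part of the answer) for the case where the extra underscores are all in the prefix: strip the suffix first, then remove the longest prefix ending in "_" with ## instead of #.

```shell
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'
tmp=${a#*_}     # shortest prefix ending in "_" removed: 12345_moreleters.ext
b=${tmp%_*}     # shortest suffix starting with "_" removed: 12345
echo "$b"

# Hypothetical name whose prefix contains an underscore.
c='some_letters_12345_moreleters.ext'
tmp=${c%_*}     # some_letters_12345
d=${tmp##*_}    # longest prefix ending in "_" removed: 12345
echo "$d"
```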
Answer 3:
Generic solution where the number can be anywhere in the filename, using the first of such sequences:
number=$(echo "$filename" | grep -Eo '[[:digit:]]{5}' | head -n1)
Another solution to extract exactly a part of a variable:
number=${filename:offset:length}
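If the prefix length varies, the offset can be computed rather than hard-coded. A sketch, assuming the digits always follow the first underscore; prefix, offset and length are illustrative names.

```shell
#!/usr/bin/env bash
filename='someletters_12345_moreleters.ext'
prefix=${filename%%_*}             # everything before the first "_": someletters
offset=$(( ${#prefix} + 1 ))       # skip the prefix and its "_": 12
length=5
number=${filename:offset:length}   # offset and length are arithmetic expressions
echo "$number"                     # 12345
```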
If your filename always has the format stuff_digits_..., you can use awk:
number=$(echo "$filename" | awk -F _ '{ print $2 }')
Yet another solution is to remove everything except the digits:
number=$(echo "$filename" | tr -cd '[:digit:]')
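To see how the three one-liners differ, here is a sketch run against a hypothetical filename with a second, shorter run of digits (bash here-strings stand in for echo):

```shell
#!/usr/bin/env bash
filename='someletters_12345_moreleters_678.ext'   # hypothetical: two digit runs
by_grep=$(grep -Eo '[[:digit:]]{5}' <<<"$filename" | head -n1)  # first 5-digit run
by_awk=$(awk -F_ '{ print $2 }' <<<"$filename")                 # 2nd "_"-separated field
by_tr=$(tr -cd '[:digit:]' <<<"$filename")                       # all digits, concatenated
printf '%s %s %s\n' "$by_grep" "$by_awk" "$by_tr"   # 12345 12345 12345678
```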
Answer 4:
Just try cut -c startIndex-stopIndex (character positions are 1-based and inclusive).
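A minimal sketch for the sample filename, where the five digits occupy character positions 13 to 17; the positions have to be known in advance:

```shell
#!/usr/bin/env bash
input='someletters_12345_moreleters.ext'
# cut -c positions count from 1 and include both ends: characters 13 through 17.
digits=$(cut -c 13-17 <<<"$input")
echo "$digits"    # 12345
```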
Answer 5:
In case someone wants more rigorous information, you can also look it up in man bash like this:
$ man bash [press Return]
/substring [press Return]
[press "n" four times]
Result:
${parameter:offset}
${parameter:offset:length}
Substring Expansion. Expands to up to length characters of parameter starting at the character specified by offset. If length is omitted, expands to the substring of parameter starting at the character specified by offset. length and offset are arithmetic expressions (see ARITHMETIC EVALUATION below).
If offset evaluates to a number less than zero, the value is used as an offset from the end of the value of parameter. Arithmetic expressions starting with a - must be separated by whitespace from the preceding : to be distinguished from the Use Default Values expansion. If length evaluates to a number less than zero, and parameter is not @ and not an indexed or associative array, it is interpreted as an offset from the end of the value of parameter rather than a number of characters, and the expansion is the characters between the two offsets.
If parameter is @, the result is length positional parameters beginning at offset. If parameter is an indexed array name subscripted by @ or *, the result is the length members of the array beginning with ${parameter[offset]}. A negative offset is taken relative to one greater than the maximum index of the specified array. Substring expansion applied to an associative array produces undefined results.
Note that a negative offset must be separated from the colon by at least one space to avoid being confused with the :- expansion. Substring indexing is zero-based unless the positional parameters are used, in which case the indexing starts at 1 by default. If offset is 0, and the positional parameters are used, $0 is prefixed to the list.
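A short sketch exercising the cases the man page singles out, on the sample filename: a negative offset (note the space before the minus sign), a negative length (bash 4.2 or later), and positional parameters indexed from 1.

```shell
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'   # 32 characters long
ext=${a: -4}          # negative offset counts from the end: .ext
mid=${a:12:-15}       # negative length: stop 15 characters before the end: 12345
set -- one two three
second=${@:2:1}       # positional parameters start at 1: two
echo "$ext $mid $second"
```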
Answer 6:
Building on jor's answer (which doesn't work for me):
substring=$(expr "$filename" : '.*_\([^_]*\)_.*')
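A small sketch of that expr form on a hypothetical filename with two underscore-delimited groups. expr's : operator anchors the pattern at the start of the string and prints what \( \) captured; because the leading .* is greedy, the pattern returns the last group, while starting with [^_]* returns the first.

```shell
#!/usr/bin/env bash
filename='x_123_y_456_z.ext'   # hypothetical name with two digit groups
greedy=$(expr "$filename" : '.*_\([^_]*\)_.*')   # greedy .* reaches the last group: 456
first=$(expr "$filename" : '[^_]*_\([^_]*\)_')   # [^_]* stops at the first "_": 123
echo "$greedy $first"
```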
Answer 7:
I'm surprised this pure bash solution didn't come up:
a="someletters_12345_moreleters.ext"
IFS="_"
set -- $a
echo "$2"
# prints 12345
You probably want to reset IFS to its previous value, or unset IFS, afterwards!
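One way to avoid resetting IFS by hand, as a sketch: run the split inside a $(...) subshell so the IFS change, set -f and set -- never reach the caller, or use read -a with IFS set only for that one command. The parts array name is illustrative.

```shell
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'
# Everything inside $(...) runs in a subshell; set -f disables globbing of the fields.
digits=$(IFS=_; set -f; set -- $a; printf '%s' "$2")
echo "$digits"          # 12345

# IFS=_ applies only to this read; the fields land in an indexed array.
IFS=_ read -r -a parts <<<"$a"
echo "${parts[1]}"      # 12345
```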
Answer 8:
Following the requirements:
I have a filename with x number of characters then a five digit sequence surrounded by a single underscore on either side then another set of x number of characters. I want to take the 5 digit number and put that into a variable.
I found some ways with grep that may be useful:
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]+"
12345
or better
$ echo "someletters_12345_moreleters.ext" | grep -Eo "[[:digit:]]{5}"
12345
And then with the -Po syntax:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d+'
12345
Or if you want it to match exactly 5 characters:
$ echo "someletters_12345_moreleters.ext" | grep -Po '(?<=_)\d{5}'
12345
Finally, to store the result in a variable, just use the var=$(command) syntax.
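grep -P is a GNU grep option that not every grep supports. A sketch of a -E-only alternative (my own variant) that also checks the surrounding underscores and stores the result in a variable:

```shell
#!/usr/bin/env bash
name='someletters_12345_moreleters.ext'
# Match the underscores along with the digits, then delete them with tr.
num=$(grep -Eo '_[[:digit:]]{5}_' <<<"$name" | tr -d '_')
echo "$num"    # 12345
```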
Answer 9:
Without any sub-processes you can:
shopt -s extglob
front=${input%%_+([a-zA-Z]).*}
digits=${front##+([a-zA-Z])_}
A very small variant of this will also work in ksh93.
Answer 10:
If we focus on the concept of:
"A run of (one or several) digits"
We could use several external tools to extract the numbers.
We could quite easily erase all other characters, with either sed or tr:
name='someletters_12345_moreleters.ext'
echo "$name" | sed 's/[^0-9]*//g' # 12345
echo "$name" | tr -c -d 0-9 # 12345
But if $name contains several runs of numbers, the above will fail:
If name='someletters_12345_moreleters_323_end.ext', then:
echo "$name" | sed 's/[^0-9]*//g' # 12345323
echo "$name" | tr -c -d 0-9 # 12345323
We need to use regular expressions (regex).
To select only the first run (12345, not 323) in sed and perl:
echo "$name" | sed 's/[^0-9]*\([0-9]\{1,\}\).*$/\1/'
perl -e 'my ($num) = $ARGV[0] =~ /(\d+)/; print "$num\n";' "$name"
But we could just as well do it directly in bash(1):
regex='[^0-9]*([0-9]{1,}).*$'
[[ $name =~ $regex ]] && echo "${BASH_REMATCH[1]}"
This allows us to extract the FIRST run of digits of any length surrounded by any other text/characters.
Note: regex='[^0-9]*([0-9]{5,5}).*$' will match only runs of at least 5 digits, capturing the first 5 digits of the first such run.
(1): faster than calling an external tool for each short text, but not faster than doing all the processing inside sed or awk for large files.
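Extending the regex idea to every run of digits, not just the first: a sketch (my own loop, not from the answer) that matches repeatedly and chops the string just after each match.

```shell
#!/usr/bin/env bash
name='someletters_12345_moreleters_323_end.ext'
regex='[0-9]+'
runs=()
rest=$name
# Take the leftmost run of digits, then drop everything up to and including it.
while [[ $rest =~ $regex ]]; do
  runs+=("${BASH_REMATCH[0]}")
  rest=${rest#*"${BASH_REMATCH[0]}"}
done
printf '%s\n' "${runs[@]}"   # 12345, then 323
```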
Answer 11:
Here's a prefix-suffix solution (similar to the solutions given by JB and Darron) that matches the first block of digits and does not depend on the surrounding underscores:
str='someletters_12345_morele34ters.ext'
s1="${str#"${str%%[[:digit:]]*}"}" # strip off non-digit prefix from str
s2="${s1%%[^[:digit:]]*}" # strip off non-digit suffix from s1
echo "$s2" # 12345
Answer 12:
Here's how I'd do it:
FN=someletters_12345_moreleters.ext
[[ $FN =~ _([[:digit:]]{5})_ ]] && NUM=${BASH_REMATCH[1]}
Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.
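The same test applied to several names, as a sketch with a hypothetical files array; names without an underscore-delimited five-digit group are simply skipped.

```shell
#!/usr/bin/env bash
files=(someletters_12345_moreleters.ext other_99999_x.txt no_digits_here.txt)
nums=()
for fn in "${files[@]}"; do
  # Only names containing _NNNNN_ match; BASH_REMATCH[1] holds the digits.
  if [[ $fn =~ _([[:digit:]]{5})_ ]]; then
    nums+=("${BASH_REMATCH[1]}")
  fi
done
echo "${nums[@]}"    # 12345 99999
```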
Answer 13:
I love sed's capability to deal with regex groups:
> var="someletters_12345_moreletters.ext"
> digits=$( echo "$var" | sed -n "s/.*_\([0-9]\+\).*/\1/p" )
> echo $digits
12345
A slightly more general option would be not to assume that an underscore _ marks the start of your digit sequence, and instead strip off all the non-digits before it: s/[^0-9]\+\([0-9]\+\).*/\1/p
> man sed | grep s/regexp/replacement -A 2
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
More on this, in case you're not too confident with regexps:
s is for _s_ubstitute
[0-9]+ matches 1+ digits
\1 refers to group 1 of the regex match (group 0 is the whole match; group 1 is the part matched within the parentheses in this case)
the p flag is for _p_rinting
All the backslash escapes are there to make sed's basic regexp processing work (\+ itself is a GNU sed extension).
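A portable variant of the sed command, as a sketch: it replaces the GNU-only \+ with the POSIX BRE interval \{1,\} and, like the more general option above, does not rely on the underscore.

```shell
#!/usr/bin/env bash
var='someletters_12345_moreletters.ext'
# -n plus the p flag prints only lines where the substitution succeeded.
digits=$(printf '%s\n' "$var" | sed -n 's/^[^0-9]*\([0-9]\{1,\}\).*$/\1/p')
echo "$digits"    # 12345
```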
Answer 14:
Given test.txt is a file containing "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
cut -b19-20 test.txt > test1.txt # This will extract chars 19 & 20, "ST"
while read -r; do
x=$REPLY
done < test1.txt
echo $x
ST
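The loop and the second file are not needed if you only want the extracted characters. A sketch using command substitution, with a temporary file standing in for the answer's test.txt:

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)    # stand-in for test.txt
printf '%s\n' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' > "$tmpfile"
# Capture cut's output directly: bytes 19 and 20.
x=$(cut -b19-20 "$tmpfile")
rm -f "$tmpfile"
echo "$x"    # ST
```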
Answer 15:
Similar to substr('abcdefg', 2-1, 3) in PHP:
echo 'abcdefg'|tail -c +2|head -c 3
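A sketch wrapping the same tail/head pipeline in a hypothetical substr function with a zero-based start, as in PHP. It counts bytes, not characters, so multibyte text can be cut mid-character; bash's own ${s:1:3} gives the same result here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: substr STRING START LENGTH, START counted from 0.
substr() {
  printf '%s' "$1" | tail -c +"$(( $2 + 1 ))" | head -c "$3"
}
s='abcdefg'
r=$(substr "$s" 1 3)
echo "$r"          # bcd
echo "${s:1:3}"    # bcd
```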
Answer 16:
My answer gives you more control over what you get out of your string. Here is how you can extract 12345 out of your string:
str="someletters_12345_moreleters.ext"
str=${str#*_}
str=${str%_more*}
echo $str
This is more useful if you want to extract something that contains arbitrary characters like abc or special characters like _ or -. For example, suppose your string looks like this and you want everything after someletters_ and before _moreleters.ext:
str="someletters_123-45-24a&13b-1_moreleters.ext"
With my code you can specify exactly what you want.
Explanation:
#* removes the leading part of the string up to and including the first match of the key. Here the key we used is _
% removes the shortest trailing part of the string that matches the pattern. Here the pattern we used is '_more*'
Do some experiments yourself and you will find this interesting.
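Running the answer's two expansions on the string with special characters, as a quick check:

```shell
#!/usr/bin/env bash
str='someletters_123-45-24a&13b-1_moreleters.ext'
str=${str#*_}       # through the first "_" removed: 123-45-24a&13b-1_moreleters.ext
str=${str%_more*}   # shortest suffix matching "_more*" removed
echo "$str"         # 123-45-24a&13b-1
```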
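Answer 17:
There's also the expr command (an external utility, not a bash builtin):
INPUT="someletters_12345_moreleters.ext"
SUBSTRING=`expr match "$INPUT" '.*_\([[:digit:]]*\)_.*' `
echo $SUBSTRING
Note that expr match is a GNU extension; the portable spelling is expr "$INPUT" : '.*_\([[:digit:]]*\)_.*'.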
Answer 18:
OK, here is pure parameter substitution with an empty string. The caveat is that I have defined someletters and moreletters as lowercase letters only; if they are alphanumeric, this will not work as is. The pattern uses extended globs, so extglob must be enabled first:
shopt -s extglob
filename=someletters_12345_moreletters.ext
substring=${filename//@(+([a-z])_|_+([a-z]).*)}
echo $substring
12345
Answer 19:
A little late, but I just ran across this problem and found the following:
host:/tmp$ asd=someletters_12345_moreleters.ext
host:/tmp$ echo `expr $asd : '.*_\(.*\)_'`
12345
host:/tmp$
I used it to get millisecond resolution on an embedded system whose date does not support %N:
set `grep "now at" /proc/timer_list`
nano=$3
fraction=`expr $nano : '.*\(...\)......'`
$debug nano is $nano, fraction is $fraction
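For comparison, a sketch of the same millisecond extraction with bash substring expansion, assuming nano is a decimal nanosecond count with at least 9 digits (the value below is made up):

```shell
#!/usr/bin/env bash
nano=123456789012345    # hypothetical nanosecond counter value
# expr: the greedy .* leaves exactly 9 characters; \(...\) keeps the first 3 of them.
fraction=$(expr "$nano" : '.*\(...\)......')
# bash: 3 characters starting 9 from the end (note the space before -9).
fraction2=${nano: -9:3}
echo "$fraction $fraction2"    # 789 789
```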
Answer 20:
A bash solution:
IFS="_" read -r x digs x <<<'someletters_12345_moreleters.ext'
This will clobber a variable called x. The var x could be changed to the var _:
input='someletters_12345_moreleters.ext'
IFS="_" read -r _ digs _ <<<"$input"