I need to get a specific string from a bigger string. From Abcd1234_Tot9012_tore.dr or Abcd1234_Tot9012.tore.dr I want to get the digits that sit between Tot and the following _ or ., so I should get 9012. The important thing is that the number of characters before and after these digits may vary.
Could anyone give me a nice solution for this? Thanks in advance!
This should also work if you are only looking for the numbers after Tot (note that the three-argument form of match() is a GNU awk extension):
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012_tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
[srikanth@myhost ~]$ echo "Abcd1234_Tot9012.tore.dr" | awk ' { match($0,/Tot([0-9]*)/,a); print a[1]; } '
9012
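If gawk is not available, the same idea can be sketched with the two-argument match(), which POSIX awk provides along with the RSTART and RLENGTH variables. The function name tot_digits_awk is made up for this illustration and is not part of the answer above.

```shell
# Hypothetical helper: print the digits that follow "Tot" using only POSIX awk features.
tot_digits_awk() {
    printf '%s\n' "$1" | awk '
        # match() sets RSTART and RLENGTH for the leftmost match of Tot plus digits
        match($0, /Tot[0-9]+/) {
            # skip the 3 characters of "Tot" and keep only the digits
            print substr($0, RSTART + 3, RLENGTH - 3)
        }'
}

tot_digits_awk "Abcd1234_Tot9012_tore.dr"   # prints 9012
tot_digits_awk "Abcd1234_Tot9012.tore.dr"   # prints 9012
```

Lines without a Tot followed by digits produce no output at all, since the action only runs when match() succeeds.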
I know this is tagged as bash/sed but perl is clearer for this kind of task, in my opinion. In case you're interested:
perl -ne 'print "$1\n" if /Tot([0-9]+)[._]/' input.txt
-ne tells perl to loop the one-liner over each line of the input file without printing anything by default.
The regex reads as: match Tot, followed by one or more digits, followed by either a dot or an underscore; capture the digits (that's what the parens are for). As it's the first capture group, it's assigned to the $1 variable, which is then printed.
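Since the question is tagged bash, here is a rough equivalent of that regex using bash's own =~ operator, which fills the BASH_REMATCH array with capture groups. This is only a sketch: the function name tot_digits_re is hypothetical, and it needs bash rather than a plain POSIX sh.

```shell
# Hypothetical helper: same pattern as the perl one-liner, using bash's regex operator.
tot_digits_re() {
    # Keep the pattern in a variable so the shell does not re-interpret it
    local re='Tot([0-9]+)[._]'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] holds the first capture group (the digits)
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        # No match: print nothing and signal failure to the caller
        return 1
    fi
}

tot_digits_re "Abcd1234_Tot9012_tore.dr"   # prints 9012
tot_digits_re "Abcd1234_Tot9012.tore.dr"   # prints 9012
```

The non-zero return status lets a caller tell "no match" apart from an empty result, e.g. num=$(tot_digits_re "$string") || echo "no Tot number found".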
Pure Bash:
string="Abcd1234_Tot9012_tore.dr" # or ".tore.dr"
string=${string##*_Tot}
string=${string%%[_.]*}
echo "$string"
Remove longest leading part ending with '_Tot'.
Remove longest trailing part beginning with '_' or '.'.
Result:
9012
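One thing to watch with the two expansions above: if the input has no "_Tot" at all, the first expansion leaves the string untouched and the second then returns whatever precedes the first "_" or ".". A possible way to guard against that, wrapped in a function so the original variable is not overwritten (tot_digits_pe is a made-up name for this sketch):

```shell
# Hypothetical helper: parameter expansion only, with a guard for inputs lacking "_Tot".
tot_digits_pe() {
    local s=$1
    case $s in
        *_Tot*) ;;          # marker present, carry on
        *) return 1 ;;      # no marker: the first expansion would leave the input untouched
    esac
    s=${s##*_Tot}           # drop everything up to and including the last "_Tot"
    s=${s%%[_.]*}           # drop everything from the first "_" or "." onward
    printf '%s\n' "$s"
}

tot_digits_pe "Abcd1234_Tot9012_tore.dr"   # prints 9012
tot_digits_pe "Abcd1234_Tot9012.tore.dr"   # prints 9012
```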
awk
string="Abcd1234_Tot9012_tore.dr"
num=$(awk -F'Tot|[._]' '{print $3}' <<<"$string")
sed
string="Abcd1234_Tot9012_tore.dr"
num=$(sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string")
Example
$ string="Abcd1234_Tot9012_tore.dr"; awk -F'Tot|[._]' '{print $3}' <<<"$string"
9012
$ string="Abcd1234_Tot9013.tore.dr"; sed 's/.*\([0-9]\{4\}\).*$/\1/' <<<"$string"
9013
You can use a perl one-liner:
perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
Test:
[jaypal:~/Temp] cat file
Abcd1234_Tot9012_tore.dr
Abcd1234_Tot9012.tore.dr
[jaypal:~/Temp] perl -pe 's/.*(?<=Tot)([0-9]{4}).*/\1/' file
9012
9012
Using grep you can do:
str=Abcd1234_Tot9012.tore.dr; grep -o "Tot[0-9]*" <<< "$str" | grep -o "[0-9]*$"
OUTPUT:
9012
This might work for you (GNU sed, since it relies on \n in the replacement):
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/Tot[^0-9]*\([0-9]*\)[_.].*/\n\1/;s/.*\n//'
9012
9012
This works equally well:
echo -e "Abcd1234_Tot9012_tore.dr\nAbcd1234_Tot9012.tore.dr" |
sed 's/.*Tot\([0-9]*\).*/\1/'
9012
9012