Given a filename in the form someletters_12345_moreleters.ext, I want to extract the 5 digits and put them into a variable.
To emphasize the point: I have a filename with x characters, then a five-digit sequence with a single underscore on either side, then another set of x characters. I want to take the five-digit number and put it into a variable.
I am very interested in the different ways this can be accomplished.
If you want more rigorous information, you can also search for it in the bash manual page (`man bash`) like this:
Result:
Similar to substr('abcdefg', 2-1, 3) in PHP:
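For comparison, PHP's substr('abcdefg', 2-1, 3) returns "bcd". A minimal sketch of two shell equivalents: bash substring expansion (zero-based offset) and a tail/head pipeline. The pipeline counts bytes, so it assumes single-byte characters.

```bash
#!/usr/bin/env bash
s='abcdefg'

# Bash substring expansion ${parameter:offset:length}; the offset is zero-based,
# so offset 1 corresponds to PHP's substr($s, 2-1, 3).
sub=${s:1:3}
echo "$sub"      # bcd

# External-tool variant: tail -c +2 starts at the 2nd byte (one-based),
# head -c 3 keeps the next 3 bytes. Counts bytes, not characters.
sub2=$(printf '%s' "$s" | tail -c +2 | head -c 3)
echo "$sub2"     # bcd
```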
If x is constant, the following parameter expansion performs substring extraction:
where 12 is the offset (zero-based) and 5 is the length
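Applied to the example filename: "someletters_" is 12 characters long, so the digits start at offset 12. A sketch, assuming the prefix length never changes (the variable names a and b are just for illustration):

```bash
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'

# Skip 12 characters (zero-based offset), then take 5.
# Only correct while the prefix is exactly 12 characters long.
b=${a:12:5}
echo "$b"    # 12345
```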
If the underscores around the digits are the only ones in the input, you can strip off the prefix and suffix (respectively) in two steps:
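A minimal sketch of the two steps, assuming the only underscores are the two around the digits. `${a#*_}` removes the shortest prefix ending in an underscore, and `${tmp%_*}` removes the shortest suffix starting with one.

```bash
#!/usr/bin/env bash
a='someletters_12345_moreleters.ext'

tmp=${a#*_}     # strip shortest prefix matching "*_" -> 12345_moreleters.ext
b=${tmp%_*}     # strip shortest suffix matching "_*" -> 12345
echo "$b"       # 12345
```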
If there are other underscores, it's probably still feasible, albeit trickier. If anyone knows how to perform both expansions in a single expression, I'd like to know too.
Both solutions presented are pure bash and spawn no processes, so they are very fast.
If we focus on the concept of:
"A run of (one or several) digits"
We could use several external tools to extract the numbers.
We could quite easily erase all other characters with either sed or tr:
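Sketches of both, assuming $name holds the example filename. sed deletes every non-digit character; tr -c complements the set 0-9 and -d deletes everything in that complement.

```bash
#!/usr/bin/env bash
name='someletters_12345_moreleters.ext'

# sed: globally delete every character that is not a digit.
n1=$(printf '%s\n' "$name" | sed 's/[^0-9]//g')

# tr: -c complements the set 0-9, -d deletes that complement.
n2=$(printf '%s' "$name" | tr -cd '0-9')

echo "$n1 $n2"   # 12345 12345
```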
But if $name contains several runs of numbers, the above will fail:
If "name=someletters_12345_moreleters_323_end.ext", then:
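A quick illustration of the failure with a sed command that deletes every non-digit: the two runs are glued together into a single, meaningless number.

```bash
#!/usr/bin/env bash
name='someletters_12345_moreleters_323_end.ext'

# Deleting all non-digits concatenates every run of digits.
n=$(printf '%s\n' "$name" | sed 's/[^0-9]//g')
echo "$n"    # 12345323
```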
We need to use regular expressions (regex).
To select only the first run (12345 not 323) in sed and perl:
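A minimal sketch of both, using standard sed -E (extended regex) and a perl one-liner. The perl part is guarded in case perl is not installed.

```bash
#!/usr/bin/env bash
name='someletters_12345_moreleters_323_end.ext'

# sed -E: skip leading non-digits, capture the first run of digits,
# discard the rest of the line, and print only the capture.
first=$(printf '%s\n' "$name" | sed -E 's/^[^0-9]*([0-9]+).*$/\1/')
echo "$first"    # 12345

# perl: print capture group 1 of the first match of one or more digits.
if command -v perl >/dev/null 2>&1; then
  printf '%s\n' "$name" | perl -ne 'print "$1\n" if /([0-9]+)/'
fi
```

If the name contains no digits at all, the sed substitution does not match and the line is printed unchanged, so check for that case if it can occur.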
But we could just as well do it directly in bash(1):
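A sketch using the `=~` operator of `[[ ]]` and the BASH_REMATCH array. The regex is left unanchored at the end, so the leading `[^0-9]*` stops at the first digit and the capture takes the whole first run.

```bash
#!/usr/bin/env bash
name='someletters_12345_moreleters_323_end.ext'

# Keep the regex in a variable and leave it unquoted inside [[ ]]
# so bash treats it as an extended regular expression.
regex='[^0-9]*([0-9]+)'

num=''
if [[ $name =~ $regex ]]; then
  # BASH_REMATCH[1] holds the text matched by the first capture group.
  num=${BASH_REMATCH[1]}
fi
echo "$num"    # 12345
```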
This allows us to extract the FIRST run of digits of any length, surrounded by any other text/characters.
Note:
regex=[^0-9]*([0-9]{5,5}).*$;
will capture exactly 5 digits, taken from the first run that is at least 5 digits long. :-)

(1): Faster than calling an external tool for each short text, but not faster than doing all the processing inside sed or awk for large files.
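A sketch exercising that five-digit regex against a few names. The helper function five is hypothetical, written only for this illustration. Note that shorter runs are skipped, and a longer run still matches, yielding its first five digits.

```bash
#!/usr/bin/env bash
regex='[^0-9]*([0-9]{5,5}).*$'

# Hypothetical helper: print the five captured digits, or a marker.
five() {
  if [[ $1 =~ $regex ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    echo 'no match'
  fi
}

five 'someletters_12345_moreleters.ext'   # 12345
five 'ab_123_45678_cd'                    # 45678 (the short run 123 is skipped)
five 'ab_123456_cd'                       # 12345 (first five of a longer run)
five 'ab_123_cd'                          # no match
```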
Here's how I'd do it:
Note: the above is a regular expression and is restricted to your specific scenario of five digits surrounded by underscores. Change the regular expression if you need different matching.
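A minimal sketch of such a match, using `[[ =~ ]]` and BASH_REMATCH; the names FN, re and NUM are chosen for illustration only. The underscores on both sides of the capture group enforce the "five digits surrounded by underscores" shape.

```bash
#!/usr/bin/env bash
FN='someletters_12345_moreleters.ext'
re='_([[:digit:]]{5})_'   # exactly five digits with an underscore on each side
NUM=''

if [[ $FN =~ $re ]]; then
  NUM=${BASH_REMATCH[1]}
fi
echo "$NUM"    # 12345
```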
A bash solution:
This will clobber a variable called `x`. The variable `x` could be changed to `_`.
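A minimal sketch of a bash solution that behaves this way, assuming `read` with `IFS=_` is used to split the name into fields (digs and digs2 are illustrative names):

```bash
#!/usr/bin/env bash
# IFS=_ applies only to this read: the name is split at underscores and the
# fields are assigned left to right; the last variable receives the remainder.
IFS=_ read -r x digs x <<< 'someletters_12345_moreleters.ext'
echo "$digs"    # 12345  (x now holds "moreleters.ext")

# Same idea, using _ as the throwaway variable instead of x.
IFS=_ read -r _ digs2 _ <<< 'someletters_12345_moreleters.ext'
echo "$digs2"   # 12345
```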