Extract pattern between a substring and first occu

The following is the content of a file:

xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

I want to extract the component names (component1, component2, etc.).

This is what I tried:

for line in `sed -n -e '/^xxx-/p' $file`
do
    comp=`echo $line | sed  -e '/xxx-/,/[0-9]/p'`
    echo "comp - $comp"
done

I also tried sed -e 's/.*xxx-\(.*\)[^0-9].*/\1/'

These attempts are based on some information I found online. Please give me a sed command and, if possible, explain it step by step.

Part 2: I also need to extract the version number from the string. The version number starts with a digit and ends just before the . that is followed by xc-linux. As you can see, to keep each version unique it includes a random 7-character alphanumeric string.

For example, in xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r the version number is 1.0-2-2acd314.
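A minimal sed sketch for Part 2, assuming every line follows the layout xxx_<component>-<version>.xc-linux-... and that the component name itself contains no `-`. It captures the component (group 1) and the version (group 2, from the first digit after that `-` up to `.xc-linux`) in one substitution. The 7-character random suffix is not validated; it is simply part of what group 2 captures. The function name `extract_component_and_version` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<component> <version>" for each matching input line.
# Assumes names look like xxx_<component>-<version>.xc-linux-... and that the
# component contains no "-".
extract_component_and_version() {
    # -n      : do not print lines by default
    # ^[^_]*_ : skip the "xxx_" prefix
    # \([^-]*\)           group 1, the component: everything up to the next "-"
    # \([0-9].*\)\.xc-linux  group 2, the version: digit-led text before ".xc-linux"
    # p       : print the line only if the substitution succeeded
    sed -n 's/^[^_]*_\([^-]*\)-\([0-9].*\)\.xc-linux.*/\1 \2/p' "$@"
}

# Usage with the sample data from the question:
extract_component_and_version <<'EOF'
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r
EOF
```

On the sample data this should print `component1 1.0-2-2acd314` through `component4 0.0-2-2acd314`, one per line; lines that do not fit the assumed layout are silently skipped because of `-n` and the `p` flag.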

Tags: sed matching substring

1 answer

神经病院院长

Reply #2 · 2019-04-09 02:13

There are quite a few ways to extract the data. The simplest is probably grep.

GNU `grep`:

You can grab the required data using GNU grep with its PCRE option, -P:

$ cat file
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
xxx_component3-1.0-2-3gsjcgd.xc-linux-x86-64-Release-devel.r
xxx_component4-0.0-2-2acd314.xc-linux-x86-64-Release-devel.r

$ grep -oP '(?<=_)[^-]*' file
component1
component2
component3
component4

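Here we use a positive lookbehind assertion, `(?<=_)`, which requires an `_` immediately before the match without making the `_` part of the match. `[^-]*` then matches everything up to, but not including, the next `-`, and `-o` makes grep print only the matched text rather than the whole line.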
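As an alternative sketch (assuming GNU grep built with PCRE support, which `-P` requires), PCRE's `\K` gives the same result as the lookbehind: it drops everything matched so far from the reported match. The same idea, combined with a lookahead `(?=...)`, also covers Part 2 of the question. The function names `components` and `versions` are made up for illustration.

```shell
#!/bin/sh
# Requires GNU grep with PCRE support (-P). Function names are hypothetical.

# Component: match "_", then \K discards it from the output;
# [^-]* stops right before the first "-".
components() {
    grep -oP '_\K[^-]*' "$@"
}

# Version: skip "xxx_<component>-", then take the shortest digit-led run of
# characters that is followed by ".xc-linux" (checked by the lookahead).
versions() {
    grep -oP '^[^-]*-\K[0-9].*?(?=\.xc-linux)' "$@"
}

sample='xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r'

printf '%s\n' "$sample" | components   # component1, component2
printf '%s\n' "$sample" | versions     # 1.0-2-2acd314, 3.0-1-fg3sdhd
```

A lookbehind could not be used for the version part, because PCRE lookbehinds must have a bounded length and `^[^-]*-` does not; `\K` has no such restriction.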

`awk`:

$ awk -F"[_-]" '{print $2}' file
component1
component2
component3
component4

Here we tell awk to treat both `_` and `-` as field separators and print the second field.
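Splitting on `[_-]` would also cut the version 1.0-2-2acd314 from Part 2 into several fields, so a sketch that extracts the version with awk might instead trim a copy of the line with POSIX awk's `sub()`. It assumes the same layout as above (one `_` before the component, no `-` inside the component, version followed by `.xc-linux`); the function name `awk_extract` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<component> <version>" using POSIX awk sub().
# Assumes names look like xxx_<component>-<version>.xc-linux-...
awk_extract() {
    awk '{
        rest = $0
        sub(/^[^_]*_/, "", rest)       # drop the "xxx_" prefix
        comp = rest
        sub(/-.*/, "", comp)           # component: everything before the first "-"
        ver = rest
        sub(/^[^-]*-/, "", ver)        # drop "<component>-"
        sub(/\.xc-linux.*/, "", ver)   # drop ".xc-linux..." so only the version is left
        print comp, ver
    }' "$@"
}

# Usage with part of the sample data:
awk_extract <<'EOF'
xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r
xxx_component2-3.0-1-fg3sdhd.xc-linux-x86-64-Release-devel.r
EOF
```

Working on copies (`comp`, `ver`) instead of assigning to fields avoids awk rebuilding `$0`, and it does not depend on how many `-` separated parts the version has.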

`sed`:

That said, you can also use sed to extract the required data with a capture group:

$ sed 's/.*_\([^-]*\)-.*/\1/' file
component1
component2
component3
component4

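The regex says: match any characters, zero or more times, up to an `_` (because `.*` is greedy this is the last `_` on the line; in this data there is only one). From that point, `\([^-]*\)` captures everything up to the next `-` in a group, and `-.*` matches that `-` and the rest of the line. Since the whole line is matched, the replacement `\1`, a back reference to the captured group, replaces the entire line with just the component name.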
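Since the question asked for a step-by-step explanation, here is a sketch that splits the same idea into two smaller substitutions chained with `-e`, so each step can be tried on its own. One difference: `^[^_]*_` stops at the first `_`, whereas `.*_` in the one-line command runs to the last one; with a single `_` per line, as in the sample data, the results are the same. The function name `component_name` is hypothetical.

```shell
#!/bin/sh
# Step 1: 's/^[^_]*_//' deletes everything up to and including the first "_"
#         xxx_component1-1.0-2-2acd314.xc-linux-... -> component1-1.0-2-2acd314.xc-linux-...
# Step 2: 's/-.*//' deletes the first "-" and everything after it
#         component1-1.0-2-2acd314.xc-linux-...     -> component1
component_name() {
    sed -e 's/^[^_]*_//' -e 's/-.*//' "$@"
}

# Try step 1 alone, then both steps together:
echo 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | sed -e 's/^[^_]*_//'
echo 'xxx_component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r' | component_name
```

The first `echo` should print `component1-1.0-2-2acd314.xc-linux-x86-64-Release-devel.r` and the second `component1`, which makes it easy to see what each substitution removes.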
