Remove occurrences of small words within a string

Posted 2019-03-06 04:57

I'm trying to remove a specific word from a string. I can't do a simple global string replace for "the" to empty string as "the" could be part of a word in the string.

word: "the"
string: "the_ad_an_feta_cfr_era_the_iop_the"
output: "ad_an_feta_cfr_era_iop"

The word "the" could be at the beginning, several times in the middle or at the end of the string so I have to take into account the separator and beginning/end of string.

Can I handle all of this with a single regex, or do I have to resort to looping? And how do I specify multiple patterns in sed? A plain global replace does not work because it also strips "the" from inside other words:

sed 's/the//g' <<< "the_ad_feta_cfr_era_the_iop_the"

And how would I do it if I had several words to remove from the same string? Instead of only "the", also remove "is" and "an". Can all this be done in one regex without looping?

word: "the", "an", "is"
input: "the_ad_an_feta_cfr_era_the_iop_the"
output: "ad_feta_cfr_era_iop"

Tags: bash shell
1 answer
我只想做你的唯一
#2 · 2019-03-06 05:04

Take a look at this sed:

$ string='the_ad_an_feta_cfr_era_the_iop_the'
$ sed -E -e ':a' -e 's/(^|_)(the|an|is|feta)(_|$)/\1/g;ta' -e 's/_$//' <<< "$string"
ad_cfr_era_iop
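
Why the loop (`:a` ... `ta`) matters: with `g` alone, the underscore consumed at the end of one match cannot also start the next match, so a word directly following a removed word is skipped. The comparison below is a minimal sketch of that effect; the variable names are only illustrative, and `printf ... | sed` is used in place of the here-string so it also runs in shells other than bash.

```shell
string='the_ad_an_feta_cfr_era_the_iop_the'
pattern='(^|_)(the|an|is|feta)(_|$)'

# Single pass: matching "_an_" consumes the underscore in front of "feta",
# so "feta" is not seen as a separate word and survives.
single_pass=$(printf '%s\n' "$string" | sed -E -e "s/$pattern/\1/g")
echo "$single_pass"   # ad_feta_cfr_era_iop_

# Looping: "ta" jumps back to label "a" as long as a substitution was made,
# so the leftover "feta" is removed on the second pass.
looped=$(printf '%s\n' "$string" |
    sed -E -e ':a' -e "s/$pattern/\1/g;ta" -e 's/_$//')
echo "$looped"        # ad_cfr_era_iop
```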

Note that the behavior of sed differs between Unix variants. Your sed seems to require newlines after labels (or multiple -e options).
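
As a sketch of the "newlines after labels" form, the same script can be written with one command per line instead of several `-e` options. This should behave the same as the command above on sed implementations that support `-E`; the variable name `result` is just for illustration.

```shell
string='the_ad_an_feta_cfr_era_the_iop_the'

# One command per line: the label ":a" ends at the newline, which avoids
# implementations that read everything after ":" as part of the label name.
result=$(printf '%s\n' "$string" | sed -E '
:a
s/(^|_)(the|an|is|feta)(_|$)/\1/g
ta
s/_$//
')
echo "$result"   # ad_cfr_era_iop
```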


A version without labels, essentially the same as @Cyrus' answer, that also supports "items" containing spaces:

$ string='the_ad_an_feta_cfr_era_the cfr_the_iop_the'
$ sed -E -e 's/_/__/g;s/(^|_)(the|an|is|feta)(_|$)//g;s/_+/_/g;s/^_//;s/_$//' <<< "$string"
ad_cfr_era_the cfr_iop
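
To apply the label-free approach to an arbitrary word list, the alternation can be built from the arguments. The function below is a rough sketch: `remove_words` is a hypothetical helper name, and it assumes the words contain no regex metacharacters and no `/`. Doubling every underscore first gives each item its own leading and trailing separator, so adjacent words can be removed in a single pass.

```shell
# remove_words STRING WORD...   (hypothetical helper, not from the answer)
remove_words() {
    local input alternation
    input=$1
    shift
    # Join the remaining arguments with "|" to form (word1|word2|...).
    alternation=$(IFS='|'; printf '%s' "$*")
    printf '%s\n' "$input" |
        sed -E -e 's/_/__/g' \
               -e "s/(^|_)($alternation)(_|\$)//g" \
               -e 's/_+/_/g' -e 's/^_//' -e 's/_$//'
}

remove_words 'the_ad_an_feta_cfr_era_the_iop_the' the an is
# prints: ad_feta_cfr_era_iop
```

One caveat of this sketch: the `s/_+/_/g` step also collapses any empty items that were already present in the input (for example `a__b` becomes `a_b`).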