Multiple lazy string replacements between two patterns

Example:

This (word1) is a test (word2) file.

What I want:

This is a test file.

The problem is that the brackets occur more than once, so if I use:

sed 's/<.*>//g'

I get "This file", which is wrong.

What if I want to replace the string between two identical patterns?

Like:

WORD1 %WORD2% WORD3 => WORD1 WORD3

Tags: regex bash shell sed

1 answer

何必那么认真

#2 · 2019-02-25 00:02

All you need is a negated character class [^<>]* that matches any character except < and >:

sed 's/<[^<>]*>//g'
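A minimal sketch that contrasts the greedy <.*> from the question with the negated class. It assumes a hypothetical angle-bracket version of the input (This <word1> is a test <word2> file.), and strip_angle is a hypothetical helper name:

```shell
#!/bin/bash
# [^<>]* cannot run past a '>', so each match ends at the nearest '>'
# instead of the last '>' on the line (which is where greedy .* ends).
strip_angle() {
  sed 's/<[^<>]*>//g'
}

input='This <word1> is a test <word2> file.'
echo "$input" | sed 's/<.*>//g'   # greedy: prints "This  file."
echo "$input" | strip_angle       # prints "This  is a test  file."
```

The spaces that surrounded each removed group are left in place.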

Or, if you have round brackets, you can use [^()]* (note that in BRE syntax ( and ) are literal characters, so they do not need to be escaped with \):

sed 's/([^()]*)//g'
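The same idea applied to the round-bracket input from the question, as a sketch with two hypothetical helpers. strip_parens removes only the groups; strip_parens_and_space also removes the single space in front of each group, which gives exactly "This is a test file." (this assumes every group is preceded by one space):

```shell
#!/bin/bash
# In POSIX BRE, ( and ) are ordinary characters, so no backslashes are needed.

# Removes each (...) group; the spaces around it are kept.
strip_parens() {
  sed 's/([^()]*)//g'
}

# Hypothetical variant: also removes the single space before each group.
strip_parens_and_space() {
  sed 's/ ([^()]*)//g'
}

input='This (word1) is a test (word2) file.'
echo "$input" | strip_parens            # prints "This  is a test  file."
echo "$input" | strip_parens_and_space  # prints "This is a test file."
```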

See IDEONE demo

As for the update, you can remove everything between WORD1 and WORD3 using .*, but only if each line contains just one WORD1/WORD3 pair (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/WORD1.*WORD3/WORD1 WORD3/g'
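To see why the .* version is limited to one pair per line, here is a small sketch (greedy_replace is a hypothetical name) that runs the same sed command on a line with one pair and on a line with two pairs:

```shell
#!/bin/bash
# sed's .* is greedy: it extends to the LAST WORD3 on the line.
greedy_replace() {
  sed 's/WORD1.*WORD3/WORD1 WORD3/g'
}

echo 'WORD1 %WORD2% WORD3' | greedy_replace
# prints "WORD1 WORD3"
echo 'WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3' | greedy_replace
# prints "WORD1 WORD3" ("some text" and the second pair are lost)
```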

sed supports neither lookarounds (here, a lookahead) nor lazy quantifiers, so you cannot make the match stop at the nearest WORD3. However, if you know for sure that there is no other % symbol between the two delimiters, you can still use the negated character class approach (demo):

echo "WORD1 %WORD2% WORD3" | sed 's/%[^%]*%//g'
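A sketch of the %[^%]*% approach on a line containing two %...% groups (strip_percent is a hypothetical helper name). Because [^%]* cannot cross a %, each group is removed on its own:

```shell
#!/bin/bash
# %[^%]*% matches a %, then any run of non-% characters, then the next %,
# so it never swallows text from one %...% group into the next.
strip_percent() {
  sed 's/%[^%]*%//g'
}

echo 'WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3' | strip_percent
# prints "WORD1  WORD3 some text WORD1  WORD3" (the surrounding spaces stay)
```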

A generic solution is to do it in several steps:

1. Replace the starting and ending delimiters with characters that never occur in the input (<UC1> and <UC2>). I am using Cyrillic letters here, but ideally they should be control characters.
2. Use the negated character class <UC1>[^<UC1><UC2>]*<UC2> to replace each delimited span with the desired replacement string.
3. Restore the original delimiters.

Here is an example:

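#!/bin/bash
# Replace the delimiters with placeholder characters, replace what lies
# between each placeholder pair, then restore the original delimiters.
echo "WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3" |
  sed 's/WORD1/й/g' |
  sed 's/WORD3/ч/g' |
  sed 's/й[^йч]*ч/й ч/g' |
  sed 's/й/WORD1/g' |
  sed 's/ч/WORD3/g'
# => WORD1 WORD3 some text WORD1 WORD3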

I am hardcoding a single space between the two delimiters in the replacement, but it can be adjusted as needed.
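The answer suggests control characters rather than Cyrillic letters as placeholders. Below is a minimal sketch of the same steps wrapped in a hypothetical bash function, replace_between, that uses the bytes \001 and \002 as placeholders. It assumes START and END are plain words (no regex metacharacters, /, & or \) and that the input never contains \001 or \002:

```shell
#!/bin/bash
# Hypothetical helper: replace_between START END  (reads stdin)
# Turns every START ... END span into "START END", stopping at the nearest END.
# Assumes START/END contain no regex metacharacters, '/', '&' or '\',
# and that the input never contains the \001 or \002 bytes.
replace_between() {
  local start=$1 end=$2
  local s e
  s=$(printf '\001')   # placeholder for START
  e=$(printf '\002')   # placeholder for END
  # One sed process runs the same five substitutions as the pipeline, in order.
  sed -e "s/$start/$s/g" \
      -e "s/$end/$e/g" \
      -e "s/$s[^$s$e]*$e/$s $e/g" \
      -e "s/$s/$start/g" \
      -e "s/$e/$end/g"
}

echo 'WORD1 %WORD2% WORD3 some text WORD1 %WORD2% WORD3' |
  replace_between WORD1 WORD3
# prints "WORD1 WORD3 some text WORD1 WORD3"
```

Chaining the substitutions with -e inside a single sed call avoids starting five separate processes, while applying them in the same order as the pipeline above.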
