SED replacing with 'possible' newline

I have a sed command that is working fine, except when it comes across a newline right in the file somewhere. Here is my command:

sed -i 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g'

Now, it works perfectly, but I just ran across this file that has the a tag like so:

<a href="link">Click
        here now</a>

Of course it didn't find this one. So I need to modify it somehow to allow for lines breaks in the search. But I have no clue how to make it allow for that unless I go over the entire file first off and remove all \n before hand. Problem there is I loose all formatting in the file.

标签： linux sed

2条回答

爷、活的狠高调

2楼-- · 2019-08-16 17:30

Here is a quick and dirty solution that assumes there will be no more than one newline in a link:

sed -i '' -e '/<a href=.*>/{/<\/a>/!{N;s|\n||;};}' -e 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g'

The first command (/<a href=.*>/{/<\/a>/!{N;s|\n||;};}) checks for the presence of <a href=...> without </a>, in which case it reads the next line into the pattern space and removes the newline. The second is yours.

0人赞添加讨论(0) 举报

放荡不羁爱自由

3楼-- · 2019-08-16 17:33

You can do this by inserting a loop into your sed script:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

As-is, that will leave an embedded newline in the output, and it wasn't clear if you wanted it that way or not. If not, just substitute out the newline:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n//g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

And maybe clean up extra spaces:

sed -e '/<a href/{;:next;/<\/a>/!{N;b next;};s/\n//g;s/\s\{2,\}/ /g;s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g;}' yourfile

Explanation: The /<a href/{...} lets us ignore lines we don't care about. Once we find one we like, we check to see if it has the end marker. If not (/<\a>/!) we grab the next line and a newline (N) and branch (b) back to :next to see if we've found it yet. Once we find it we continue on with the substitutions.

0人赞添加讨论(0) 举报

SED replacing with 'possible' newline

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间