I have a sed command that works fine, except when the text it should match spans a newline in the file. Here is my command:
sed -i 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g'
Now, it works perfectly, but I just ran across this file that has the <a> tag like so:
<a href="link">Click
here now</a>
Of course it didn't find this one. So I need to modify it somehow to allow for line breaks in the search. But I have no clue how to do that unless I go over the entire file first and remove all \n beforehand. The problem there is that I lose all formatting in the file.
Here is a quick and dirty solution that assumes there will be no more than one newline in a link:
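A sketch of what this could look like, pieced together from the two commands described below. The exact invocation is an assumption, and the sample input is piped in rather than edited with -i so it is safe to try:

```shell
# Sample input: one link whose text is split across two lines.
input='before
<a href="link">Click
here now</a>
after'

# First -e: if a line opens a link but does not close it, append the next
# line (N) and delete the newline that N leaves in the pattern space.
# Second -e: the substitution from the question.
out=$(printf '%s\n' "$input" | sed \
  -e '/<a href=.*>/{/<\/a>/!{N;s|\n||;};}' \
  -e 's,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g')

printf '%s\n' "$out"
# before
# Clickhere now - link
# after
```

Note that deleting the newline outright joins "Click" and "here" into one word; using s|\n| | instead would keep them separated by a space.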
The first command (/<a href=.*>/{/<\/a>/!{N;s|\n||;};}) checks for the presence of <a href=...> without </a>, in which case it appends the next line to the pattern space and removes the newline. The second is yours.
You can do this by inserting a loop into your sed script:
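A minimal sketch of such a loop, assuming the label is called next (as in the explanation further down) and written one command per line so it does not depend on how a particular sed parses ; after a label. The sample input is made up for illustration:

```shell
input='before
<a href="link">Click
here now</a>
after'

out=$(printf '%s\n' "$input" | sed -e '/<a href/{
# Keep appending lines until the closing tag is in the pattern space.
:next
/<\/a>/!{
N
b next
}
# The substitution from the question, now applied to the joined lines.
s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g
}')

printf '%s\n' "$out"
# before
# Click
# here now - link
# after
```

Unlike the single-N approach, this handles a link spread over any number of lines. If the closing tag never appears, N runs out of input; GNU sed then prints what it has collected and exits.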
As-is, that will leave an embedded newline in the output, and it wasn't clear if you wanted it that way or not. If not, just substitute out the newline:
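One way this could look, as a sketch: an extra substitution runs once the whole link is in the pattern space. Replacing the newline with a space (rather than with nothing) is my assumption, so that the words on either side do not run together:

```shell
input='<a href="link">Click
here now</a>'

out=$(printf '%s\n' "$input" | sed -e '/<a href/{
:next
/<\/a>/!{
N
b next
}
# Turn every embedded newline into a single space.
s/\n/ /g
s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g
}')

printf '%s\n' "$out"   # prints: Click here now - link
```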
And maybe clean up extra spaces:
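A sketch of that cleanup, assuming that "extra spaces" means runs of whitespace such as the indentation of a continuation line. The POSIX class [[:space:]] also matches the embedded newline, so one substitution covers both jobs:

```shell
# The continuation line is indented and contains a run of spaces.
input='<a href="link">Click
    here   now</a>'

out=$(printf '%s\n' "$input" | sed -e '/<a href/{
:next
/<\/a>/!{
N
b next
}
# Collapse every run of whitespace (newlines included) to one space.
s/[[:space:]][[:space:]]*/ /g
s,<a href="\(.*\)">\(.*\)</a>,\2 - \1,g
}')

printf '%s\n' "$out"   # prints: Click here now - link
```

Be aware that this collapses whitespace everywhere on the joined line, including any indentation in front of the <a> tag, so it trades away some of the file's formatting on those lines.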
Explanation: The /<a href/{...} address lets us ignore lines we don't care about. Once we find one we like, we check to see if it has the end marker. If not (/<\/a>/!), we append the next line, preceded by a newline, to the pattern space (N) and branch (b) back to :next to see if we've found it yet. Once we find it, we continue on with the substitutions.