I'd like to remove all attributes of <p>
in an HTML file by using this simple Perl command line:
$ perl -pe 's/<p[^>]*>/<p>/' input.html
However, it won't substitute e.g. <p class="hello">
that spans multiple lines such as
<p
class="hello">
Thus, I attempted to first remove the line endings by doing
# command-1
$ perl -pe 's/\n/ /' input.html > input-tmp.html
# command-2
$ perl -pe 's/<p[^>]*>/<p>/g' input-tmp.html > input-final.html
Questions:
- Is there an option in (Perl) regex to try the match across multiple lines?
- Can I combine the two commands above (command-1 and command-2) into one? Basically, the first command needs to complete execution before the second one starts.
-p wraps your code in a loop that reads the input one line at a time into $_ and prints $_ after each iteration. So $_ only contains one line at a time, and the pattern can't possibly match something that spans more than one line. You can fool Perl into thinking the whole file is one line using -0777.
Command line options are documented in perlrun.
If you write a short script, and put it in its own file, you can easily invoke it using a simple command line.
Improving the following script is left as an exercise:
perl -pe 'BEGIN { undef $/ } s/<p[^>]*>/<p>/g'