I have a big HTML file that has lots of markup that looks like this:
<p class="MsoNormal" style="margin: 0in 0in 0pt;">
<span style="font-size: small; font-family: Times New Roman;">stuff here</span>
</p>
I'm trying to do a Vim search-and-replace to get rid of all class="" and style="" attributes, but I'm having trouble making the match ungreedy.
My first attempt was this:
%s/style=".*?"//g
but Vim doesn't seem to like the ?. Unfortunately, removing the ? makes the match too greedy.
How can I make my match ungreedy?
Instead of .* use .\{-}.
%s/style=".\{-}"//g
Also, see :help non-greedy
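For anyone who knows the Perl-style syntax better, here is a rough side-by-side in Python (not Vim) of what the two quantifiers do. The sample line is a made-up variant of the markup in the question, with style placed before class so the greedy match has a later quote to run on to. Python's .*? plays the role of Vim's .\{-} here.

```python
import re

# Hypothetical variant of the question's markup: style comes first, so a
# greedy match can run on to the closing quote of the class attribute.
line = '<p style="margin: 0in 0in 0pt;" class="MsoNormal">'

# Greedy: .* grabs as much as possible, up to the LAST quote on the line.
greedy = re.sub(r'style=".*"', '', line)

# Non-greedy: .*? stops at the FIRST closing quote, like Vim's .\{-}
lazy = re.sub(r'style=".*?"', '', line)

print(greedy)  # <p >
print(lazy)    # <p  class="MsoNormal">
```

The greedy version eats the class attribute as well, which is the "too greedy" behaviour described in the question.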
Non-greedy search in Vim is done using the \{-} operator. Like this:
%s/style=".\{-}"//g
just try:
:help non-greedy
If you're more comfortable with PCRE regex syntax, which
- supports the non-greedy operator ?, as you asked in the OP; and
- doesn't require backwhacking grouping and cardinality operators (an utterly counterintuitive Vim syntax requirement, since you're not matching literal characters but specifying operators);
and you have [g]vim compiled with the perl feature (test using :ver and inspect the features; if +perl is there, you're good to go), try search/replace using
:perldo s///
Example: swap the src and alt attributes in an img tag:
<p class="logo"><a href="/"><img src="/caminoglobal_en/includes/themes/camino/images/header_logo.png" alt=""></a></p>
:perldo s/(src=".*?")\s+(alt=".*?")/$2 $1/
<p class="logo"><a href="/"><img alt="" src="/caminoglobal_en/includes/themes/camino/images/header_logo.png"></a></p>
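If your Vim lacks +perl, the same substitution can be tried outside the editor. A minimal sketch in Python, assuming the img tag from the example above as input; the variable names are only illustrative, and \1 and \2 are Python's spelling of Perl's $1 and $2.

```python
import re

# The img tag from the example above.
tag = '<img src="/caminoglobal_en/includes/themes/camino/images/header_logo.png" alt="">'

# Each .*? stops at the first closing quote, so each group captures
# exactly one attribute; the replacement emits them in swapped order.
swapped = re.sub(r'(src=".*?")\s+(alt=".*?")', r'\2 \1', tag)

print(swapped)
```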
I've found that a good solution to this type of question is:
:%!sed ...
(or perl if you prefer). IOW, rather than learning Vim's regex peculiarities, use a tool you already know. Note that sed has no non-greedy quantifier of its own, but with perl the ? modifier works and makes the match non-greedy.
With \v (as suggested in several comments):
:%s/\v(style|class)\=".{-}"//g
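To see what that one-liner leaves behind, here is a small Python sketch run on the markup from the question. The first pattern is a direct translation of the Vim command. The second, with a leading \s*, is my own variation (not from the thread) that also swallows the space in front of each attribute.

```python
import re

# The markup from the question.
html = (
    '<p class="MsoNormal" style="margin: 0in 0in 0pt;">\n'
    '<span style="font-size: small; font-family: Times New Roman;">stuff here</span>\n'
    '</p>'
)

# Direct translation of :%s/\v(style|class)\=".{-}"//g
stripped = re.sub(r'(?:style|class)=".*?"', '', html)

# Variation (an assumption, not from the thread): also remove the
# whitespace before each attribute so no stray spaces are left.
tidy = re.sub(r'\s*(?:style|class)=".*?"', '', html)

print(stripped)
print(tidy)
```

The direct translation leaves tags like "<p  >" with leftover spaces; the variation gives "<p>" and "<span>".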
The eregex.vim plugin handles the Perl-style non-greedy operators *? and +?.
G'day,
Vim's regexp processing is not too brilliant. I've found that the regexp syntax for sed is about the right match for vim's capabilities.
I usually set the search highlighting on (:set hlsearch) and then play with the regexp after entering a slash to enter search mode.
Edit: Mark, that trick to minimise greedy matching is also covered in Dale Dougherty's excellent book "Sed & Awk".
Chapter Three "Understanding Regular Expression Syntax" is an excellent intro to the more primitive regexp capabilities involved with sed and awk. Only a short read and highly recommended.
HTH
cheers,