Fellow Forum Members,
I'm using Notepad++ version 6.1.2 and I need to know whether a regular expression can perform a Find & Replace operation that accomplishes the following:
It finds "X" text located between "Y" and "Z" text and replaces it with nothing, effectively deleting the "X" text along with the "Y" and "Z" text. So for the sentence shown below, the regular expression needs to delete all text between the words "Begin" and "End", and also the words "Begin" and "End" themselves, so that everything is deleted.
Begin "X" amount of text End
I should point out that "Begin" and "End" are consistent throughout the text file. Therefore, I need the regular expression to find every instance of "Begin" and "End" and delete them along with whatever text is in between. Any help will be appreciated. Thanks.
Press Ctrl+H for the find and replace dialog. In "search mode" at the bottom choose "Regular Expression". Check the box for ". matches newline".
In "Find What" paste the following:
Begin.*?End
Leave "Replace with" blank.
Press "Replace All".
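For anyone who wants to try the same replacement outside Notepad++, here is a minimal Python sketch of the steps above. The function name `strip_blocks` is made up for illustration, and `re.DOTALL` plays the role of the ". matches newline" checkbox; Python's `re` module is not the engine Notepad++ uses, so treat this only as an approximation of the dialog's behaviour.

```python
import re

# Lazy match from "Begin" to the nearest following "End".
# re.DOTALL lets "." match newlines, like ". matches newline" in the dialog.
BLOCK = re.compile(r"Begin.*?End", re.DOTALL)

def strip_blocks(text):
    """Delete every Begin...End span, including the two marker words."""
    return BLOCK.sub("", text)

if __name__ == "__main__":
    sample = "keep1 Begin drop\nthis End keep2 Begin x End keep3"
    print(strip_blocks(sample))
```

Because `.*?` is lazy, each "Begin" is paired with the closest "End" after it, so two separate blocks are removed separately rather than everything between the first "Begin" and the last "End".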
So you want to delete Y, X, Z if and only if X is between Y and Z:
An example with:
Y = "BEGIN"
Z = "END"
X = "CHOUCROUTE"
The pattern:
search : BEGIN(?>[^CE]+|C(?!HOUCROUTE)|E(?!ND))*CHOUCROUTE[\s\S]*?END
replace: nothing
This part (?>[^CE]+|C(?!HOUCROUTE)|E(?!ND))*
is needed to match everything except the keyword and the closing word. Let's look at it in detail:
(?>                 # open an atomic group
    [^CE]+          # any characters except the letters C and E
  |                 # OR
    C(?!HOUCROUTE)  # a C not followed by the rest of the keyword
  |                 # OR
    E(?!ND)         # an E not followed by the rest of the closing word
)*                  # repeat the group zero or more times
The goal of the atomic group is to avoid catastrophic backtracking. Once an atomic group has matched, the regex engine is not allowed to backtrack into it. If I had used a non-capturing group instead and the regex engine had not found the keyword, it would have tried every possible way of dividing the text among the repetitions before giving up.
If you use an older version of Notepad++ that doesn't have the atomic group feature, you can upgrade your version or emulate it using this trick (the content of a lookahead is atomic by default). The outer group is non-capturing so that \1 refers to the group captured inside the lookahead:
(?:(?=([^CE]+|C(?!HOUCROUTE)|E(?!ND)))\1)*
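As a rough way to check the behaviour outside Notepad++, the sketch below runs the same idea through Python's `re` module. It uses the lookahead emulation rather than `(?>...)`, since atomic groups are only available in `re` from Python 3.11 onward; the outer group is written as non-capturing so that `\1` points at the capture inside the lookahead. The function name `delete_if_keyword` and the sample text are invented for the example.

```python
import re

# BEGIN, then anything that is neither the keyword nor the closing word
# (emulated atomic group: lookahead capture + backreference),
# then the keyword, then a lazy run up to the closing word.
PATTERN = re.compile(
    r"BEGIN"
    r"(?:(?=([^CE]+|C(?!HOUCROUTE)|E(?!ND)))\1)*"
    r"CHOUCROUTE"
    r"[\s\S]*?"
    r"END"
)

def delete_if_keyword(text):
    """Remove BEGIN...END spans only when CHOUCROUTE lies between them."""
    return PATTERN.sub("", text)

if __name__ == "__main__":
    sample = "a BEGIN foo CHOUCROUTE bar END b BEGIN other END c"
    print(delete_if_keyword(sample))
```

In the sample, the first BEGIN...END span contains the keyword and is removed, while the second span does not and is left untouched.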