There are some broken lines in an OCR'd table of contents, which may or may not have a number after \t and before \n.
Input:
9.1 The Euclidean Group in Two-Dimensional 152
Space E2
CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS,
AND SPACE-TIME SYMMETRIES 173
If a number is sandwiched between two letters (152 in the example), then it is the page number of the previous section and should be deleted. If it is followed by another number (the number of the next section), then it is the correct page number (173 here) and should be kept. Here's the desired output:
9.1 The Euclidean Group in Two-Dimensional Space E2
CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS, AND SPACE-TIME SYMMETRIES 173
My try:
([a-zA-Z])(\t[0-9]*\n\t)((?![P])[A-Z])
but Notepad++ keeps saying it can't find the text, even though it works fine at https://www.regextester.com. How can I join these broken lines back together?
You may use
(\S)\t[0-9]*\R\t+
and replace with
$1
(Group 1 value placeholder).
Details
(\S) - Group 1: any non-whitespace char
\t - a tab
[0-9]* - 0+ digits
\R - a line break sequence
\t+ - 1 or more tabs (or \h+ - 1+ horizontal whitespaces)
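For checking the pattern outside Notepad++, here is a minimal Python sketch of the same idea. Several things in it are assumptions rather than part of the answer: Python's re module has no \R, so the line break is spelled out as (?:\r\n|\n|\r); the replacement appends a space after group 1 so that the joined words stay separated as in the desired output; the helper name join_broken_lines is hypothetical; and the tab positions in the sample are guessed from the question's description.

```python
import re

# Python's re has no \R, so the line break is written out explicitly
# (an assumed stand-in for \R in the Notepad++ pattern).
BROKEN_LINE = re.compile(r"(\S)\t[0-9]*(?:\r\n|\n|\r)\t+")


def join_broken_lines(text: str) -> str:
    """Join a wrapped TOC entry with its continuation line.

    Keeps group 1 (the last non-whitespace char before the tab) and
    adds one space; the space is an assumption made to match the
    desired output shown in the question.
    """
    return BROKEN_LINE.sub(r"\1 ", text)


# Sample with assumed tab placement: a tab before the stray page number
# and a leading tab on each continuation line.
sample = (
    "9.1 The Euclidean Group in Two-Dimensional\t152\n"
    "\tSpace E2\n"
    "CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS,\t\n"
    "\tAND SPACE-TIME SYMMETRIES\t173\n"
)

if __name__ == "__main__":
    print(join_broken_lines(sample))
```

The final 173 is left alone because the line after it does not start with a tab, so the pattern cannot match there. Spelling out the line break also covers files saved with Windows (CRLF) line endings.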