How to fix broken lines with numbers in middle in

Posted 2019-07-14 14:01

Question:

There are some broken lines in an OCR'd table of contents. Each broken line may or may not have a number after the \t and before the \n.

Input:

    9.1 The Euclidean Group in Two-Dimensional  152
    Space E2
CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS,    
    AND SPACE-TIME SYMMETRIES   173

If a number is sandwiched between two letters (152 in the example), it is the page number of the previous section and should be deleted. If it is followed by another number (the number of the next section), it is the correct page number (173 here) and should be kept. Here is the desired output:

    9.1 The Euclidean Group in Two-Dimensional Space E2
CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS, AND SPACE-TIME SYMMETRIES  173

My try:

([a-zA-Z])(\t[0-9]*\n\t)((?![P])[A-Z])

but Notepad++ keeps saying it can't find the text, even though the pattern works fine at https://www.regextester.com. How can I repair these broken lines?

Answer 1:

You may use

(\S)\t[0-9]*\R\t+

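and replace with $1 (the Group 1 value placeholder). Add a space after $1 if the joined fragments should be separated by a space, as in the desired output.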

Details

  • (\S) - Group 1: any non-whitespace char
  • \t - a tab
  • [0-9]* - zero or more digits
  • \R - any line break sequence (\r\n, \n, or \r)
  • \t+ - one or more tabs (or \h+ for one or more horizontal whitespace characters)
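
To try the pattern outside Notepad++, here is a minimal Python sketch. It rests on a few assumptions: Python's `re` module does not support `\R`, so the line break is spelled out as `(?:\r\n|\r|\n)`. The helper name `join_broken_lines` and the sample string are made up for illustration. The sample uses Windows line endings (`\r\n`), which may be why a literal `\n` in the original pattern found nothing. The replacement adds a space after group 1 to match the desired output.

```python
import re

# Same idea as (\S)\t[0-9]*\R\t+ ; \R is written out because
# Python's re module has no \R escape.
PATTERN = re.compile(r"(\S)\t[0-9]*(?:\r\n|\r|\n)\t+")


def join_broken_lines(text: str) -> str:
    """Join a line with the tab-indented line that follows it.

    Any digits sitting between the tab and the line break are dropped;
    group 1 plus a single space is kept (hypothetical helper).
    """
    return PATTERN.sub(r"\1 ", text)


# Illustrative sample with Windows line endings (an assumption about the file).
sample = (
    "\t9.1 The Euclidean Group in Two-Dimensional\t152\r\n"
    "\tSpace E2\r\n"
    "CHAPTER 10: THE LORENTZ AND POINCARÉ GROUPS,\t\r\n"
    "\tAND SPACE-TIME SYMMETRIES\t173\r\n"
)

print(join_broken_lines(sample))
```

The pattern joins any line that is followed by a tab-indented line, so an indented entry that actually starts a new section would be merged as well. It is worth checking the result by eye.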
