How to select a string depending on a prefix and a suffix

Posted 2019-08-05 16:24

Question:

I have a collection of strings like these (each gap is a tab character):

29  301 3   31  0       TREZILIDE       Trézilidé
2A  001 1   73  1   (LE)    AFA (Le)    Afa

What I want is to transform them into this:

29301 Trézilidé
2A001 (Le) Afa

That is:

  • Removal of the first tab
  • Removal of the tabs, the numbers and the uppercase words (TREZILIDE in the first line, (LE) and AFA in the second), with the whole run replaced by a space
  • Replacement of the last tab by a space
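If a script is an option instead of Notepad++, the same transformation can be done without regex by splitting each line on tabs. This is a swapped-in technique, not what the question asks for. A minimal Python sketch, assuming each gap in the sample is exactly one tab (the wider gaps in the first line are read as two tabs, i.e. an empty column) and that every line has the same nine columns; `to_code_and_name` is a hypothetical name:

```python
# Sample lines, assuming each gap in the question is one tab; the wider gaps in
# the first line are read as two tabs, i.e. an empty column.
LINES = [
    "29\t301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé",
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa",
]


def to_code_and_name(line: str) -> str:
    """Hypothetical helper: join columns 0 and 1, then keep columns 7 and 8.

    Assumes a fixed nine-column layout in which column 7 holds the
    mixed-case article (possibly empty) and column 8 holds the name.
    """
    fields = line.split("\t")
    code = fields[0] + fields[1]
    tail = [f for f in fields[7:] if f]  # skip the empty article column
    return code + " " + " ".join(tail)


for line in LINES:
    print(to_code_and_name(line))
```

This prints `29301 Trézilidé` and `2A001 (Le) Afa`. It breaks as soon as a line has a different number of columns, which is one reason a regex may still be preferable.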

My biggest problems are:

  • How do I select the first tab without selecting the "prefix" and the "suffix"? (something like ^(..)\t[0-9], but without selecting ^(..) or [0-9])
  • How do I select everything from after the three digits up to and including the tab that follows the uppercase word?

I'm doing this in a text file, using the Find/Replace dialog of Notepad++.

Thanks in advance for your help!

Answer 1:

How to select the first tabulation without selecting the "prefix" and the "suffix"?

Ideally this is done with lookahead and lookbehind assertions, but Notepad++ doesn't support those before version 6.0. The next best solution is to capture the prefix and suffix, then backreference them in the replacement string.
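For comparison, here is the lookaround version sketched with Python's `re` module (not Notepad++'s engine, so treat it as an illustration of the idea). Only the tab itself is matched: the lookbehind `(?<=^..)` requires a two-character prefix at the start of the line and the lookahead `(?=\d)` requires a digit after the tab, yet neither is consumed. Python only allows fixed-width lookbehind, which `^..` is. The sample text assumes tab-separated columns.

```python
import re

# Sample input, assuming the gaps in the question are tabs.
TEXT = (
    "29\t301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé\n"
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa"
)

# Match only the tab between a two-character line prefix and a digit.
# MULTILINE makes ^ match at the start of every line, not just the text.
FIRST_TAB = re.compile(r"(?<=^..)\t(?=\d)", re.MULTILINE)

# The prefix and the digit are untouched because they were never matched.
result = FIRST_TAB.sub("", TEXT)
print(result)
```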

Here's how I did it (in answer to your full question):

  1. Check Match case to do a case-sensitive find

  2. Find by regex:

    ^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$
    

    Replace with:

    \1\2 \3
    

    I end up with this, where <tab> represents an actual tab character:

    29301 Trézilidé
    2A001 (Le)<tab>Afa
    
  3. To get rid of that tab, I do a find in Extended search mode:

    \t
    

    And replace it with the space character, to obtain the final result:

    29301 Trézilidé
    2A001 (Le) Afa
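The two steps above can be checked outside Notepad++ with a short Python sketch. It reuses the step 2 pattern unchanged; Python's `re` is case-sensitive by default, which corresponds to having Match case checked, and it accepts `\1`-style backreferences in the replacement just as the answer does. The sample lines assume tab-separated columns.

```python
import re

# Sample lines, assuming the gaps in the question are tabs.
LINES = [
    "29\t301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé",
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa",
]

# Step 2: prefix, three digits, then a run of tabs/capitals/digits/parentheses,
# a tab, and the rest of the line (case-sensitive, like "Match case").
STEP2 = re.compile(r"^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$")


def convert(line: str) -> str:
    line = STEP2.sub(r"\1\2 \3", line)
    # Step 3: any tab that is left (e.g. between "(Le)" and "Afa") becomes a space.
    return line.replace("\t", " ")


for line in LINES:
    print(convert(line))
```

The greedy character class first overshoots into the name and then backtracks to the last tab it can give up, which is why the mixed-case article (Le) survives in the second line.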
    


Answer 2:

Try

^(..)\t

Replace with

\1

Then

\(*[A-Z][A-Z]+\)*

Replace with an empty string (leave the Replace field blank) and check Match case. This removes TREZILIDE, and (LE) and AFA too.
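Match case matters for this step. A small Python sketch, assuming the text after the first step with tab-separated columns, compares a case-sensitive and a case-insensitive run of the same pattern. Without case sensitivity `[A-Z]` also matches lowercase letters, so the pattern eats into the names that should be kept.

```python
import re

# Text after the first step (first tab removed), assuming tab-separated columns.
AFTER_STEP1 = (
    "29301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé\n"
    "2A001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa"
)

# Two or more capitals, optionally wrapped in parentheses.
UPPER_WORD = r"\(*[A-Z][A-Z]+\)*"

case_sensitive = re.sub(UPPER_WORD, "", AFTER_STEP1)
case_insensitive = re.sub(UPPER_WORD, "", AFTER_STEP1, flags=re.IGNORECASE)

print(case_sensitive)    # TREZILIDE, (LE) and AFA gone; names intact
print(case_insensitive)  # names such as Trézilidé and Afa damaged as well
```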

Then

^(.....)[\t0-9]*\t(.+)$

Replacement:

\1 \2

And finally:

\t

Replace every occurrence with a space.

HTH
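The four replacements in this answer can be sketched per line in Python. The third step here uses `^(.....)[\t0-9]*\t(.+)$`: it keeps the five-character code, skips the remaining numeric and empty columns, and keeps everything after the last of those tabs, including a mixed-case article such as (Le). A third-step pattern limited to `[A-Za-z]` would not match the é in Trézilidé. The sample lines assume tab-separated columns, and `convert` is a hypothetical name.

```python
import re

# Sample lines, assuming the gaps in the question are tabs.
LINES = [
    "29\t301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé",
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa",
]


def convert(line: str) -> str:
    # 1. Drop the tab after the two-character prefix.
    line = re.sub(r"^(..)\t", r"\1", line)
    # 2. Delete runs of two or more capitals with optional parentheses
    #    (case-sensitive, like "Match case" checked).
    line = re.sub(r"\(*[A-Z][A-Z]+\)*", "", line)
    # 3. Keep the 5-character code, skip numeric/empty columns, keep the rest.
    line = re.sub(r"^(.....)[\t0-9]*\t(.+)$", r"\1 \2", line)
    # 4. Every remaining tab becomes a space.
    return line.replace("\t", " ")


for line in LINES:
    print(convert(line))
```

This prints `29301 Trézilidé` and `2A001 (Le) Afa`.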