I have a collection of strings like this (each "space" is a tab character):
29 301 3 31 0 TREZILIDE Trézilidé
2A 001 1 73 1 (LE) AFA (Le) Afa
What I want is to transform it into this:
29301 Trézilidé
2A001 (Le) Afa
- Removal of the first tabulation
- Removal of the tabulations, the numbers and the first uppercase occurrence (the whole run being replaced by a single space)
- Replacement of the last tabulation by a space
My biggest problems are:
- How to select the first tabulation without selecting the "prefix" and the "suffix"? (like ^(..)\t[0-9] but without selecting ^(..) nor [0-9])
- How to select from after the 3 digits to after the tabulation of the uppercase word?
I am doing this in a text file, with the search-and-replace dialog of Notepad++.
Thanks in advance for your help!
How to select the first tabulation without selecting the "prefix" and the "suffix"?
Optimally this is done using lookahead and lookbehind assertions, but Notepad++ doesn't support those before version 6.0. The next best solution is to just capture them, then backreference them in the replacement string.
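For illustration only, here is a minimal sketch of the lookaround idea using Python's re module rather than Notepad++. The names rows and first_tab are made up for this example, and whether your Notepad++ build accepts the same syntax is subject to the version caveat above.

```python
import re

# Sample rows from the question; fields are separated by tab characters.
rows = (
    "29\t301\t3\t31\t0\tTREZILIDE\tTrézilidé\n"
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa"
)

# (?<=^..)  lookbehind: the tab must be preceded by the first two characters of a line
# (?=[0-9]) lookahead: the tab must be followed by a digit
# Neither assertion is part of the match, so only the tab itself gets replaced.
first_tab = re.compile(r"(?<=^..)\t(?=[0-9])", re.MULTILINE)

without_first_tab = first_tab.sub("", rows)
print(without_first_tab)
```

Only the first tab of each line is removed; the prefix and the digits that follow are left untouched because they were never selected.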
Here's how I did it (in answer to your full question):
Check Match case to do a case-sensitive find
Find by regex:
^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$
Replace with:
\1\2 \3
I end up with this, where <tab> represents an actual tabulation:
29301 Trézilidé
2A001 (Le)<tab>Afa
To get rid of that I do an extended find:
\t
And replace it with the space character, to obtain the final result:
29301 Trézilidé
2A001 (Le) Afa
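If you want to check the two steps outside Notepad++, the same capture-and-backreference approach can be sketched in Python. This is only an illustration: the function name clean and the list rows are assumptions of this example, and Python's re is case-sensitive by default, which stands in for the Match case checkbox.

```python
import re

# Sample rows from the question; fields are separated by tab characters.
rows = [
    "29\t301\t3\t31\t0\tTREZILIDE\tTrézilidé",
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa",
]

# Group 1: two-character prefix, group 2: three digits, group 3: the mixed-case name.
# The character class in the middle swallows tabs, digits, uppercase words and parentheses.
pattern = re.compile(r"^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$")

def clean(row):
    # Step 1: keep the captured pieces, joined as in the replacement string \1\2 \3.
    joined = pattern.sub(r"\1\2 \3", row)
    # Step 2: turn any tab left inside the name into a space.
    return joined.replace("\t", " ")

cleaned = [clean(row) for row in rows]
for line in cleaned:
    print(line)
# 29301 Trézilidé
# 2A001 (Le) Afa
```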
Try
^(..)\t
Replace with
\1
Then
\(*[A-Z][A-Z]+\)*
Replace with an empty string (''), with Match case checked. This removes (LE) and AFA too.
Then
^(.....)[\t0-9]*\t(.+)$
Replacement:
\1 \2
And finally:
\t
Replace with a space. Every occurrence.
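As a rough way to try a step-by-step sequence like this outside Notepad++, here is a sketch in Python. The function name clean_stepwise is made up for this example, and the third step is written here with a class of tabs and digits, a choice of this sketch so that accented letters and parentheses in the final name are kept.

```python
import re

def clean_stepwise(row):
    # Step 1: drop the tab after the two-character prefix.
    row = re.sub(r"^(..)\t", r"\1", row)
    # Step 2: remove all-uppercase words, with optional parentheses (case-sensitive).
    row = re.sub(r"\(*[A-Z][A-Z]+\)*", "", row)
    # Step 3: keep the five-character code, skip the run of tabs and digits,
    # and keep the rest of the line as the name.
    row = re.sub(r"^(.....)[\t0-9]*\t(.+)$", r"\1 \2", row)
    # Step 4: replace every remaining tab with a space.
    return row.replace("\t", " ")

print(clean_stepwise("29\t301\t3\t31\t0\tTREZILIDE\tTrézilidé"))  # 29301 Trézilidé
print(clean_stepwise("2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa"))  # 2A001 (Le) Afa
```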
HTH