regex substitute several special characters with o

Posted 2019-08-18 04:02

Question:

The character ̈ (unicode 0x308) cannot be represented in the “Western (ISO Latin 9)” encoding.

I need to replace three such special characters in many .txt files. Ideally this would be a single regex that I can use in the find & replace function of TextWrangler, the editor I run on my Mac (similar to BBEdit).

Here are the 3 special chars:

  1. ä into ä
  2. ö into ö
  3. ü into ü

(Please note that the first letter in each pair consists of two characters, e.g. the a followed by the combining diaeresis ̈, Unicode 0x308, and is therefore not compatible with Western ISO Latin.)
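To make the two-character form concrete, here is a small Python sketch (not part of the original question, and a different technique from a find & replace regex). It assumes the files contain the decomposed sequences, base letter plus U+0308, and uses the standard library's NFC normalization to compose them into single code points. The sample string and the `iso8859_15` codec name standing in for "Western (ISO Latin 9)" are illustrative choices.

```python
import unicodedata

# Decomposed forms: base letter followed by U+0308 (combining diaeresis).
decomposed = "a\u0308 o\u0308 u\u0308"

# NFC normalization composes each letter + U+0308 pair into one code point.
composed = unicodedata.normalize("NFC", decomposed)


def fits_latin9(text: str) -> bool:
    """Return True if the text can be encoded as ISO-8859-15 (Latin 9)."""
    try:
        text.encode("iso8859_15")
        return True
    except UnicodeEncodeError:
        return False


print(len(decomposed), len(composed))
print(fits_latin9(decomposed), fits_latin9(composed))
```

The decomposed string fails to encode because U+0308 has no slot in Latin 9, while the composed string encodes cleanly.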

I tried regex groups but was not successful. In TextWrangler I use the find & replace function with the grep (regex) option enabled:

FIND: (ä|ö|ü)+

REPLACE: \1ä , \2ö , \3ü

Any ideas?

Answer 1:

Brief

I've just tested this with Notepad++; I'm not sure whether it will work in the Mac text editors.

This method is a conditional replacement that uses a dictionary appended to the text. It's more of a hack, but it works, assuming the text editor's regex engine supports lookaheads that contain backreferences and capture groups. Once you're done, remove the dictionary from the bottom of the file.


Code

(ä|ö|ü)(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))

Replacement

\2

Results

Input

ä into a
ö into o
ü into u

Input - Modified

This input includes the dictionary at the end

ä into a
ö into o
ü into u

Dictionary:
ä=a
ö=o
ü=u

Output

a into a
o into o
u into u

Dictionary:
ä=a
ö=o
ü=u

Explanation

  • (ä|ö|ü) Capture either character in the group into capture group 1
  • (?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+)) Positive lookahead ensuring what follows matches
    • [\s\S]* Match any character any number of times
    • Dictionary: Match Dictionary: literally (this can be changed to anything, but you should make sure this is a unique string that won't be present anywhere else in your input)
    • [\s\S]* Match any character any number of times
    • \1 Match the same text as most recently matched by the first capture group
    • = Match the equal sign character = literally
    • ([^\s=:]+) Capture one or more of any character not present in the set (not whitespace, = or :) into capture group 2
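For readers who want to try the pattern outside an editor, the following is a minimal Python sketch of the same dictionary lookahead. It assumes the precomposed characters (written as `\u` escapes to keep the code points unambiguous) and reuses the sample input from above; the function name `apply_dictionary` is made up for illustration. Python's `re` module keeps the group captured inside the lookahead, so `\2` is available in the replacement.

```python
import re

# Same pattern as above; \u00e4, \u00f6, \u00fc are the precomposed
# a-, o- and u-umlaut (an assumption about which forms the text contains).
PATTERN = re.compile(
    r"(\u00e4|\u00f6|\u00fc)"
    r"(?=[\s\S]*Dictionary:[\s\S]*\1=([^\s=:]+))"
)


def apply_dictionary(text: str) -> str:
    """Replace each matched character with its dictionary value (group 2)."""
    return PATTERN.sub(r"\2", text)


sample = (
    "\u00e4 into a\n"
    "\u00f6 into o\n"
    "\u00fc into u\n"
    "\n"
    "Dictionary:\n"
    "\u00e4=a\n"
    "\u00f6=o\n"
    "\u00fc=u\n"
)

print(apply_dictionary(sample))
```

The entries below `Dictionary:` are left untouched because no `Dictionary:` marker follows them, so the lookahead fails there. Each match rescans the rest of the text, so this is likely to be slow on very large files.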