Regex to replace control characters in a percent-escaped string

Posted 2019-08-11 02:32

Question:

I have a percent-escaped string that contains certain control characters, of the ACK and STX kind. Reference: http://ascii.cl/control-characters.htm

I need to replace all the control characters with ~, preferably collapsing each run of consecutive control characters into a single ~.

Ex. Input

%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG

My desired output should be:

~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG

For my own sake and for others, the method I am looking for replaces a pattern, which in this case would be something like %0?%0?%0?%0?, meaning anything of that shape that could creep into the text.

The pattern I have in mind:

  1. The matched string should be 12 characters long.

  2. It should contain four percent-zero sequences, e.g. %0.

I am open to other suggestions as well.

The intention is to get rid of all control characters in the string. Replacing them with ~ is just a way to keep track of what got replaced where (for debugging).
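The pattern described above (four %0x escapes, 12 characters in total) could be sketched roughly as follows. Python is used only for illustration, since the question names no language; the name FOUR_ZERO_ESCAPES and the decision to merge adjacent groups with + are assumptions, not part of the question.

```python
import re

# Hypothetical reading of the "%0?%0?%0?%0?" idea: exactly four "%0x" escapes
# (12 characters), with directly adjacent groups merged into a single match.
FOUR_ZERO_ESCAPES = re.compile(r"(?:(?:%0[0-9A-Fa-f]){4})+")

# The sample input from the question, unchanged (including "%BROWN").
sample = (
    "%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)

result = FOUR_ZERO_ESCAPES.sub("~", sample)
print(result)
# ~THE%20QUICK%BROWN~FOX%20JUMPED~OVER%20THE%00%00%4E%02LAZY%20DOG
```

Taken literally, this pattern leaves the %00%00%4E%02 run untouched, because %4E does not begin with %0, so the group of four never completes.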

Answer 1:

Try this expression:

(%[0-13-9A-F][0-9A-F])+

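It matches each run of consecutive %XX escapes whose first hex digit is not 2, so %20 is left alone. Note that this is broader than control characters: any escape outside the %2x range is matched too.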

With it I get this output:

~THE%20QUICK%BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG
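For anyone who wants to try this expression, here is a minimal sketch in Python (chosen only for illustration, since the question names no language). The names NOT_2X_RUN and collapse_escapes are made up for this example.

```python
import re

# The expression from this answer: one or more %XX escapes whose first
# hex digit is anything except 2 (so %20..%2F are never matched).
NOT_2X_RUN = re.compile(r"(%[0-13-9A-F][0-9A-F])+")


def collapse_escapes(text: str) -> str:
    """Replace each run of matching escapes with a single '~'."""
    return NOT_2X_RUN.sub("~", text)


# The sample input from the question, unchanged (including "%BROWN").
sample = (
    "%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)

print(collapse_escapes(sample))
# ~THE%20QUICK%BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG

# Printable characters outside %2x are replaced as well: %41 is "A".
print(collapse_escapes("%41BC"))
# ~BC
```

The second call shows the trade-off: the expression excludes only the %2x range, so an escaped printable character such as %41 is replaced as well.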


Answer 2:

You could come up with something like:

(%[0-9A-F]{2})
# match a %,
# followed by 0-9, A-F two times

Depending on your programming language (not specified?), match all occurrences and replace each match (the capture group $1) with "~". Your string would then become:

~~~~THE~QUICK%BROWN~~~~~~~~FOX~JUMPED~~~~OVER~THE~~~~LAZY~DOG 

See a demo on regex101.com
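A minimal sketch of that replacement in Python (an arbitrary choice, as the question does not name a language). The name ANY_ESCAPE is made up for this example.

```python
import re

# The expression from this answer: a single %XX escape with two hex digits.
ANY_ESCAPE = re.compile(r"(%[0-9A-F]{2})")

# The sample input from the question, unchanged (including "%BROWN").
sample = (
    "%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)

# Every escape is replaced individually, so a run of n escapes yields n tildes.
one_per_escape = ANY_ESCAPE.sub("~", sample)
print(one_per_escape)
# ~~~~THE~QUICK%BROWN~~~~~~~~FOX~JUMPED~~~~OVER~THE~~~~LAZY~DOG
```

Because the expression does not distinguish control characters from other escapes, %20 is replaced too, and "%BROWN" survives only because "BR" is not a valid pair of hex digits.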



Answer 3:

When you say "all control characters", keep the following definition in mind.

Control characters don't produce output as such, but instead usually control the terminal somehow: for example, newline and backspace are control characters. On ASCII platforms, in the ASCII range, characters whose code points are between 0 and 31 inclusive, plus 127 (DEL) are control characters; on EBCDIC platforms, their counterparts are control characters.
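The ASCII part of that definition can be checked directly against individual escapes. The following is a small sketch in Python (used only for illustration); the helper name is_ascii_control is hypothetical.

```python
def is_ascii_control(code_point: int) -> bool:
    """True for ASCII control characters: 0 through 31, plus 127 (DEL)."""
    return 0 <= code_point <= 31 or code_point == 127


# Decode the two hex digits of a few escapes and classify them.
for escape in ("%02", "%0D", "%20", "%4E", "%7F"):
    code_point = int(escape[1:], 16)
    print(escape, code_point, is_ascii_control(code_point))
# %02 2 True
# %0D 13 True
# %20 32 False
# %4E 78 False
# %7F 127 True
```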

You seem to be treating %4E as a control character, but it corresponds to the letter N.

Also, you have the letters %BROWN in your input; I believe you wanted it to be %20BROWN.

If that fits your requirements, then the regex below should work for you:

(?:%(?:(?:[0-1][0-9A-F])|7F))+

Make sure that you replace every occurrence of this pattern with ~ (a global replace). You might also want a case-insensitive match.

English breakdown of it:

Match one or more consecutive escapes, each consisting of a percent sign followed by a hex value from 00 to 1F, or by 7F.

Below is a Perl implementation of it:

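use strict; use warnings;
# Sample input, with %BROWN corrected to %20BROWN.
my $s = q(%00%00%00%02THE%20QUICK%20BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG);
# Collapse each run of control-character escapes (%00-%1F, %7F) into one ~.
$s =~ s/(?:%(?:(?:[0-1][0-9A-F])|7F))+/~/gi;
print "$s\n";
# output : ~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~%4E~LAZY%20DOG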
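For readers not using Perl, a rough Python equivalent might look like the sketch below. It spells the same expression out in verbose mode so that each part lines up with the English breakdown; the name CONTROL_RUN is made up for this example.

```python
import re

CONTROL_RUN = re.compile(
    r"""
    (?:                   # one escape:
        %                 #   a literal percent sign
        (?:
            [01][0-9A-F]  #   followed by 00 through 1F
          | 7F            #   or by 7F (DEL)
        )
    )+                    # repeated, so a whole run is a single match
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Sample input, with %BROWN corrected to %20BROWN as in the Perl version.
sample = (
    "%00%00%00%02THE%20QUICK%20BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)

cleaned = CONTROL_RUN.sub("~", sample)
print(cleaned)
# ~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~%4E~LAZY%20DOG
```

re.sub replaces every non-overlapping match by default, which corresponds to the /g flag in the Perl version; re.IGNORECASE corresponds to /i.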