I have an escaped string that contains certain control characters.
The control characters are of the ACK, STX types.
Reference: http://ascii.cl/control-characters.htm
I need to replace all the control characters, preferably each run of consecutive control characters, with ~.
Example input:
%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG
My desired output should be:
~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG
For my own sake and that of others, the method I am looking for replaces a pattern, which in this case would be something like %0?%0?%0?%0?, where each ? stands for anything that could creep into the text.
The string pattern:
The string should be of length 12.
The string should contain four percent-zero pairs, e.g. %0.
I am open to other suggestions as well.
The intention is to get rid of all control characters in the string. Replacing them with ~ is just to keep tabs on what got replaced where (for debugging).
Try this expression:
(%[0-13-9A-F][0-9A-F])+
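It matches every run of consecutive percent-escapes whose first hex digit is not 2, so %20 is left alone. Note that it also matches escapes such as %4E, which are not control characters.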
With it I get this output:
~THE%20QUICK%BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG
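Since no language was specified, here is a minimal Python sketch of how this expression could be applied. The function name collapse_escape_runs and the variable names are made up for illustration, and the group is written as non-capturing because the captured text is never used.

```python
import re

# One or more consecutive percent-escapes whose first hex digit is
# anything except 2 (so %20 and the other %2X escapes are left alone).
ESCAPE_RUN = re.compile(r"(?:%[0-13-9A-F][0-9A-F])+")


def collapse_escape_runs(text: str, marker: str = "~") -> str:
    """Replace each run of matching escapes with a single marker."""
    return ESCAPE_RUN.sub(marker, text)


sample = (
    "%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)
print(collapse_escape_runs(sample))
# ~THE%20QUICK%BROWN~FOX%20JUMPED~OVER%20THE~LAZY%20DOG
```

The stray %BROWN survives because %B is not followed by a second hex digit.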
You could come up with something like:
(%[0-9A-F]{2})
# match a %,
# followed by 0-9, A-F two times
Depending on your programming language (not specified?), match all occurrences and replace each match (capture group $1) with "~". Your string would then become:
~~~~THE~QUICK%BROWN~~~~~~~~FOX~JUMPED~~~~OVER~THE~~~~LAZY~DOG
See a demo on regex101.com
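As a rough Python sketch of this approach (the language is an assumption, and the names mark_each_escape and mark_escape_runs are hypothetical), the first function replaces every escape on its own, while the second adds a + quantifier so that a run of escapes collapses into a single marker:

```python
import re

# A percent sign followed by exactly two uppercase hex digits.
SINGLE_ESCAPE = re.compile(r"%[0-9A-F]{2}")
# Variant: one or more such escapes in a row.
ESCAPE_SEQUENCE = re.compile(r"(?:%[0-9A-F]{2})+")


def mark_each_escape(text: str, marker: str = "~") -> str:
    """Replace every individual escape with the marker."""
    return SINGLE_ESCAPE.sub(marker, text)


def mark_escape_runs(text: str, marker: str = "~") -> str:
    """Replace each run of consecutive escapes with one marker."""
    return ESCAPE_SEQUENCE.sub(marker, text)


raw = (
    "%00%00%00%02THE%20QUICK%BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)
print(mark_each_escape(raw))
# ~~~~THE~QUICK%BROWN~~~~~~~~FOX~JUMPED~~~~OVER~THE~~~~LAZY~DOG
print(mark_escape_runs(raw))
# ~THE~QUICK%BROWN~FOX~JUMPED~OVER~THE~LAZY~DOG
```

Both variants also replace %20, so the encoded spaces are lost along with the control characters.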
When you say all control characters, you might want to be aware of the quote below.
Control characters don't produce output as such, but instead usually
control the terminal somehow: for example, newline and backspace are
control characters. On ASCII platforms, in the ASCII range, characters
whose code points are between 0 and 31 inclusive, plus 127 (DEL) are
control characters; on EBCDIC platforms, their counterparts are
control characters.
You seem to be treating %4E as a control character, but it corresponds to the letter N.
Also, you have %BROWN in your input; I believe you meant %20BROWN.
If that fits your requirements, then the regex below should work for you:
(?:%(?:(?:[0-1][0-9A-F])|7F))+
Make sure that you replace every occurrence of this pattern with ~ (a global substitution). Also, you might want a case-insensitive match.
In plain English:
Match one or more consecutive occurrences of a percent sign followed by any hex number from 00 up to 1F, or by 7F.
Below is a Perl implementation of it:
use strict;
use warnings;

my $s = q(%00%00%00%02THE%20QUICK%20BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG);
# /g replaces every run of control-character escapes, /i ignores hex digit case
$s =~ s/(?:%(?:(?:[0-1][0-9A-F])|7F))+/~/gi;
print "$s\n";
# output: ~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~%4E~LAZY%20DOG
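For readers not using Perl, the same substitution can be sketched in Python. This is only an illustration under the same assumptions (control characters are %00 to %1F plus %7F, and the input uses %20BROWN); the name replace_control_runs is hypothetical.

```python
import re

# Runs of escapes for code points 0x00-0x1F or 0x7F (DEL).
# re.IGNORECASE plays the role of Perl's /i flag; sub() replaces
# every match, like /g.
CONTROL_RUN = re.compile(r"(?:%(?:[01][0-9A-F]|7F))+", re.IGNORECASE)


def replace_control_runs(text: str, marker: str = "~") -> str:
    """Replace each run of escaped control characters with one marker."""
    return CONTROL_RUN.sub(marker, text)


corrected = (
    "%00%00%00%02THE%20QUICK%20BROWN%00%00%00%0D%00%00%00%0FFOX%20JUMPED"
    "%00%00%00%0EOVER%20THE%00%00%4E%02LAZY%20DOG"
)
print(replace_control_runs(corrected))
# ~THE%20QUICK%20BROWN~FOX%20JUMPED~OVER%20THE~%4E~LAZY%20DOG
```

Because %4E (the letter N) is not a control character, it splits the last run in two, which is why two markers surround it in the output.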