This is my string:
'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
I was using code to retrieve the output from an SSH command, and I want my string to contain only 'examplefile.zip'.
What can I use to remove the extra escape sequences?
Delete them with a regular expression:
Demo:
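A minimal sketch of this approach, applied to the string from the question. The names `ansi_escape` and `sometext` are illustrative, and the pattern only covers 7-bit CSI sequences (ESC followed by `[`), so treat it as a starting point rather than a complete filter:

```python
import re

# 7-bit CSI sequence: ESC [ + parameter bytes + intermediate bytes + final byte.
# Inside a character class, whitespace is preserved even with re.VERBOSE.
ansi_escape = re.compile(r'''
    \x1B\[      # ESC followed by an opening bracket (7-bit CSI)
    [0-?]*      # parameter bytes, 0x30-0x3F
    [ -/]*      # intermediate bytes, 0x20-0x2F
    [@-~]       # final byte, 0x40-0x7E
''', re.VERBOSE)

sometext = 'ls\r\n\x1b[00m\x1b[01;31mexamplefile.zip\x1b[00m\r\n\x1b[01;31m'
cleaned = ansi_escape.sub('', sometext)
print(repr(cleaned))  # 'ls\r\nexamplefile.zip\r\n'
```

Note that the echoed command and the line endings are still there; only the escape sequences are gone.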
(I've tidied up the escape sequence expression to follow the Wikipedia overview of ANSI escape codes, focusing on the CSI sequences and ignoring the C1 codes, as they are rarely used in today's UTF-8 world.)
The suggested regex didn't do the trick for me, so I wrote my own. The following is a Python regex that I created based on the spec found here
I tested my regex on the following snippet (basically a copy-paste from the ascii-table.com page)
Hopefully this will help others :)
The accepted answer to this question only considers color and font effects. There are a lot of sequences that do not end in 'm', such as cursor positioning, erasing, and scroll regions.
The complete regexp for Control Sequences (aka ANSI Escape Sequences) is
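As a hedged illustration, a Python pattern following the ECMA-48 structure (CSI, then parameter bytes, intermediate bytes, and a final byte) might look like the sketch below. The names `control_sequence` and `sample` are made up for this example:

```python
import re

# Control Sequence per ECMA-48 section 5.4:
#   CSI (8-bit 0x9B, or 7-bit ESC [), then
#   parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F,
#   and exactly one final byte 0x40-0x7E.
control_sequence = re.compile(r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')

# Erase display (J), cursor position (H), colour (m), and an 8-bit CSI.
sample = 'a\x1b[2Jb\x1b[10;20Hc\x1b[1;31md\x9b0me'
stripped = control_sequence.sub('', sample)
print(stripped)  # abcde
```

Because the final byte can be anything in `@` to `~`, sequences that do not end in `m` are removed as well.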
Refer to ECMA-48 Section 5.4 and ANSI escape code
If you want to remove the \r\n bit, you can pass the string through this function (written by sarnold):

Careful though, this will lump together the text in front of and behind the escape sequences. So, using Martijn's filtered string 'ls\r\nexamplefile.zip\r\n', you will get lsexamplefile.zip. Note the ls in front of the desired filename.

I would use the stripEscape function first to remove the escape sequences, then pass the output to Martijn's regular expression, which would avoid concatenating the unwanted bit.
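The lumping effect described above can be reproduced with a small stand-in. This is not sarnold's original code; `strip_control_chars` is a hypothetical Python 3 helper that assumes the function simply deletes every character below 0x20:

```python
def strip_control_chars(text):
    """Remove every character below 0x20 (this includes CR, LF and ESC)."""
    # Hypothetical stand-in: map each control code point to None to delete it.
    delete_table = {code: None for code in range(0x20)}
    return text.translate(delete_table)

filtered = 'ls\r\nexamplefile.zip\r\n'
lumped = strip_control_chars(filtered)
print(lumped)  # lsexamplefile.zip
```

With the line breaks gone, nothing separates the echoed command from the filename any more.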
Function
Based on Martijn Pieters's answer with Jeff's regexp.
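A sketch of what such a function could look like, assuming "Jeff's regexp" refers to the ECMA-48 control sequence pattern (CSI, parameter bytes, intermediate bytes, final byte). The name `escape_ansi` and the sample lines are illustrative:

```python
import re

# Compiled once at import time; matches 7-bit (ESC [) and 8-bit (0x9B) CSI sequences.
_ANSI_ESCAPE = re.compile(r'(?:\x9B|\x1B\[)[0-?]*[ -/]*[@-~]')

def escape_ansi(line):
    """Return the line with ANSI control sequences removed."""
    return _ANSI_ESCAPE.sub('', line)

lines = [
    '\x1b[0m\x1b[01;34mdocs\x1b[0m',
    '\x1b[01;31mexamplefile.zip\x1b[0m',
    'plain.txt\x1b[K',
]
plain = [escape_ansi(line) for line in lines]
print(plain)  # ['docs', 'examplefile.zip', 'plain.txt']
```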
Test
Testing
If you want to run it yourself, use python3 (better Unicode support, blablabla). Here is how the test file should look: