I'm building a telnet app in C# (for scripting door games on oldschool BBS systems, e.g. Wildcat) and can't seem to build a working parser for ANSI escape codes (e.g. cursor movement, colorizing, etc.) - almost all systems I've tested send undefined sequences which defy any "standards". There also seem to be very few resources on the matter. Wikipedia has the most in-depth list I've found so far, but even they say it's incomplete - and most other sites I've encountered just copy/pasted Wikipedia's article.
My question: Is there a library out there? If not, how about some parsing code/Regex? At the very least some proper documentation for things like ESC[!_ would be incredibly helpful.
I really feel like I'm reinventing the wheel on this, especially seeing as Telnet is more or less the Internet's equivalent of the wheel (at least age-wise ;)
EDIT: Added an example of weirdness:
00000075h: 1B 5B 73 1B 5B 32 35 35 42 1B 5B 32 35 35 43 08 ; .[s.[255B.[255C.
00000085h: 5F 1B 5B 36 6E 1B 5B 75 1B 5B 21 5F 02 02 3F 48 ; _.[6n.[u.[!_..?H
00000095h: 54 4D 4C 3F 1B 5B 30 6D 5F 1B 5B 32 4A 1B 5B 48 ; TML?.[0m_.[2J.[H
000000a5h: 0C 0D 0A ; ...
The mysterious part is '21' in line 2 ---^^
A proper answer depends on how one intends to use the library. Any terminal emulator will read those sequences and perform actions based on them. But even a simple terminal emulator will understand about a hundred sequences.
Your example, in a perhaps more readable form, looks like this:
using unmap (making the escape character \E, showing all characters as printable, and beginning a new line for each escape character).
ECMA-48 describes the format for control sequences. Control sequences have content (parameters) which is limited to certain characters such as digits and separators, e.g., ';'. Control sequences also have a definite ending, called the final character. The sequence \E[!_^B^B? does not follow those rules. As suggested in a comment, perhaps your recording was confused by the terminal's response to the cursor position request \E[6n.
With that much context:
- \E[2J clears the display
- \E[6n asks the terminal where the cursor is
- \E[s and \E[u save the cursor position and restore it later
In short, you may see that to process the control sequences received by a terminal, you really need a terminal program to do all of this. Not all terminal emulators are the same, however. Some use a series of case-statements to handle the successive stages of escape, bracket, digits, etc. But your program should keep in mind that single-byte controls can appear in the middle of multi-byte control sequences. Since they are encoded differently, there is no conflict. But it makes the program more complicated than you might suppose for just reading one sequence at a time.
xterm uses some case-statements (for the final character, basically), but most of the state transitions in decoding a control sequence are done using a set of tables. They are very repetitive, but not obvious to construct: Paul Williams pointed out that for a VT100, those should be symmetric (essentially treating the input as 7-bit ASCII). Some of the states are treated as errors, and ignored; well-formatted sequences are all that matters anyway. In theory, you could reuse the state-tables and add a "little" parsing. The tables are 8500 lines (one state per line).
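A much smaller illustration of the table-driven idea, again in Python and again only a sketch: these are not xterm's tables. The state and action names are loosely modeled on Paul Williams' parser description, the five-state layout is my own simplification, and build_tables and run are hypothetical names. Each state gets a 256-entry row mapping a byte to (next state, action), so the decoding loop itself contains no sequence-specific logic.

```python
STATES = range(5)
GROUND, ESCAPE, ESC_INTER, CSI_PARAM, CSI_INTER = STATES

def fill(row, lo, hi, entry):
    # Set the same (next_state, action) entry for bytes lo..hi inclusive.
    for b in range(lo, hi + 1):
        row[b] = entry

def build_tables():
    # Default: anything not listed is an error, ignored, back to ground.
    t = {s: [(GROUND, "ignore")] * 256 for s in STATES}
    for s in STATES:
        fill(t[s], 0x00, 0x1F, (s, "execute"))   # C0 controls keep the state
        t[s][0x7F] = (s, "ignore")
        t[s][0x1B] = (ESCAPE, "clear")           # ESC restarts from anywhere
    fill(t[GROUND], 0x20, 0x7E, (GROUND, "print"))
    fill(t[GROUND], 0x80, 0xFF, (GROUND, "print"))
    fill(t[ESCAPE], 0x20, 0x2F, (ESC_INTER, "collect"))
    fill(t[ESCAPE], 0x30, 0x7E, (GROUND, "esc_dispatch"))
    t[ESCAPE][0x5B] = (CSI_PARAM, "clear")       # ESC [ introduces a CSI
    fill(t[ESC_INTER], 0x20, 0x2F, (ESC_INTER, "collect"))
    fill(t[ESC_INTER], 0x30, 0x7E, (GROUND, "esc_dispatch"))
    fill(t[CSI_PARAM], 0x20, 0x2F, (CSI_INTER, "collect"))
    fill(t[CSI_PARAM], 0x30, 0x3F, (CSI_PARAM, "param"))
    fill(t[CSI_PARAM], 0x40, 0x7E, (GROUND, "csi_dispatch"))
    fill(t[CSI_INTER], 0x20, 0x2F, (CSI_INTER, "collect"))
    fill(t[CSI_INTER], 0x40, 0x7E, (GROUND, "csi_dispatch"))
    return t

def run(data, tables):
    state, params, inter, out = GROUND, "", "", []
    for b in data:
        state, action = tables[state][b]
        if action == "print":
            out.append(("print", chr(b)))
        elif action == "execute":
            out.append(("execute", b))
        elif action == "clear":
            params, inter = "", ""
        elif action == "collect":
            inter += chr(b)
        elif action == "param":
            params += chr(b)
        elif action == "esc_dispatch":
            out.append(("esc", inter, chr(b)))
        elif action == "csi_dispatch":
            out.append(("csi", params, inter, chr(b)))
    return out
```

With tables = build_tables(), run(b"\x1b[2\x08J", tables) yields [("execute", 8), ("csi", "2", "", "J")]. A real emulator needs many more states (strings, 8-bit controls, error recovery), which is where the thousands of table lines come from; the case-statements on the final character then live in whatever consumes the dispatch events.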
Aside from (a) reading existing terminal emulators and imitating them on a smaller scale, or (b) modifying a terminal emulator ... you could investigate libvterm. However, that is not in C# (and the source is the documentation). Still, it is only 5500 lines of code.
Further reading: