I have a string containing UTF-8 encoded text. I need to remove the last UTF-8 character.
So far I have tried
msg = msg[:-1]
but this only removes the last byte. It works as long as the last character is an ASCII character, but it breaks when the last character is a multibyte character.
The simplest way is to decode your UTF-8 bytes to Unicode text, remove the last character there, and then encode the result as UTF-8 again.
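A minimal sketch of the decode-and-re-encode approach, assuming Python 3 and that msg is a bytes object holding valid UTF-8. The function name drop_last_char_decode is just an illustrative name, not something from the question:

```python
def drop_last_char_decode(msg: bytes) -> bytes:
    """Remove the last code point by round-tripping through str."""
    # Decoding turns the bytes into text, where slicing works per code point.
    text = msg.decode("utf-8")
    # Drop the final code point, then encode back to UTF-8 bytes.
    return text[:-1].encode("utf-8")


msg = "héllo€".encode("utf-8")
msg = drop_last_char_decode(msg)
```

Note that this removes one code point, which is not always what a user perceives as one character (for example, a letter followed by a combining accent is two code points). Invalid UTF-8 input raises UnicodeDecodeError.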
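The alternative would be for you to search backwards for a UTF-8 start byte. A UTF-8 byte sequence always starts with a byte that has either the most significant bit set to 0 (a single-byte ASCII character) or the two most significant bits both set to 1 (the lead byte of a multibyte sequence), while continuation bytes always start with the bits 10. Once you have found the start byte of the last sequence, cut the string at that position.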
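The backwards search could look something like the sketch below. It assumes Python 3, where indexing a bytes object yields an integer, and drop_last_char_scan is a hypothetical name chosen for illustration. A byte is a continuation byte when its top two bits are 10, which is the same as (byte & 0xC0) == 0x80:

```python
def drop_last_char_scan(msg: bytes) -> bytes:
    """Remove the last UTF-8 sequence without decoding the whole string."""
    pos = len(msg) - 1
    # Step back over continuation bytes (bit pattern 10xxxxxx).
    while pos > 0 and (msg[pos] & 0xC0) == 0x80:
        pos -= 1
    # pos now indexes the start byte of the last sequence (or is -1 for
    # empty input); keep everything before it.
    return msg[:max(pos, 0)]


msg = "héllo€".encode("utf-8")
msg = drop_last_char_scan(msg)
```

This only inspects the last few bytes, so it avoids decoding a long string, but it does not validate the input: on malformed UTF-8 it simply strips trailing continuation bytes plus the byte before them.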