How can I use the textwrap module to split before a line exceeds a certain number of bytes (without splitting a multibyte character)?
I would like something like this:
>>> textwrap.wrap('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', bytewidth=10)
☺ ☺☺
☺☺ ☺
☺ ☺☺
☺☺
The result depends on the encoding used, because the number of bytes per character depends on the encoding and, in many encodings, on the character as well. I'll assume we're using UTF-8, in which '☺' is encoded as e2 98 ba and is three bytes long; the given example is consistent with that assumption.
Everything in textwrap works on characters; it doesn't know anything about encodings. One way around this is to convert the input string to another format, with each character becoming a string of characters whose length is proportional to its byte length. I will use three characters per byte: two for the byte in hex, plus one to control line breaking. For simplicity I'll assume we only break on spaces, not tabs or any other character.
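As a rough illustration of the idea described above (not necessarily the exact code this answer originally had in mind), here is a minimal sketch. The function name wrap_bytes_hex and the choice of 'x' as the filler character are my own assumptions. Each non-space byte becomes two hex digits plus an 'x', each space becomes three spaces, and textwrap is asked for a width of three times the byte limit. Long words are left unbroken so that no character is ever split.

```python
import textwrap


def wrap_bytes_hex(text, bytewidth, encoding='utf-8'):
    """Wrap text so each line is at most bytewidth bytes once encoded.

    Hypothetical helper: only plain spaces are treated as break points.
    """
    pieces = []
    for ch in text:
        if ch == ' ':
            # One byte -> three characters; real spaces let textwrap break here.
            pieces.append('   ')
        else:
            # Each byte -> two hex digits plus a non-breaking filler 'x'.
            pieces.append(''.join(f'{b:02x}x' for b in ch.encode(encoding)))
    encoded = ''.join(pieces)

    wrapped = textwrap.wrap(
        encoded,
        width=3 * bytewidth,
        break_long_words=False,   # never cut inside a character
        break_on_hyphens=False,
    )

    lines = []
    for line in wrapped:
        raw = bytearray()
        # textwrap only drops whole whitespace chunks, so triples stay aligned.
        for i in range(0, len(line), 3):
            triple = line[i:i + 3]
            raw += b' ' if triple == '   ' else bytes.fromhex(triple[:2])
        lines.append(raw.decode(encoding))
    return lines


for line in wrap_bytes_hex('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', 10):
    print(line)
```

This assumes an encoding in which a space is a single byte (true for UTF-8). A word longer than the limit ends up on a line of its own that exceeds the limit.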
I ended up rewriting a part of textwrap to encode words after it split the string. Unlike Tom's solution, the Python code does not need to iterate through every character.
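The word-level idea can be sketched without touching textwrap's internals. The following is a simplified stand-in, not the rewritten textwrap code mentioned above: the name wrap_bytes_words is hypothetical, it splits on any whitespace with str.split(), and it assumes a space encodes to exactly one byte (as in UTF-8). Each word is encoded once to measure its byte length, so the Python-level loop runs per word rather than per character.

```python
def wrap_bytes_words(text, bytewidth, encoding='utf-8'):
    """Greedy word wrap measured in encoded bytes (hypothetical helper)."""
    lines = []
    current = []      # words on the line being built
    current_len = 0   # byte length of that line, separators included
    for word in text.split():
        word_len = len(word.encode(encoding))
        # Assumption: the separating space costs exactly one byte.
        sep = 1 if current else 0
        if current and current_len + sep + word_len > bytewidth:
            # The word does not fit: finish this line and start a new one.
            lines.append(' '.join(current))
            current, current_len, sep = [], 0, 0
        current.append(word)
        current_len += sep + word_len
    if current:
        lines.append(' '.join(current))
    return lines


for line in wrap_bytes_words('☺ ☺☺ ☺☺ ☺ ☺ ☺☺ ☺☺', 10):
    print(line)
```

Unlike textwrap, this collapses runs of whitespace and never breaks a word, so a single word longer than bytewidth gets a line to itself.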