Unicode characters not rendering with PIL ImageFont

Posted 2020-02-16 03:55

Question:

I'm trying to write tiff images using box drawing characters, but all of the characters in question show up as:

The box-drawing characters (e.g. "┌─┐│└┘╞═╡╤╧╘╛") were pasted directly into the source code, and they show up correctly when saved to a text file, but I don't understand why they're not showing up in the image.

Here is an example of the code I'm using to draw the image:

# coding=utf-8
text = "┌─┐│└┘╞═╡╤╧╘╛"
from PIL import Image, ImageDraw, ImageFont, TiffImagePlugin
img = Image.new("1",(1200,1600),1)
font = ImageFont.truetype("cour.ttf",14,encoding="unic")
draw = ImageDraw.Draw(img)
draw.text((40,0), text, font=font, fill=0)
img.save("imagefile.tif","TIFF")

I'm using Python 2.7.2 on Windows 7.

Answer 1:

I'm not sure which of these is your problem, because there are several different ways to end up with this result, so I'll go over all of the possibilities:

First, make sure the file is actually saved as UTF-8. By default, Notepad and many other editors save files in your system encoding, which is probably something like cp1252. Checking that "it looks right" in the editor, or that "when the script writes those characters to a file and I open that file in Notepad, it looks right", doesn't tell you anything: if you save a cp1252 file and open it as cp1252, of course it looks right.

Just adding "coding=utf-8" to the top doesn't magically change how the file is saved (except with a few smart editors, like emacs). It just tells Python that "this source file is UTF-8", even if it's really something else. So, Python ends up interpreting your cp1252 as UTF-8 and getting mojibake, like an a-with-circumflex in place of a line-drawing character.
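To find out how the file was really saved, rather than what the coding line claims, one option is to read the raw bytes and try decoding them. This is only a minimal sketch: looks_like_utf8 and file_looks_like_utf8 are hypothetical helper names, and a clean decode is strong evidence rather than proof, since a pure-ASCII file also passes.

```python
import io


def looks_like_utf8(data):
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_utf8(path):
    # Read in binary mode so that no decoding happens before we check.
    with io.open(path, "rb") as handle:
        return looks_like_utf8(handle.read())
```

A source file saved as cp1252 with non-ASCII characters in it would normally make file_looks_like_utf8 return False, because cp1252 bytes above 0x7F rarely form valid UTF-8 sequences.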

You're usually better off using explicit backslash escapes, like \u250c instead of ┌, especially if you don't even know how to tell if the file is UTF-8, much less how to fix it.
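If the characters are already available as a correct unicode string (for example in an interactive session), the escaped form can be generated instead of typed by hand. The sketch below relies on the standard unicode_escape codec; to_ascii_escapes is a made-up helper name.

```python
def to_ascii_escapes(text):
    """Turn a unicode string into a pure-ASCII string of backslash escapes."""
    return text.encode("unicode_escape").decode("ascii")


corners = u"\u250c\u2500\u2510"      # the first three characters from the question
escaped = to_ascii_escapes(corners)  # pure ASCII, safe in any source encoding

# Decoding the escapes gives back the original characters.
round_trip = escaped.encode("ascii").decode("unicode_escape")
```

The resulting string can be pasted between the quotes of a u"..." literal.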

Second, in Python 2 you almost never want to put non-ASCII characters into a str literal; use a unicode literal (u"...") unless you have a good reason to do otherwise.

On top of that, if you pass draw.text a str, PIL will decode it with your default charset, which again is probably not UTF-8. So even if everything else so far were correct, your code would be handing over UTF-8 bytes to be parsed as cp1252, which means mojibake again. Using a unicode literal avoids this problem entirely; otherwise, you need to pass text.decode('utf-8').
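The effect of decoding UTF-8 bytes with the wrong codec can be reproduced without PIL at all. This small illustration assumes cp1252 is the charset involved, as guessed above; it uses only the first character of the string from the question.

```python
corner = u"\u250c"                      # BOX DRAWINGS LIGHT DOWN AND RIGHT
utf8_bytes = corner.encode("utf-8")     # three bytes: E2 94 8C
mojibake = utf8_bytes.decode("cp1252")  # wrong codec: three unrelated characters

# The correct round trip: decode with the codec the bytes were written in.
recovered = utf8_bytes.decode("utf-8")
```

Here mojibake starts with an a-with-circumflex (U+00E2), followed by a curly quote and an OE ligature, while recovered is the original corner character.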

Putting that all together:

text = u"\u250c\u2500\u2510\u2502\u2514\u2518\u255e\u2550\u2561\u2564\u2567\u2558\u255b"

And now the coding declaration and the actual encoding used to save the file don't matter, because the file is pure ASCII.
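For reference, here is how the script from the question might look with that change applied. Treat it as a sketch under assumptions: render_boxes is a hypothetical wrapper, the font path is left as a parameter because the font still has to contain the glyphs, and the PIL calls are the same ones the question uses.

```python
# Pure-ASCII source: the escapes stand for the box-drawing characters
# that were pasted into the original script.
TEXT = (u"\u250c\u2500\u2510\u2502\u2514\u2518\u255e"
        u"\u2550\u2561\u2564\u2567\u2558\u255b")


def render_boxes(font_path, out_path="imagefile.tif"):
    """Draw TEXT in black on a white 1-bit image and save it as TIFF."""
    # Imported here so the constant above can be used without PIL installed.
    from PIL import Image, ImageDraw, ImageFont

    img = Image.new("1", (1200, 1600), 1)          # 1 = white background
    font = ImageFont.truetype(font_path, 14, encoding="unic")
    draw = ImageDraw.Draw(img)
    draw.text((40, 0), TEXT, font=font, fill=0)    # 0 = black text
    img.save(out_path, "TIFF")
    return img
```

It would be called as render_boxes("DejaVuSansMono.ttf"), where the file name is only an example of a monospaced font that covers the box-drawing block.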

But you may still get the missing-character rectangles, because many fonts don't have the line-drawing characters. I don't know what your cour.ttf is, but I found two Courier TTF fonts on my system, one from an old Mac OS and one from Windows XP, and neither one has them. If that's your problem, you obviously need to use a different font.
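Whether a given font actually contains the characters can be checked before drawing anything. The sketch below assumes the third-party fontTools package is installed (it is not part of PIL); missing_codepoints and load_cmap are hypothetical helper names.

```python
def missing_codepoints(text, cmap):
    """Return the characters of `text` that have no entry in `cmap`.

    `cmap` is any mapping keyed by integer code point.
    """
    return [ch for ch in text if ord(ch) not in cmap]


def load_cmap(font_path):
    # Assumption: fontTools is available (pip install fonttools).
    from fontTools.ttLib import TTFont

    # getBestCmap() returns None when the font has no Unicode table at all,
    # so fall back to an empty mapping in that case.
    return TTFont(font_path).getBestCmap() or {}
```

With these, missing_codepoints(text, load_cmap("cour.ttf")) would return an empty list only if every character in text has a glyph in that font.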

One other possibility: If you're still getting mojibake with the fixes above, cour.ttf may not be a Unicode-ordered font file, but one of the older TTF orders. A font viewer should show you the TTF order of the file. (I'm pretty sure Windows comes with one, but I have no idea where it is in Windows 7 or how to use it.) Then you need to pass the right thing in place of 'unic' as the encoding when loading the font. But most fonts that aren't either unic or symb probably won't have the line-drawing characters anyway.
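If no font viewer is at hand, the character-map tables of the file can also be listed from Python. This is a rough sketch rather than a definitive mapping: it assumes the third-party fontTools package is installed, and the correspondence between cmap (platformID, platEncID) pairs and the encoding tags PIL accepts is my assumption, covering only the most common cases. The names LIKELY_TAG, likely_encodings and cmap_pairs are made up for the example.

```python
# Assumed correspondence between cmap (platformID, platEncID) pairs and the
# four-character encoding tags accepted by ImageFont.truetype.
LIKELY_TAG = {
    (3, 1): "unic",   # Microsoft platform, Unicode BMP
    (3, 10): "unic",  # Microsoft platform, full Unicode
    (3, 0): "symb",   # Microsoft platform, Symbol
    (1, 0): "armn",   # Macintosh platform, Roman
}


def likely_encodings(pairs):
    """Map (platformID, platEncID) pairs to candidate encoding tags, in order."""
    tags = []
    for pair in pairs:
        # Platform 0 is the Unicode platform, whatever its encoding ID.
        tag = "unic" if pair[0] == 0 else LIKELY_TAG.get(pair)
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def cmap_pairs(font_path):
    # Assumption: fontTools is available (pip install fonttools).
    from fontTools.ttLib import TTFont

    tables = TTFont(font_path)["cmap"].tables
    return [(t.platformID, t.platEncID) for t in tables]
```

For a font whose only table is (3, 0), likely_encodings(cmap_pairs(path)) would suggest "symb" as the value to try in place of "unic".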