Printing utf-8 strings in Sublime Text 2's console

Posted 2019-06-23 02:19

Question:

When running this code with python myscript.py from Windows console cmd.exe (i.e. outside of Sublime Text), it works:

# coding: utf8
import json
d = json.loads("""{"mykey": {"readme": "Café"}}""")
print d['mykey']['readme']

Café

When running it inside Sublime Text 2 with CTRL+B, it fails:

  • Either like this (by default):

    print d['mykey']['readme']
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)
    [Finished in 0.1s with exit code 1]

  • or like this, after applying the solution from this answer to "printing UTF-8 in Python 3 using Sublime Text 3" (i.e. adding "env": {"PYTHONIOENCODING": "utf8"} to the build system):

    [Decode error - output not utf-8]
    [Decode error - output not utf-8]
    [Finished in 0.1s]

  • adding "encoding": "utf-8" in the Python Sublime-build file doesn't help either

How can I print a string properly in the Sublime Text 2 (for Windows) console if it contains UTF-8 characters?

Note: this is not a duplicate of printing UTF-8 in Python 3 using Sublime Text 3; I already linked to that question above.

Here is the Python.sublime-build file:

{
    "cmd": ["python", "-u", "$file"],
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python",
    "variants": [
        {
            "name": "Run",
            "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
            "cmd": ["C:\\Python27-64\\python.exe", "-u", "$file"]
        }
    ]
}

(I tried with and without "env": ..., with and without "encoding": ...)

Answer 1:

This is a long answer full of gory details, but the TL;DR version is that this appears to be a bug in Sublime Text 2 (in particular in its exec command).

There are instructions below on how to patch Sublime in order to potentially solve the problem (it worked in all of my tests at least) if upgrading to Sublime Text 3 is not an option, as Sublime 3 has an enhanced exec command.


Something to note is that the error you're seeing in the form of:

[Decode error - output not utf-8]

is generated by Sublime as it's adding data to the output panel and not by Python. Even with the fix outlined below, it may still be necessary (based on system setup and/or platform in use) to include the env setting as mentioned in your question, since that tells Python to generate its output in UTF-8 regardless of what it thinks it should do.
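
As a rough illustration of what that env setting does (only a sketch; child_code and raw are names made up for this example), the following launches a child interpreter whose stdout is a pipe, much as a build system would, with PYTHONIOENCODING set, and captures the raw bytes it writes:

```python
import os
import subprocess
import sys

# The child writes a unicode string containing U+00E9 to its stdout.
# The raw string keeps the backslash escapes literal so that the child
# interpreter, not this one, expands them.
child_code = r"import sys; sys.stdout.write(u'Caf\xe9\n')"

# Copy the current environment and force the child's I/O encoding to UTF-8,
# which is what "env": {"PYTHONIOENCODING": "utf8"} does in a build file.
env = dict(os.environ, PYTHONIOENCODING="utf8")

# stdout is a pipe here, so the bytes come back undecoded.
raw = subprocess.check_output([sys.executable, "-c", child_code], env=env)
print(repr(raw))
```

With the variable set, the captured bytes should contain 0xC3 0xA9, the UTF-8 encoding of the accented e. Without it, Python 2 writing a unicode string to a pipe falls back to ASCII and raises the UnicodeEncodeError shown in the question.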


For the purposes of the following tests, I installed Sublime Text 2 and Python 2.7.14 on my Windows 7 machine. This machine already has Python 3 installed on it and added to the PATH, so I installed this version into C:\Python27-64 as indicated in your sample build file and left it out of the path.

With the exception of installing PackageResourceViewer and bumping up the default font size, Sublime is otherwise stock.

The test script is the following, slightly modified from the version outlined in your question:

# coding: utf8
import sys

print(sys.version)
print("Café")

Since everything is stock, the Build System in Tools > Build System is set to Automatic, and trying to run the build with Ctrl+B produces the following output:

3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)]
[Decode error - output not utf-8]
[Finished in 0.1s]

This makes sense because, as mentioned above, Python 3 is on my path but Python 2 is not, and so it's picking Python 3.

The default Python.sublime-build is the following:

{
    "cmd": ["python", "-u", "$file"],
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python"
}

Using PackageResourceViewer, I opened up the file and modified it to invoke the Python 2 interpreter directly:

{
    "cmd": ["C:\\Python27-64\\python.exe", "-u", "$file"],
    "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
    "selector": "source.python"
}

With this in place, the build results look like this:

2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:25:58) [MSC v.1500 64 bit (AMD64)]
Café
[Finished in 0.1s]

Notice that it's running Python 2, but it's also properly displaying the data now, without having to modify anything.

That's somewhat curious and I must admit I went down a few rabbit holes on this because it seemed to work right off the bat. However, if you comment out the print of sys.version:

# coding: utf8
import sys

#print(sys.version)
print("Café")

It stops working:

[Decode error - output not utf-8]
[Decode error - output not utf-8]
[Finished in 0.1s]

Alternatively, if you slightly modify the text that's being printed so that it doesn't end on the accented character:

# coding: utf8
import sys

# print(sys.version)
print("Café au lait")

Now it works as you might expect:

Café au lait
[Finished in 0.1s]

I believe this to be a bug in the exec command that ships with Sublime Text in the Default package. In particular, it decodes data just prior to it being inserted into the build results, and so is potentially sensitive to where the buffer cutoffs happen when the data is being read.

Conversely, Sublime Text 3 has a modified version of the exec command which (among other enhancements) uses an incremental decoder at the point where the data is read from the pipe, and doesn't exhibit this issue.

Modifying the exec command in Sublime 2 to also use incremental decoding appears to fix the problem, although I will admit that I didn't do any exhaustive testing of this.
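
To illustrate why the decoding strategy matters, here is a minimal sketch (not the actual code from exec.py; the chunk boundary is chosen by hand and the function names are hypothetical) that decodes the same UTF-8 bytes once chunk by chunk and once with an incremental decoder:

```python
import codecs

# UTF-8 bytes for "Caf" + U+00E9 + newline; the accented e takes two bytes.
data = u"Caf\u00e9\n".encode("utf-8")

# Hypothetical read boundary that falls in the middle of the two-byte character.
chunks = [data[:4], data[4:]]


def decode_per_chunk(chunks):
    """Decode every chunk on its own, reporting an error for invalid ones."""
    out = []
    for chunk in chunks:
        try:
            out.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            out.append(u"[Decode error - output not utf-8]\n")
    return u"".join(out)


def decode_incrementally(chunks):
    """Feed chunks to one decoder that buffers incomplete byte sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = [decoder.decode(chunk) for chunk in chunks]
    # Flush the decoder; this raises if a partial character is left over.
    out.append(decoder.decode(b"", final=True))
    return u"".join(out)


naive = decode_per_chunk(chunks)
incremental = decode_incrementally(chunks)
print(repr(naive))
print(repr(incremental))
```

When the boundary falls inside the two-byte character, neither chunk is valid UTF-8 on its own, so the per-chunk approach reports an error for each chunk, while the incremental decoder holds on to the dangling byte until the rest arrives.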

I have created a public gist that contains a modified version of the exec.py file that provides the exec command used by the build system, along with instructions on how to apply it.

If you use that, your existing build system (and even the default) should work fine for you, with the caveat mentioned above: you may still need the env setting in the build to force the Python interpreter to actually output UTF-8 if it isn't doing so already.



Answer 2:

A possible quick fix:

# coding: utf8
import json
d = json.loads("""{"mykey": {"readme": "Café"}}""", encoding='latin1')
print d['mykey']['readme'].encode('latin1')