I was running a Python tool and trying to save its output to a file. If I don't save the output, the tool runs perfectly fine. But when I try to save the output to a file, it throws the following error and aborts the program:
  File "./androdiff.py", line 118, in <module>
    main(options, arguments)
  File "./androdiff.py", line 94, in main
    ddm.show()
  File "./elsim/elsim/elsim_dalvik.py", line 772, in show
    self.eld.show()
  File "./elsim/elsim/elsim.py", line 435, in show
    i.show()
  File "./elsim/elsim/elsim_dalvik.py", line 688, in show
    print hex(self.bb.bb.start + self.offset), self.pos_instruction, self.ins.get_name(), self.ins.show_buff( self.bb.bb.start + self.offset )
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0111' in position 35: ordinal not in range(128)
I've tried command | less, command > output, and command | tee output; all of them throw the same error.
Please help to resolve the issue.
Thanks!
Set the PYTHONIOENCODING environment variable explicitly if the stdout character encoding can't be determined automatically, e.g., when the output is redirected to a file.
Don't hardcode the character encoding in your scripts if the output may go to a terminal; print Unicode strings instead and let users configure their environment.
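The effect of PYTHONIOENCODING can be sketched from Python itself by starting a child interpreter whose stdout is a pipe rather than a terminal. This is only an illustration: the helper name run_child is hypothetical, and the child simply prints the character from the traceback instead of running the real tool.

```python
import os
import subprocess
import sys


def run_child(io_encoding):
    """Run a child interpreter that prints U+0111 with stdout piped (not a terminal)."""
    env = dict(os.environ)
    # The variable under discussion: it tells the child which encoding to use for stdout.
    env["PYTHONIOENCODING"] = io_encoding
    # The doubled backslash keeps the escape literal so the child resolves it.
    child_code = "print(u'\\u0111')"
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    for encoding in ("ascii", "utf-8"):
        returncode, out, err = run_child(encoding)
        print(encoding, returncode, repr(out))
```

With ascii the child should fail with a UnicodeEncodeError, and with utf-8 it should emit the encoded bytes. In a POSIX shell the same idea is usually written by prefixing the command, e.g. PYTHONIOENCODING=utf-8 command > output.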
You will want to encode your string explicitly before you print it, for example as UTF-8.
Printing to the terminal works because Python detects the terminal's encoding (UTF-8 in your case) and encodes your string automatically.
When you redirect the output to a file instead, Python 2 has no information about which encoding to use, so it falls back to ASCII, which is what causes your error.
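The difference between the implicit ASCII fallback and an explicit UTF-8 encode can be sketched as follows. The BytesIO object is only a stand-in for a redirected stdout, and the variable names are illustrative. In Python 2 the encoded bytes could be handed straight to print; the sketch writes to a byte stream so that it behaves the same under Python 3.

```python
import io

# The character from the traceback: LATIN SMALL LETTER D WITH STROKE.
text = u"\u0111"

# Encoding to ASCII fails; this is what happens implicitly when Python 2
# prints a unicode string to a redirected stdout.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 succeeds and yields plain bytes.
encoded = text.encode("utf-8")

# A byte-oriented sink accepts the bytes as they are, with no further encoding step.
sink = io.BytesIO()
sink.write(encoded + b"\n")
```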
As a general rule of thumb, always encode your string before printing so that print works in all environments.
The best approach may be to define your own print function for this.
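A minimal sketch of such a function is shown here. The name safe_print and its parameters are hypothetical, and it assumes the text passed in is a Unicode string rather than already-encoded bytes.

```python
import io
import sys


def safe_print(text, stream=None, encoding="utf-8"):
    """Hypothetical helper: write a Unicode string to a stream as encoded bytes."""
    if stream is None:
        stream = sys.stdout
    data = (text + u"\n").encode(encoding)
    # Python 3 text streams expose their underlying byte stream as .buffer;
    # Python 2 file objects and plain byte streams accept bytes directly.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()


# Usage: an in-memory byte stream stands in for a redirected stdout.
buf = io.BytesIO()
safe_print(u"\u0111 ok", stream=buf)
```

Because the encoding is applied inside the helper, the result no longer depends on whether stdout is a terminal, a pipe, or a file.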
If you want to avoid the above, you can instead make UTF-8 the default encoding used for printing.
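The snippet that belongs here is not shown, so the following is an assumption: a commonly cited Python 2 way to change the default encoding is to reload the sys module and call sys.setdefaultencoding. The version guard is added here so the sketch also runs unchanged under Python 3, where no change is needed.

```python
import sys

if sys.version_info[0] == 2:
    # Python 2 only: sys.setdefaultencoding is removed at interpreter startup,
    # and reload(sys) makes it available again. This is a widely discouraged hack.
    reload(sys)  # noqa: F821 (reload is a builtin in Python 2 only)
    sys.setdefaultencoding("utf-8")

# Under Python 3 this value is already UTF-8 and the branch above is skipped.
default_encoding = sys.getdefaultencoding()
```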
Beware of this approach! Some third-party libraries may depend on the default encoding being ASCII and break. Note that this whole mess has been resolved in Python 3, where strings are Unicode and the default encoding is UTF-8.