I want to convert Unicode strings to ISO-8859-15. These strings include the u"\u2019"
(RIGHT SINGLE QUOTATION MARK, see http://www.fileformat.info/info/unicode/char/2019/index.htm) character, which is not part of the ISO-8859-15 character set.
In Python, how can I normalize the Unicode characters so that they match the ISO-8859-15 encoding?
I have looked at the unicodedata module without success. I managed to do the job with
s.replace(u"\u2019", "'").encode('iso-8859-15')
but I would like to find a more general and cleaner way.
Thanks for your help
For info, my final solution:
Thank you for your help
Unless you wish to create a translation rule (if you do, look at Boud's answer), you could choose one of the default error handlers that `encode` provides, or even register your own. From the `encode` docstring:

Use the unicode version of the `translate` function, assuming `s` is a unicode string:

The argument of the unicode version of `translate` is a dict mapping unicode ordinals to unicode ordinals. Add to this dict any other characters you cannot encode in your target encoding.

You can build your mapping table in a slightly more readable form and create your mapping dict from it, for instance:
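
The original snippet is not preserved here, so what follows is only a minimal sketch of the idea and not the answerer's exact code. The names `REPLACEMENTS`, `TRANSLATION_TABLE` and `to_latin9` are made up for illustration, and the table entries are just a guess at characters you might meet. The table is written with the characters themselves as keys, then converted to the ordinal-keyed dict that `translate` expects.

```python
# Readable table: each unencodable character paired with a stand-in.
# The entries are illustrative assumptions; extend them for your own data.
REPLACEMENTS = {
    u"\u2018": u"'",    # LEFT SINGLE QUOTATION MARK
    u"\u2019": u"'",    # RIGHT SINGLE QUOTATION MARK
    u"\u201c": u'"',    # LEFT DOUBLE QUOTATION MARK
    u"\u201d": u'"',    # RIGHT DOUBLE QUOTATION MARK
    u"\u2013": u"-",    # EN DASH
    u"\u2026": u"...",  # HORIZONTAL ELLIPSIS
}

# translate() looks characters up by ordinal, so convert the readable keys.
TRANSLATION_TABLE = {ord(src): dst for src, dst in REPLACEMENTS.items()}


def to_latin9(s):
    """Substitute the known characters, then encode to ISO-8859-15.

    Anything still unencodable after the substitution raises
    UnicodeEncodeError, which makes gaps in the table easy to spot.
    """
    return s.translate(TRANSLATION_TABLE).encode("iso-8859-15")
```

With this table, `to_latin9(u"It\u2019s")` gives the bytes for `It's`. Characters that ISO-8859-15 already covers, such as the euro sign, pass through `translate` untouched and are encoded normally.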
From translate documentation:
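
Coming back to the error-handler route from the first answer, here is a rough sketch of what it could look like; it is not the quoted documentation. The built-in handler names and `codecs.register_error` are standard library features, while the handler name `latin9_fallback`, the `FALLBACKS` table and the choice of `?` as a last resort are assumptions made for this example.

```python
import codecs

text = u"It\u2019s"

# Built-in error handlers: each one deals with the unencodable character
# in a different way.
replaced = text.encode("iso-8859-15", "replace")           # substitutes "?"
ignored = text.encode("iso-8859-15", "ignore")             # drops the character
xml = text.encode("iso-8859-15", "xmlcharrefreplace")      # numeric XML entity

# Hypothetical fallback table used by the custom handler below.
FALLBACKS = {u"\u2019": u"'"}


def fallback_handler(error):
    """Custom handler: look up a stand-in, or fall back to '?'."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    # error.start:error.end covers the run of characters that failed.
    bad = error.object[error.start:error.end]
    replacement = u"".join(FALLBACKS.get(ch, u"?") for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, error.end


# The name is arbitrary; it is what gets passed as the `errors` argument.
codecs.register_error("latin9_fallback", fallback_handler)

custom = text.encode("iso-8859-15", "latin9_fallback")
```

The trade-off compared with `translate` is that the handler only runs for characters the codec rejects, so the same handler can be reused with other target encodings.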