I'm trying to hash some Unicode strings:
hashlib.sha1(s).hexdigest()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-81:
ordinal not in range(128)
where s is something like:
œ∑¡™£¢∞§¶•ªº–≠œ∑´®†¥¨ˆøπ“‘åß∂ƒ©˙∆˚¬…æΩ≈ç√∫˜µ≤≥÷åйцукенгшщзхъфывапролджэячсмитьбююю..юбьтијџўќ†њѓѕ'‘“«««\dzћ÷…•∆љl«єђxcvіƒm≤≥ї!@#$©^&*(()––––––––––∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆∆•…÷ћzdzћ÷…•∆љlљ∆•…÷ћzћ÷…•∆љ∆•…љ∆•…љ∆•…∆љ•…∆љ•…љ∆•…∆•…∆•…∆•∆…•÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…÷∆•…
What should I fix?
You hash bytes, not strings. So you need to know which bytes you actually want to hash: the UTF-8 memory representation of the string, the UTF-16 memory representation, and so on.
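A minimal sketch of the point above (Python 3 syntax; the sample string is arbitrary): the same text encoded with different encodings yields different byte sequences, and therefore different digests.

```python
import hashlib

# Any non-ASCII text demonstrates the point; this sample is arbitrary.
s = u"œ∑¡™£¢∞§¶•"

# Encode explicitly so you control exactly which bytes get hashed.
utf8_digest = hashlib.sha1(s.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha1(s.encode("utf-16-le")).hexdigest()
```

Neither digest is "the" hash of the string; each is the hash of one particular byte representation, which is why you must pick an encoding deliberately.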
Use the UTF-8 encoding:

hashlib.sha1(s.encode('utf-8')).hexdigest()

Apparently hashlib.sha1 isn't expecting a unicode object, but rather a sequence of bytes in a str object. Encoding your unicode string to a sequence of bytes (using, say, the UTF-8 encoding) should fix it.

The error occurs because Python is trying to convert the unicode object to a str automatically, using the default ascii encoding, which can't handle all those non-ASCII characters (since your string isn't pure ASCII).

A good starting point for learning more about Unicode and encodings is the Python docs, and this article by Joel Spolsky.
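Putting the fix together, a short runnable sketch (Python 3 syntax; the sample string here stands in for the questioner's text):

```python
import hashlib

# Stand-in for the non-ASCII string from the question.
s = u"œ∑¡™£¢∞§¶•ªº–≠"

# Encoding first produces a bytes object, so no implicit
# ascii conversion is attempted and no UnicodeEncodeError is raised.
digest = hashlib.sha1(s.encode("utf-8")).hexdigest()
print(digest)
```

SHA-1 always yields a 160-bit digest, so hexdigest() returns a 40-character hex string regardless of the input's length or alphabet.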