I have a problem involving encoding/decoding. I read text from file and compare it with text from database (Postgres) Compare is done within two lists
from file i get "jo\x9a" for "još" and from database I get "jo\xc5\xa1" for same value
common = [a for a in codes_from_file if a in kode_prfoksov]
# Items in one but not the other
only1 = [a for a in codes_from_file if not a in kode_prfoksov]
#Items only in another
only2 = [a for a in kode_prfoksov if not a in codes_from_file ]
How to solve this? Which encoding should be set when comparing this two strings to solve the issue?
thank you
Your file strings seems to be Windows-1250 encoded. Your database seems to contain UTF-8 strings.
So you can either convert first all strings to unicode:
or if you do not want unicode strings, just convert the file string to UTF-8:
The first one seems to be
windows-1250
, and the second isutf-8
.