UTF-8 latin-1 conversion issues, python django

ok so my issue is i have the string '\222\222\223\225' which is stored as latin-1 in the db. What I get from django (by printing it) is the following string, 'ââââ¢' which I assume is the UTF conversion of it. Now I need to pass the string into a function that does this operation:

strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)

I get this error:

chr() arg not in range(256)

If I try to encode the string as latin-1 first I get this error:

'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)

I have read a bunch on how character encoding works, and there is something I am missing because I just don't get it!

标签： python django utf-8 character-encoding

3条回答

\"骚年 ilove

2楼-- · 2019-04-10 18:39

Well its because its been encrypted with some terrible scheme that just changes the ord() of the character by some request, so the string coming out of the database has been encrypted and this decrypts it. What you supplied above does not seem to work. In the database it is latin-1, django converts it to unicode, but I cannot pass it to the function as unicode, but when i try and encode it to latin-1 i see that error.

0人赞添加讨论(0) 举报

冷血范

3楼-- · 2019-04-10 18:47

Your first error 'chr() arg not in range(256)' probably means you have underflowed the value, because chr cannot take negative numbers. I don't know what the encryption algorithm is supposed to do when the inputcounter + 33 is more than the actual character representation, you'll have to check what to do in that case.

About the second error. you must decode() and not encode() a regular string object to get a proper representation of your data. encode() takes a unicode object (those starting with u') and generates a regular string to be output or written to a file. decode() takes a string object and generate a unicode object with the corresponding code points. This is done with the unicode() call when generated from a string object, you could also call a.decode('latin-1') instead.

>>> a = '\222\222\223\225'
>>> u = unicode(a,'latin-1')
>>> u
u'\x92\x92\x93\x95'
>>> print u.encode('utf-8')
ÂÂÂÂ
>>> print u.encode('utf-16')
ÿþ
>>> print u.encode('latin-1')

>>> for c in u:
...   print chr(ord(c) - 3 - 0 -30)
...
q
q
r
t
>>> for c in u:
...   print chr(ord(c) - 3 -200 -30)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ValueError: chr() arg not in range(256)

0人赞添加讨论(0) 举报

ゆ、 Hurt°

4楼-- · 2019-04-10 18:52

As Vinko notes, Latin-1 or ISO 8859-1 doesn't have printable characters for the octal string you quote. According to my notes for 8859-1, "C1 Controls (0x80 - 0x9F) are from ISO/IEC 6429:1992. It does not define names for 80, 81, or 99". The code point names are as Vinko lists them:

\222 = 0x92 => PRIVATE USE TWO
\223 = 0x93 => SET TRANSMIT STATE
\225 = 0x95 => MESSAGE WAITING

The correct UTF-8 encoding of those is (Unicode, binary, hex):

U+0092 = %11000010 %10010010 = 0xC2 0x92
U+0093 = %11000010 %10010011 = 0xC2 0x93
U+0095 = %11000010 %10010101 = 0xC2 0x95

The LATIN SMALL LETTER A WITH CIRCUMFLEX is ISO 8859-1 code 0xE2 and hence Unicode U+00E2; in UTF-8, that is %11000011 %10100010 or 0xC3 0xA2.

The CENT SIGN is ISO 8859-1 code 0xA2 and hence Unicode U+00A2; in UTF-8, that is %11000011 %10000010 or 0xC3 0x82.

So, whatever else you are seeing, you do not seem to be seeing a UTF-8 encoding of ISO 8859-1. All else apart, you are seeing but 5 bytes where you would have to see 8.

Added: The previous part of the answer addresses the 'UTF-8 encoding' claim, but ignores the rest of the question, which says:

Now I need to pass the string into a function that does this operation:

    strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)

I get this error: chr() arg not in range(256).  If I try to encode the
string as Latin-1 first I get this error: 'latin-1' codec can't encode
characters in position 0-3: ordinal not in range(256).

You don't actually show us how intCounter is defined, but if it increments gently per character, sooner or later 'ord(c) - 3 - intCounter - 30' is going to be negative (and, by the way, why not combine the constants and use 'ord(c) - intCounter - 33'?), at which point, chr() is likely to complain. You would need to add 256 if the value is negative, or use a modulus operation to ensure you have a positive value between 0 and 255 to pass to chr(). Since we can't see how intCounter is incremented, we can't tell if it cycles from 0 to 255 or whether it increases monotonically. If the latter, then you need an expression such as:

chr(mod(ord(c) - mod(intCounter, 255) + 479, 255))

where 256 - 33 = 223, of course, and 479 = 256 + 223. This guarantees that the value passed to chr() is positive and in the range 0..255 for any input character c and any value of intCounter (and, because the mod() function never gets a negative argument, it also works regardless of how mod() behaves when its arguments are negative).

0人赞添加讨论(0) 举报

UTF-8 latin-1 conversion issues, python django

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间