I have a list that has WhatsApp emoticons encoded as utf-8 characters. The table I am using to decode the emoticons is at http://apps.timwhitlock.info/emoji/tables/unicode
With this table I am trying to count the number of emoticons used, which I have successfully done using regex techniques. The problem is I have created a dictionary where the keys are the utf-8 characters as strings and the key_values are integers. The following:
print d_emo
for k, v in d_emo.items():
print k.encode('utf8'), v
produces this output:
{'\\xF0\\x9F\\x98\\xA2': 2, '\\xF0\\x9F\\x98\\x82': 1, '\\xF0\\x9F\\x98\\x86': 2, '\\xF0\\x9F\\x98\\x89': 1, '\\xF0\\x9F\\x8D\\xB5': 2, '\\xF0\\x9F\\x8D\\xB0': 4, '\\xF0\\x9F\\x8D\\xAB': 2, '\\xF0\\x9F\\x8D\\xA9': 2, '\\xF0\\x9F\\x98\\x98': 1, '\\xE2\\x98\\xBA': 33, '\\xE2\\x98\\x95': 1}
\xF0\x9F\x98\xA2 2
\xF0\x9F\x98\x82 1
\xF0\x9F\x98\x86 2
\xF0\x9F\x98\x89 1
\xF0\x9F\x8D\xB5 2
\xF0\x9F\x8D\xB0 4
\xF0\x9F\x8D\xAB 2
\xF0\x9F\x8D\xA9 2
\xF0\x9F\x98\x98 1
\xE2\x98\xBA 33
\xE2\x98\x95 1
If I use this code:
for k, v in d_emo.items():
print k.encode('utf-8').decode('unicode_escape'), v
I get
ð¢ 2
ð 1
ð 2
ð 1
ðµ 2
ð° 4
ð« 2
ð© 2
ð 1
⺠33
â 1
I should be getting smiley faces and the like. Any suggestions? This is in Python 2.7.