How to display/convert a string of utf-8 to the pr

I have a list containing WhatsApp emoticons encoded as UTF-8 byte sequences. The table I am using to decode the emoticons is at http://apps.timwhitlock.info/emoji/tables/unicode

With this table I am trying to count the number of emoticons used, which I have successfully done with regular expressions. The problem is that I have built a dictionary whose keys are the UTF-8 byte sequences stored as escaped strings and whose values are integer counts. The following code:

print d_emo
for k, v in d_emo.items():
    print k.encode('utf8'), v

produces this output:

{'\\xF0\\x9F\\x98\\xA2': 2, '\\xF0\\x9F\\x98\\x82': 1, '\\xF0\\x9F\\x98\\x86': 2, '\\xF0\\x9F\\x98\\x89': 1, '\\xF0\\x9F\\x8D\\xB5': 2, '\\xF0\\x9F\\x8D\\xB0': 4, '\\xF0\\x9F\\x8D\\xAB': 2, '\\xF0\\x9F\\x8D\\xA9': 2, '\\xF0\\x9F\\x98\\x98': 1, '\\xE2\\x98\\xBA': 33, '\\xE2\\x98\\x95': 1}
\xF0\x9F\x98\xA2 2
\xF0\x9F\x98\x82 1
\xF0\x9F\x98\x86 2
\xF0\x9F\x98\x89 1
\xF0\x9F\x8D\xB5 2
\xF0\x9F\x8D\xB0 4
\xF0\x9F\x8D\xAB 2
\xF0\x9F\x8D\xA9 2
\xF0\x9F\x98\x98 1
\xE2\x98\xBA 33
\xE2\x98\x95 1

If I use this code:

for k, v in d_emo.items():
    print k.encode('utf-8').decode('unicode_escape'), v

I get

ð¢ 2
ð 1
ð 2
ð 1
ðµ 2
ð° 4
ð« 2
ð© 2
ð 1
âº 33
â 1

I should be getting smiley faces and the like. Any suggestions? This is in Python 2.7.

Tags: python unicode encoding utf-8

1 answer

唯我独甜

Reply #2 · 2019-06-01 08:37

This decodes the escaped keys correctly, but Python 2.x has limited support for characters outside the BMP (Basic Multilingual Plane, characters U+0000 to U+FFFF):

import unicodedata as ud
D = {'\\xF0\\x9F\\x98\\xA2': 2, '\\xF0\\x9F\\x98\\x82': 1, '\\xF0\\x9F\\x98\\x86': 2, '\\xF0\\x9F\\x98\\x89': 1, '\\xF0\\x9F\\x8D\\xB5': 2, '\\xF0\\x9F\\x8D\\xB0': 4, '\\xF0\\x9F\\x8D\\xAB': 2, '\\xF0\\x9F\\x8D\\xA9': 2, '\\xF0\\x9F\\x98\\x98': 1, '\\xE2\\x98\\xBA': 33, '\\xE2\\x98\\x95': 1}
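for k, v in D.iteritems():
    # unicode-escape turns each literal \xNN into the code point U+00NN,
    # latin1 maps those code points back to raw bytes one-to-one,
    # and utf8 then decodes the bytes into the actual character.
    k = k.decode('unicode-escape').encode('latin1').decode('utf8')
    try:
        n = ud.name(k)
    except ValueError:
        n = 'no such name'
    print k, repr(k), n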

Output:

☺ u'\u263a' WHITE SMILING FACE
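
To spell out why the codec chain above works: `unicode-escape` reads each literal `\xNN` as the code point U+00NN, which is why the second attempt in the question printed `ð`, `â` and similar Latin-1 characters. `latin1` maps those code points back to single bytes, and only then can the bytes be decoded as UTF-8. Here is a minimal sketch of the same idea using a different technique, an explicit hex parse instead of the codec chain, so that it should run unchanged on both Python 2.7 and Python 3. The helper name `decode_escaped_utf8` is hypothetical, and the sketch assumes every key consists solely of `\xNN` escapes, as in the dictionary above.

```python
import re
import unicodedata

# Matches one literal "\xNN" escape: backslash, "x", two hex digits.
_HEX_ESCAPE = re.compile(r'\\x([0-9A-Fa-f]{2})')


def decode_escaped_utf8(key):
    """Turn text such as '\\xE2\\x98\\xBA' into the character it encodes.

    Hypothetical helper for illustration; assumes `key` contains
    nothing but \\xNN escapes.
    """
    # Step 1: read every "\xNN" escape as one raw byte value.
    raw = bytearray(int(h, 16) for h in _HEX_ESCAPE.findall(key))
    # Step 2: interpret the resulting byte sequence as UTF-8.
    return raw.decode('utf-8')


# Two of the three-byte (BMP) keys from the question.
for key in ('\\xE2\\x98\\xBA', '\\xE2\\x98\\x95'):
    ch = decode_escaped_utf8(key)
    print('U+%04X %s' % (ord(ch), unicodedata.name(ch)))
# U+263A WHITE SMILING FACE
# U+2615 HOT BEVERAGE

# A four-byte key from the question; it decodes to U+1F622,
# which lies outside the BMP.
crying = decode_escaped_utf8('\\xF0\\x9F\\x98\\xA2')
print(len(crying))
```

On Python 3 and on wide Python 2 builds the four-byte keys decode to a single code point, so the last line prints 1. On a narrow Python 2 build the same character is held as a surrogate pair and `len()` reports 2, which is one concrete form of the BMP limitation mentioned above.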


     