Easiest way to remove unicode representations from

Published 2019-02-21 23:36


Question:

I have a string in Python 3 that has several escaped Unicode representations in it, for example:

t = 'R\\u00f3is\\u00edn'

and I want to convert t so that it has the proper representation when I print it, i.e.:

>>> print(t)
Róisín

However, I just get the original string back. I've tried re.sub and some others, but I can't seem to find a way that will change these characters without having to iterate over each one. What would be the easiest way to do so?

Answer 1:

You want to use the built-in codec unicode_escape.

If t is already a bytes object (an 8-bit string), it's as simple as this:

>>> print(t.decode('unicode_escape'))
Róisín
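As a minimal sketch of the bytes case, assuming the escaped text arrives as raw bytes (the variable names here are illustrative):

```python
# Assumed input: the escaped text as bytes, e.g. read from a file in binary mode.
raw = b'R\\u00f3is\\u00edn'   # the bytes contain a literal backslash followed by u00f3

# unicode_escape interprets the backslash sequences and returns a str.
decoded = raw.decode('unicode_escape')
print(decoded)
```

Bytes that are not part of an escape sequence are interpreted as Latin-1 by this codec, so it is best suited to input that is otherwise plain ASCII.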

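If t has already been decoded to Unicode (a str), you can encode it back to bytes and then decode it this way. If you're sure that all of your non-ASCII characters have been escaped, it doesn't much matter which ASCII-compatible codec you use to do the encode. Otherwise, you could try to get your original byte string back, but it's simpler, and probably safer, to just force any non-encoded characters to get encoded, and then they'll get decoded along with the already-encoded ones. Note that in Python 3, encoding with unicode_escape itself also doubles the backslashes already in the string, so that round trip just hands back the original; the backslashreplace error handler escapes only the characters that need it:

>>> print(t.encode('ascii', 'backslashreplace').decode('unicode_escape'))
Róisín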
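A reusable sketch of the str case follows. The helper name unescape_text is illustrative, not part of any library. It encodes with the ascii codec and the backslashreplace error handler, which escapes non-ASCII characters while leaving existing backslashes alone, and then decodes with unicode_escape. The final check shows that a plain unicode_escape round trip on Python 3 returns its input unchanged.

```python
def unescape_text(text):
    """Decode backslash escapes in a str, keeping other characters intact."""
    # backslashreplace escapes characters ASCII cannot represent, but it does
    # not touch backslashes that are already present in the text.
    as_bytes = text.encode('ascii', 'backslashreplace')
    return as_bytes.decode('unicode_escape')

t = 'R\\u00f3is\\u00edn'
print(unescape_text(t))

# Mixed input: one literal non-ASCII character plus one escaped character.
mixed = 'caf\u00e9 \\u00f3'
print(unescape_text(mixed))

# Encoding with unicode_escape doubles the backslashes, so this is a no-op.
round_trip = t.encode('unicode_escape').decode('unicode_escape')
print(round_trip == t)
```

Keep in mind that any other backslash sequence in the text, such as a literal backslash followed by n, is interpreted as well.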

In case you want to know how to do this kind of thing with regular expressions in the future, note that re.sub lets you pass a function instead of a string for the repl argument. And you can convert any hex string into an integer by calling int(hexstring, 16), and any integer into the corresponding Unicode character with chr (this is the one bit that's different in Python 2, where you need unichr instead). So:

>>> import re
>>> re.sub(r'(\\u[0-9A-Fa-f]+)', lambda matchobj: chr(int(matchobj.group(0)[2:], 16)), t)
'Róisín'

Or, making it a bit more clear:

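>>> def unescapematch(matchobj):
...     escapesequence = matchobj.group(0)  # the whole match, backslash and u included
...     digits = escapesequence[2:]         # drop the two-character prefix
...     ordinal = int(digits, 16)           # hex digits -> code point
...     char = chr(ordinal)                 # code point -> character
...     return char
...
>>> re.sub(r'(\\u[0-9A-Fa-f]+)', unescapematch, t)
'Róisín'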
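The pattern above uses +, so it consumes as many hex digits as it finds. A stricter sketch, assuming only four-digit \u escapes need handling (ESCAPE_RE and unescape_u are illustrative names), limits the match to exactly four digits:

```python
import re

# A literal backslash, the letter u, then exactly four hex digits.
ESCAPE_RE = re.compile(r'\\u([0-9A-Fa-f]{4})')

def unescape_u(text):
    """Replace each four-digit backslash-u escape with its character."""
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(unescape_u('R\\u00f3is\\u00edn'))

# Only the first four digits belong to the escape; the trailing 2 is kept.
print(unescape_u('\\u22222'))
```

This sketch does not combine surrogate pairs or handle any other escape form, so it is only a fit when the input is known to contain plain \u escapes.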

The unicode_escape codec actually handles \U, \x, octal (\066), and special-character (\n) sequences as well as just \u, and it implements the proper rules for reading only the appropriate maximum number of digits (4 for \u, 8 for \U, etc.), so r'\u22222' decodes to '∢2' rather than to the single character U+22222.
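To illustrate the other escape forms, here is a small sketch that decodes a few byte strings with unicode_escape. The helper name decode_escapes is illustrative only.

```python
def decode_escapes(raw):
    """Decode backslash escapes held in a bytes object."""
    return raw.decode('unicode_escape')

four_digit = decode_escapes(rb'\u00f3')      # 4 hex digits after \u
eight_digit = decode_escapes(rb'\U000000f3') # 8 hex digits after \U
hex_byte = decode_escapes(rb'\xf3')          # 2 hex digits after \x
octal = decode_escapes(rb'\066')             # octal 066 is code point 54
newline = decode_escapes(rb'a\nb')           # special-character escape
limited = decode_escapes(rb'\u22222')        # only 4 digits are consumed

print(four_digit, eight_digit, hex_byte, octal, repr(newline), limited)
```

The first three calls produce the same character, which shows that the different escape forms are interchangeable for code points they can all express.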