Is there a list of characters that look similar to

Posted 2020-02-16 08:51


Question:

I’m having a crack at profanity filtering for a web forum written in Python.

As part of that, I’m attempting to write a function that takes a word, and returns all possible mock spellings of that word that use visually similar characters in place of specific letters (e.g. s†å©køv€rƒ|øw).

I expect I’ll have to expand this list over time to cover people’s creativity, but is there a list floating around anywhere on the internet that I could use as a starting point?
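To make the goal concrete, here is a minimal sketch of such a function. The `LOOKALIKES` table and the name `mock_spellings` are hypothetical placeholders for illustration, not an existing library; a real table would need to be much larger.

```python
from itertools import product

# Hypothetical starter table: each letter maps to the characters that may
# stand in for it (the letter itself is included as the first choice).
LOOKALIKES = {
    "a": "aå@",
    "o": "oø0",
    "s": "s$5",
}


def mock_spellings(word, table=LOOKALIKES):
    """Return every spelling of `word` obtainable by swapping in look-alikes."""
    # Letters without an entry in the table can only be spelled as themselves.
    choices = [table.get(ch, ch) for ch in word.lower()]
    # product() picks one candidate per position, in every combination.
    return {"".join(combo) for combo in product(*choices)}
```

The number of spellings is the product of the choices at each position, so it grows very quickly with word length and table size. Beyond short words it may be cheaper to match candidates against a pattern than to enumerate every variant.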

Answer 1:

This probably goes far deeper than you need while still not being broad enough to cover your use case, but the Unicode Consortium has had to deal with attacks against internationalised domain names and came up with this list of homoglyphs (characters with the same or similar rendering):

http://www.unicode.org/Public/security/latest/confusables.txt

Might make a starting point at least.
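As a rough sketch of how that file could be used, the snippet below parses lines laid out as `source ; target ; type # comment`, with code points written in hexadecimal, which is my understanding of the file's layout and should be checked against the header of the version you download. The two sample lines are written in that layout for illustration only, and the function names are hypothetical.

```python
# Sample lines in the assumed layout; the real file is far larger.
SAMPLE = [
    "# comment lines and blank lines are skipped",
    "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
    "0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
]


def parse_confusables(lines):
    """Map each source character to the string it is confusable with."""
    mapping = {}
    for line in lines:
        # Drop a leading byte-order mark, then anything after '#'.
        line = line.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        # Each field holds one or more space-separated hex code points.
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        mapping[source] = target
    return mapping


def fold_confusables(text, mapping):
    """Replace each character with its mapped target, if it has one."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

With the real file you would pass an open file object (UTF-8) to `parse_confusables`. Folding a post and a banned word through the same mapping before comparing them avoids generating variants at all. Note that the file targets characters that render alike, so creative substitutions such as `$` for `s` would still need a hand-maintained table.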

Answer 2:

http://en.wikipedia.org/wiki/Letterlike_Symbols

It's much, much less comprehensive, but more comprehensible.

Answer 3:

I created a Python class to do exactly this, based on Robin's Unicode link for "confusables":

https://github.com/wanderingstan/Confusables

For example, "Hello" would get expanded into the following set of regexp character classes:

[H\Ｈ\ℋ\ℌ\ℍ\
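The general idea can be sketched independently of that class. The snippet below is not the Confusables API; it is a minimal illustration, assuming a small hand-written table (`LOOKALIKES`, hypothetical) in place of data derived from confusables.txt, of turning a word into one regular-expression character class per letter.

```python
import re

# Hypothetical, hand-picked look-alikes for illustration only.
LOOKALIKES = {
    "h": "Ｈℋℌℍ",
    "e": "3€",
    "l": "1|",
    "o": "0ø",
}


def confusable_pattern(word, table=LOOKALIKES):
    """Compile a regex that matches `word` spelled with look-alike characters."""
    parts = []
    for ch in word:
        # Accept the letter in either case plus any listed look-alikes.
        alternatives = {ch.lower(), ch.upper()} | set(table.get(ch.lower(), ""))
        # re.escape keeps characters such as '|' or '-' literal inside [...].
        escaped = "".join(re.escape(c) for c in sorted(alternatives))
        parts.append("[" + escaped + "]")
    return re.compile("".join(parts))
```

A pattern built this way, for example `confusable_pattern("hello")`, can be run with `search` against each post, which sidesteps enumerating every possible spelling up front.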


        
           
    





        
            
                
            
        
        

        