Counter for words and emoji

I have a dataframe with a column "clear_message", and I created a column that counts all the words in each row.

history['word_count'] = history.clear_message.apply(lambda x: Counter(x.split(' ')))

For example, if a row's message is "Hello my name is Hello", then the counter for that row will be Counter({'Hello': 2, 'is': 1, 'my': 1, 'name': 1})
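
As a minimal sketch of what that column holds, the same split-and-count step can be run on a plain string without pandas. The helper name count_words is hypothetical and not part of the question's code.

```python
from collections import Counter

def count_words(message):
    # Same operation as the lambda applied to each row of clear_message.
    return Counter(message.split(' '))

result = count_words("Hello my name is Hello")
print(result)
```

Note that split(' ') yields empty-string tokens when two spaces are adjacent, whereas split() with no argument collapses runs of whitespace.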

The problem

My text contains emoji, and I also want a counter for the emoji.

For example:

test = '

2 answers

Aperson
#2 · 2019-07-17 05:44

I think your idea of adding a space after each emoji is a good approach. You'll also need to strip whitespace in case there was already a space between an emoji and the next character, but that's simple enough. Something like:

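import emoji

def emoji_splitter(text):
    # Pad every emoji with spaces so that it becomes its own token.
    # emoji.UNICODE_EMOJI is the lookup table as of this 2019 answer;
    # emoji 2.0 and later removed it in favour of emoji.EMOJI_DATA.
    new_string = ""
    for char in text:
        if char in emoji.UNICODE_EMOJI:
            new_string += " {} ".format(char)
        else:
            new_string += char
    # Strip each piece and drop the empty strings left by repeated spaces.
    return [v for v in map(lambda x: x.strip(), new_string.split(" ")) if v != ""]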


Maybe you could improve this by using a sliding window to check for spaces after emojis and only add spaces where necessary, but that would assume there will only ever be one space, whereas this solution should account for 0 to n spaces between emojis.
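
To see how such a splitter feeds the Counter from the question, here is a sketch that runs without the emoji package. EMOJI_CHARS is a tiny stand-in set assumed purely for illustration (the answer itself uses emoji.UNICODE_EMOJI), and split_emoji and count_tokens are hypothetical names.

```python
from collections import Counter

# Stand-in for the emoji package's lookup table (an assumption for this sketch):
# only "face with tears of joy" and "grinning face" are recognised.
EMOJI_CHARS = {"\U0001F602", "\U0001F600"}

def split_emoji(text, emoji_chars=EMOJI_CHARS):
    # Pad each recognised emoji with spaces, then split on any whitespace.
    padded = "".join(" {} ".format(c) if c in emoji_chars else c for c in text)
    return padded.split()

def count_tokens(text):
    # One Counter holding both words and emoji.
    return Counter(split_emoji(text))

counts = count_tokens("so funny\U0001F602\U0001F602 really \U0001F602")
```

Because this checks one character at a time, emoji built from several code points (flags, skin-tone variants, joined sequences) would be broken into pieces, so treat it as a starting point only.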
    
                                                                    
                                                        
            
              
(This account has been banned)
#3 · 2019-07-17 05:44

There were some problems with @con--'s answer, so I fixed it.

import emoji

def emoji_splitter(text):
    new_string = ""
    # Collapse every run of whitespace to a single space first.
    for char in ' '.join(text.split()):
        if char in emoji.UNICODE_EMOJI:
            # Pad the emoji on both sides so it separates from adjacent words.
            new_string += " " + char + " "
        else:
            new_string += char
    # split() drops the empty strings that the extra padding creates.
    return new_string.split()


example:

emoji_splitter(' a                                                                    
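
Since the question asks for a counter for the emoji in addition to the words, one possible follow-up is to partition the split tokens into two counters. This is only a sketch: EMOJI_CHARS is a hypothetical stand-in for the emoji package's lookup table, and split_counts is an illustrative name.

```python
from collections import Counter

# Hypothetical stand-in for the emoji lookup; swap in the emoji package's table.
EMOJI_CHARS = {"\U0001F602", "\U0001F44D"}

def split_counts(tokens, emoji_chars=EMOJI_CHARS):
    """Return (word_counter, emoji_counter) for an already-split token list."""
    words, emojis = Counter(), Counter()
    for token in tokens:
        if token in emoji_chars:
            emojis[token] += 1
        else:
            words[token] += 1
    return words, emojis

tokens = ["Hello", "\U0001F602", "Hello", "\U0001F44D", "\U0001F602"]
word_counts, emoji_counts = split_counts(tokens)
```

The token list would normally come from emoji_splitter applied to each message.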
                                                        
            

              