Can PHP detect 4-byte encoded UTF-8 chars?

I am using utf8-charset MySQL tables on a MySQL 5.1 server, which does not support the utf8mb4 encoding in tables. When inserting 4-byte encoded UTF-8 characters like "

2 Answers

\"骚年 ilove · #2 · 2019-01-31 14:08

The following regular expression will replace 4-byte UTF-8 characters:

function replace4byte($string, $replacement = '') {
    return preg_replace('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', $replacement, $string);    
}

var_dump(replace4byte('d'), replace4byte('d                                                                    
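
Since the question asks about detection rather than replacement, the same pattern can also be passed to `preg_match`. The following is a minimal sketch under that assumption: the function name `has4ByteChars` is hypothetical (it is not part of the original answer), and the sample strings are illustrative, written as byte escapes so the snippet does not depend on the file's encoding.

```php
<?php
// Hypothetical helper (not from the original answer): reports whether
// $string contains at least one well-formed 4-byte UTF-8 sequence.
function has4ByteChars(string $string): bool {
    return preg_match('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', $string) === 1;
}

$plain = "d\xC3\xA9";          // "d" + U+00E9, a 2-byte character
$emoji = "d\xF0\x9F\x98\x80";  // "d" + U+1F600, a 4-byte character

var_dump(has4ByteChars($plain)); // bool(false)
var_dump(has4ByteChars($emoji)); // bool(true)
```

Because the pattern works on raw bytes (no `u` modifier), it does not fail on input that is not valid UTF-8; it simply looks for the three 4-byte shapes.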
                                                        
            

              
老娘就宠你 · #3 · 2019-01-31 14:12

This should work:

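// Assumes valid UTF-8: only the lead byte of a 4-byte sequence is 240 (0xF0) or higher.
if ($string !== '' && max(array_map('ord', str_split($string))) >= 240) { /* 4-byte character found */ }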


The rationale is that code points up to and including U+FFFF are encoded in at most three bytes, the longest form being 1110xxxx 10xxxxxx 10xxxxxx. Higher code points take the form 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, i.e. the lead byte has a value of 240 (0xF0) or higher. In valid UTF-8, any such byte in the string is therefore an indicator of a 4-byte sequence.
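
To make the byte patterns concrete, the sketch below prints the lead byte of a 3-byte and a 4-byte character in decimal and binary. The sample characters (U+20AC and U+1F600) are chosen purely for illustration and are written as byte escapes; the helper name `leadByte` is hypothetical.

```php
<?php
// Hypothetical helper: value of the first byte of a UTF-8 encoded character.
function leadByte(string $char): int {
    return ord($char[0]);
}

$euro  = "\xE2\x82\xAC";      // U+20AC, three bytes: 1110xxxx 10xxxxxx 10xxxxxx
$emoji = "\xF0\x9F\x98\x80";  // U+1F600, four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

printf("%d %08b\n", leadByte($euro),  leadByte($euro));  // 226 11100010
printf("%d %08b\n", leadByte($emoji), leadByte($emoji)); // 240 11110000
```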

If you want to remove long characters, this will do:

$string = preg_replace_callback('/./u', function (array $match) {
    // Drop any character whose UTF-8 encoding is four bytes long
    return strlen($match[0]) >= 4 ? '' : $match[0];
}, $string);


Though there may be a more elegant way to express high code points directly in the regex.
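
One way to express the high code points directly, offered here as a sketch rather than as the answerer's method, is a character class covering U+10000 to U+10FFFF combined with the `u` modifier. The function name `strip4ByteChars` is hypothetical. With `u`, `preg_replace` returns `null` when the input is not valid UTF-8, so the return type is nullable.

```php
<?php
// Hypothetical helper: removes (or replaces) every code point above U+FFFF,
// i.e. every character that needs four bytes in UTF-8.
function strip4ByteChars(string $string, string $replacement = ''): ?string {
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so \x{10000}-\x{10FFFF} is a range of code points, not bytes.
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', $replacement, $string);
}

$input = "a\xF0\x9F\x98\x80b\xE2\x82\xAC"; // "a", U+1F600, "b", U+20AC

var_dump(strip4ByteChars($input) === "ab\xE2\x82\xAC");       // bool(true)
var_dump(strip4ByteChars($input, '?') === "a?b\xE2\x82\xAC"); // bool(true)
```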
    
                                                                    
                                                        
            
              