I have a list of character range restrictions that I need to check a string against, but the char type in .NET is UTF-16, so some characters become wacky (surrogate) pairs instead. Thus, when enumerating all the char's in a string, I don't get the 32-bit Unicode code points, and some comparisons with high values fail.
I understand Unicode well enough that I could parse the bytes myself if necessary, but I'm looking for a C#/.NET Framework BCL solution. So ...
How would you convert a string to an array (int[]) of 32-bit Unicode code points?
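For example (a minimal sketch of the problem, using U+1D11E MUSICAL SYMBOL G CLEF as the non-BMP character):

string s = "\U0001D11E";                    // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
Console.WriteLine(s.Length);                // 2 -- two UTF-16 code units, not one character
foreach (char c in s)
    Console.WriteLine("{0:X4}", (int)c);    // D834, then DD1E -- the surrogate pair, never 1D11E

A range check such as c >= 0x1D100 && c <= 0x1D1FF can therefore never match when applied to the individual chars.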
This answer is not correct. See @Virtlink's answer for the correct one.
// Requires: using System.Collections.Generic; using System.Globalization;
static int[] ExtractScalars(string s)
{
    if (!s.IsNormalized())
    {
        s = s.Normalize();
    }

    List<int> chars = new List<int>((s.Length * 3) / 2);

    var ee = StringInfo.GetTextElementEnumerator(s);
    while (ee.MoveNext())
    {
        // Take the first code point of each text element (grapheme cluster)
        string e = ee.GetTextElement();
        chars.Add(char.ConvertToUtf32(e, 0));
    }

    return chars.ToArray();
}
Notes: Normalization is required to deal with composite characters.
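For instance (a minimal sketch):

string decomposed = "e\u0301";       // 'e' followed by U+0301 COMBINING ACUTE ACCENT
ExtractScalars(decomposed);          // { 0xE9 } -- normalization composes the pair into U+00E9 'é'

However, a sequence with no precomposed form (for example "q\u0303") keeps its combining mark as a separate code point even after normalization, and ConvertToUtf32(e, 0) reads only the first code point of each text element, so the mark is silently dropped -- which is why this answer is flagged as not correct above.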
You are asking about code points. In UTF-16 (C#'s char) there are only two possibilities:
- The character is from the Basic Multilingual Plane (BMP) and is encoded by a single code unit.
- The character is outside the BMP and is encoded using a surrogate high-low pair of code units.
Therefore, assuming the string is valid, this returns an array of code points for a given string:
// Requires: using System; using System.Collections.Generic;
public static int[] ToCodePoints(string str)
{
    if (str == null)
        throw new ArgumentNullException("str");

    var codePoints = new List<int>(str.Length);
    for (int i = 0; i < str.Length; i++)
    {
        // ConvertToUtf32 combines a high surrogate at i with the low surrogate at i + 1
        codePoints.Add(Char.ConvertToUtf32(str, i));

        // Skip the low surrogate that was just consumed
        if (Char.IsHighSurrogate(str[i]))
            i += 1;
    }

    return codePoints.ToArray();
}
An example with a surrogate pair:
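For instance, a minimal sketch using U+1D11E MUSICAL SYMBOL G CLEF (encoded in UTF-16 as the surrogate pair D834 DD1E):

int[] result = ToCodePoints("a\U0001D11E");
// result is { 0x61, 0x1D11E } -- the two surrogate code units are combined into one code point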