Java XMLReader not clearing multi-byte UTF-8 encod

Posted 2019-06-20 13:34

叼着烟拽天下

Question:

I've got a really strange situation where my SAX ContentHandler is being handed bad Attributes by XMLReader. The document being parsed is UTF-8 with multi-byte characters inside XML attributes. What appears to happen is that the attribute values accumulate each time my handler is called: rather than each value being passed on its own, it arrives concatenated onto the previous node's value.

Here is an example which demonstrates this using public data (Wikipedia).

public class MyContentHandler extends org.xml.sax.helpers.DefaultHandler {

    public static void main(String[] args) {
        try {
            // Fetch the Wikipedia API response and feed it straight to the SAX parser.
            org.xml.sax.XMLReader reader = org.xml.sax.helpers.XMLReaderFactory.createXMLReader();
            reader.setContentHandler(new MyContentHandler());
            reader.parse("http://en.wikipedia.org/w/api.php?format=xml&action=query&list=allpages&apfilterredir=redirects&apdir=descending");

        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Print the "title" attribute of every <p> element as it is encountered.
    @Override
    public void startElement(String uri, String localName, String qName, org.xml.sax.Attributes attributes) {
        if ("p".equals(qName)) {
            String title = attributes.getValue("title");
            System.out.println(title);
        }
    }
}
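
To take the network and the live Wikipedia data out of the picture, the same check can be run against a small in-memory document. The following is only a sketch under assumptions: the class name LocalAttributeCheck, the helper parseTitles, and the sample XML are all hypothetical and not part of the original example, and the sample titles are ordinary multi-byte characters written as Unicode escapes. Whether it shows the same accumulation probably depends on the parser version and on which characters appear in the attributes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalAttributeCheck {

    // Hypothetical helper: encodes the XML text as UTF-8 bytes, parses it with
    // the default SAX parser, and returns the "title" attribute of every <p>
    // element in document order.
    public static List<String> parseTitles(String xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("p".equals(qName)) {
                    titles.add(attributes.getValue("title"));
                }
            }
        };
        InputSource source = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        source.setEncoding("UTF-8");
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample data: three <p> elements whose titles each need
        // three bytes per character in UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<allpages>"
                + "<p title=\"\u65e5\u672c\u8a9e\"/>"
                + "<p title=\"\u4e2d\u6587\"/>"
                + "<p title=\"\ud55c\uad6d\uc5b4\"/>"
                + "</allpages>";
        for (String title : parseTitles(xml)) {
            System.out.println(title);
        }
    }
}
```

If each title comes back on its own here but not with the live feed, that would suggest the problem is tied to specific characters in the Wikipedia response rather than to multi-byte text in general.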

Update: The complete Wikipedia example (MyContentHandler) produces the following (apologies to any Cantonese speakers for the vulgar output):


            
                
                   
                        
Tags: java, utf-8, character-encoding, sax, xmlreader