
KXmlParser throws "Unexpected token" exception at the start of RSS parsing

Posted 2019-07-21 03:08

I am trying to parse an RSS feed from Monster on Android v.17, using this URL:

http://rss.jobsearch.monster.com/rssquery.ashx?q=java

To get the content I use HttpURLConnection in the following fashion:

this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
is = new InputStreamReader(url.openStream());

What comes back is, as far as I can tell (and I verified it too), a legitimate RSS feed:

Cache-Control:private
Connection:Keep-Alive
Content-Encoding:gzip
Content-Length:5958
Content-Type:text/xml
Date:Wed, 06 Mar 2013 17:15:20 GMT
P3P:CP=CAO DSP COR CURa ADMa DEVa IVAo IVDo CONo HISa TELo PSAo PSDo DELa PUBi BUS LEG PHY ONL UNI PUR COM NAV INT DEM CNT STA HEA PRE GOV OTC
Server:Microsoft-IIS/7.5
Vary:Accept-Encoding
X-AspNet-Version:2.0.50727
X-Powered-By:ASP.NET

It starts like this (open the URL above if you want to see the full XML):

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Monster Job Search Results java</title>
    <description>RSS Feed for Monster Job Search</description>
    <link>http://rss.jobsearch.monster.com/rssquery.ashx?q=java</link>

But when I try to parse it:

final XmlPullParser xpp = getPullParser();
xpp.setInput(is);
for (int type = xpp.getEventType(); type != XmlPullParser.END_DOCUMENT; type = xpp.next()) { /* parsing goes here */ }

The code immediately chokes on type = xpp.next() with the following exception:

03-06 09:27:27.796: E/AbsXmlResultParser(13363): org.xmlpull.v1.XmlPullParserException: 
   Unexpected token (position:TEXT @1:2 in java.io.InputStreamReader@414b4538) 

This really means that it cannot handle the second character on line 1, which is <?xml version="1.0" encoding="utf-8"?>

Here are the offending lines in KXmlParser.java (425-426). The condition type == TEXT evaluates to true:

if (depth == 0 && (type == ENTITY_REF || type == TEXT || type == CDSECT)) {
    throw new XmlPullParserException("Unexpected token", this, null);
}

Any help? I tried setting XmlPullParser.FEATURE_PROCESS_DOCDECL to false on the parser, but that did not help.

I did research this on the web and here, and could not find anything that helps.

Answer 1:

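The reason you are getting the error is that the XML file does not actually start with <?xml version="1.0" encoding="utf-8"?>. It starts with the three special bytes EF BB BF, which are the UTF-8 byte order mark (BOM).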
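To make the point concrete, here is a minimal sketch in plain Java (no Android or XML classes) that checks a byte array for the UTF-8 BOM and shows that decoding such bytes as UTF-8 leaves a U+FEFF character in front of the XML declaration. The class and method names (BomCheck, startsWithUtf8Bom) are made up for illustration, and the sample bytes are built in memory rather than fetched from the feed.

```java
import java.nio.charset.StandardCharsets;

final class BomCheck {
    // The UTF-8 encoding of U+FEFF, i.e. the byte order mark.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the data begins with the bytes EF BB BF.
    static boolean startsWithUtf8Bom(byte[] data) {
        return data.length >= 3
                && data[0] == UTF8_BOM[0]
                && data[1] == UTF8_BOM[1]
                && data[2] == UTF8_BOM[2];
    }

    public static void main(String[] args) {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>".getBytes(StandardCharsets.UTF_8);

        // Simulate what the server sends: BOM first, then the XML declaration.
        byte[] withBom = new byte[UTF8_BOM.length + xml.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(xml, 0, withBom, UTF8_BOM.length, xml.length);

        // Java's UTF-8 decoder keeps the BOM as a regular character,
        // so '<' ends up as the second character of the decoded text.
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println("has BOM: " + startsWithUtf8Bom(withBom));
        System.out.println("first char is U+FEFF: " + (decoded.charAt(0) == '\uFEFF'));
    }
}
```

Because the decoder does not drop the mark, the parser sees one character of text before the document starts, which matches the position TEXT @1:2 reported in the exception.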

InputStreamReader does not handle these bytes automatically, so you have to handle them yourself. The simplest way is to use BOMInputStream, which the Commons IO library provides:

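// Requires Apache Commons IO:
// import org.apache.commons.io.ByteOrderMark;
// import org.apache.commons.io.input.BOMInputStream;
this.conn = (HttpURLConnection) url.openConnection();
this.conn.setConnectTimeout(5000);
this.conn.setReadTimeout(10000);
this.conn.setUseCaches(true);
this.conn.addRequestProperty("Content-Type", "text/xml; charset=utf-8");
// Read from conn (not url.openStream()); include=false drops a leading UTF-8 BOM from the stream.
is = new InputStreamReader(new BOMInputStream(this.conn.getInputStream(), false, ByteOrderMark.UTF_8), "UTF-8");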

I checked the code above and it works well for me.
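If adding Commons IO is not an option, the same idea can be sketched with the JDK alone by peeking at the first three bytes through a PushbackInputStream. This is an illustrative alternative, not part of the answer above; the class and method names (Utf8BomSkipper, skipUtf8Bom) are hypothetical, and it only handles the UTF-8 mark, not UTF-16 or UTF-32 marks.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class Utf8BomSkipper {
    // Returns a stream positioned after a leading UTF-8 BOM (EF BB BF), if one is present.
    // If the stream does not start with a BOM, the bytes that were read are pushed back.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;

        // A single read() may return fewer bytes than requested, so loop until
        // three bytes are available or the stream ends.
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }

        boolean isBom = read == 3
                && head[0] == (byte) 0xEF
                && head[1] == (byte) 0xBB
                && head[2] == (byte) 0xBF;

        // Not a BOM: put back whatever was consumed so the caller sees the full content.
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```

With this helper, the reader could be created as new InputStreamReader(Utf8BomSkipper.skipUtf8Bom(conn.getInputStream()), "UTF-8") and then passed to xpp.setInput(...) as before.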



Source: KXmlParser throws "Unexpected token" exception at the start of RSS parsing