I am writing a little screen-scraping app that consumes some XHTML. It goes without saying that the XHTML is invalid: ampersands aren't escaped as &amp;.
I am using Android's XmlPullParser, and it spews out the following error upon the incorrectly encoded value:
org.xmlpull.v1.XmlPullParserException: unterminated entity ref
(position:START_TAG <a href='/Fahrinfo/bin/query.bin/dox?ld=0.1&n=3&i=9c.0323581.1266265347&rt=0&vcra'>
@55:134 in java.io.InputStreamReader@43b1ef70)
How do I get around this? I have thought about the following solutions:
- Wrapping the InputStream in another one that replaces the ampersands with entity refs
- Configuring the Parser so it magically accepts the incorrect markup
Which one is likely to be more successful?
I would go with your first option; replacing the ampersands seems a better fit than the other. The second option seems more of a hack: it only works by making the parser accept incorrect markup.
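For what it's worth, here is a minimal sketch of the first option in plain Java. It assumes the document is small enough to buffer in memory, and the names AmpersandEscaper, escapeBareAmpersands and escapedReader are made up for illustration. Rather than replacing every ampersand, it only escapes those that do not already start an entity or character reference, so input that is partly escaped does not end up as &amp;amp;.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Android or XmlPull.
final class AmpersandEscaper {
    // Matches '&' unless it already begins a named (&name;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference.
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Returns the markup with every bare '&' replaced by "&amp;".
    static String escapeBareAmpersands(String markup) {
        return BARE_AMPERSAND.matcher(markup).replaceAll("&amp;");
    }

    // Reads the whole stream into memory, escapes it, and returns a Reader
    // over the repaired text. The caller has to supply the right charset.
    static Reader escapedReader(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        Reader source = new InputStreamReader(in, charset);
        char[] buffer = new char[4096];
        int read;
        while ((read = source.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        return new StringReader(escapeBareAmpersands(text.toString()));
    }
}
```

The result can then be handed to the parser with something like parser.setInput(AmpersandEscaper.escapedReader(stream, StandardCharsets.UTF_8)). Note that named references other than the five predefined XML ones (for example &nbsp;) are left untouched by this sketch, so they may still trip the parser.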
I was stuck on this for about an hour before figuring out that, in my case, it was the "&" that the XmlPullParser couldn't resolve. Here is a snippet of code that fixes it:
void ParsingActivity(String r) {
    try {
        XmlPullParserFactory parserCreator = XmlPullParserFactory.newInstance();
        XmlPullParser parser = parserCreator.newPullParser();
        // Escape the ampersands, then give the text to the parser
        // in the form of a Reader.
        parser.setInput(new StringReader(r.replaceAll("&", "&amp;")));
        // Unlike a SAX parser, a pull parser does not raise callbacks;
        // we ask it for each event in turn.
        int parserEvent = parser.getEventType();
        // Loop over all events in the XML until we have
        // reached the end of the document.
        while (parserEvent != XmlPullParser.END_DOCUMENT) {
            switch (parserEvent) {
                // We have reached the start of a tag.
                case XmlPullParser.START_TAG:
                    // Get the name of the tag.
                    String tag = parser.getName();
                    // ... handle the tag here ...
                    break;
            }
            parserEvent = parser.next();
        }
    } catch (XmlPullParserException | IOException e) {
        e.printStackTrace();
    }
}
Pretty much all I'm doing is replacing the & with &amp;, since I was parsing a URL.
Hope this helps.