I am trying to parse RSS feeds. I found the Rome library and am using it with the following code:
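import com.rometools.rome.feed.synd.SyndEntry;
import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.FeedException;
import com.rometools.rome.io.ParsingFeedException;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;
import java.io.IOException;
import java.net.URL;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Downloads the URL and parses the response as an RSS/Atom feed with Rome
private SyndFeed parseFeed(String url) throws IllegalArgumentException, FeedException, IOException {
    try (XmlReader reader = new XmlReader(new URL(url))) {
        return new SyndFeedInput().build(reader);
    }
}

public boolean processRSSContent(String url) {
    try {
        SyndFeed theFeed = this.parseFeed(url);
        // Take the first entry of the feed
        SyndEntry entry = theFeed.getEntries().get(0);
        ZonedDateTime entryUtcDate = ZonedDateTime.ofInstant(entry.getPublishedDate().toInstant(), ZoneOffset.UTC);
        String entryTitle = entry.getTitle();
        String entryText = entry.getDescription().getValue();
        return true;
    }
    catch (ParsingFeedException e) {
        // Thrown when Rome cannot parse the downloaded document, e.g. it is not well-formed XML
        e.printStackTrace();
        return false;
    }
    catch (FeedException e) {
        e.printStackTrace();
        return false;
    }
    catch (IOException e) {
        e.printStackTrace();
        return false;
    }
}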
On some feeds, such as http://feeds.bbci.co.uk/news/world/rss.xml, everything works fine, but on others, such as http://habrahabr.ru/rss/, I get this error:
Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
I looked at the content behind this link, and the XML really is strange. But it's a popular site, and I get the same error on several other sites, so I don't believe the XML itself is the problem. What am I doing wrong? How can I read these RSS feeds? Could somebody give me a hand, please?
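If you open http://habrahabr.ru/rss/ in your browser, you'll notice that it redirects to https://habrahabr.ru/rss/interesting. Your code doesn't follow that redirect: the underlying java.net.HttpURLConnection follows redirects automatically only when the protocol stays the same, so a redirect from HTTP to HTTPS is not followed. Rome then receives the HTML redirect page instead of the feed, and that HTML (with its unterminated <meta> tag) is what produces the "Invalid XML" error.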
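If you want to stay with plain java.net and XmlReader, one option is to follow redirects yourself before handing the stream to Rome. Below is a minimal sketch, assuming a hypothetical helper class RedirectingOpener (its name, methods, and the redirect limit are my own choices, not part of Rome): it turns off automatic redirects, reads the Location header on 3xx responses, resolves it against the current URL, and repeats until it gets a non-redirect response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper (not part of Rome): opens a URL, following redirects even across http/https. */
public final class RedirectingOpener {

    private RedirectingOpener() {}

    /** True for the 3xx status codes that carry a Location header to follow. */
    public static boolean isRedirect(int status) {
        return status == HttpURLConnection.HTTP_MOVED_PERM   // 301
            || status == HttpURLConnection.HTTP_MOVED_TEMP   // 302
            || status == HttpURLConnection.HTTP_SEE_OTHER    // 303
            || status == 307
            || status == 308;
    }

    /** Resolves a possibly relative Location header against the URL that returned it. */
    public static String resolveLocation(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    /** Opens url, following at most maxRedirects redirects, and returns the final body stream. */
    public static InputStream open(URL url, int maxRedirects) throws IOException {
        URL current = url;
        for (int hop = 0; hop <= maxRedirects; hop++) {
            URLConnection conn = current.openConnection();
            if (!(conn instanceof HttpURLConnection)) {
                return conn.getInputStream(); // e.g. file: URLs have no redirects
            }
            HttpURLConnection http = (HttpURLConnection) conn;
            http.setInstanceFollowRedirects(false); // we follow redirects ourselves below
            int status = http.getResponseCode();
            if (!isRedirect(status)) {
                return http.getInputStream(); // throws IOException on 4xx/5xx
            }
            String location = http.getHeaderField("Location");
            http.disconnect();
            if (location == null) {
                throw new IOException("HTTP " + status + " without Location header from " + current);
            }
            current = new URL(resolveLocation(current.toString(), location));
        }
        throw new IOException("More than " + maxRedirects + " redirects starting from " + url);
    }
}
```

In parseFeed you could then build the feed from the stream, for example `new SyndFeedInput().build(new XmlReader(RedirectingOpener.open(new URL(url), 5)))`, ideally inside try-with-resources so the stream gets closed. Note that XmlReader(InputStream) only sees the bytes, not the HTTP Content-Type header, so encoding detection relies on the XML prolog or BOM.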
I suggest using HttpClientFeedFetcher from the rome-fetcher module; it handles redirects and has other advanced features (caching, conditional GETs, compression).
EDIT: rome-fetcher is being deprecated, but Apache HttpClient can be used instead, and it is more flexible.
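If you are on Java 11 or later and would rather not add a dependency, the JDK's built-in java.net.http.HttpClient can also follow redirects. This is a substitute for the Apache HttpClient mentioned above, not what the answer originally used. With HttpClient.Redirect.NORMAL it follows every redirect except HTTPS-to-HTTP downgrades, so the HTTP-to-HTTPS hop from habrahabr.ru is handled. The class name FeedDownloader and the 10-second connect timeout are hypothetical choices for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper (not part of Rome) that downloads a feed body, following redirects. */
public final class FeedDownloader {

    private final HttpClient client;

    public FeedDownloader() {
        this.client = HttpClient.newBuilder()
                // NORMAL: always follow redirects, except from https to http
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    /** Returns the redirect policy the underlying client was built with. */
    public HttpClient.Redirect redirectPolicy() {
        return client.followRedirects();
    }

    /** GETs the URL and returns the body stream of the final (post-redirect) response. */
    public InputStream fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<InputStream> response =
                client.send(request, HttpResponse.BodyHandlers.ofInputStream());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            response.body().close();
            // response.uri() is the URI after any redirects were followed
            throw new IOException("HTTP " + status + " for " + response.uri());
        }
        return response.body();
    }
}
```

You could then parse with `new SyndFeedInput().build(new XmlReader(downloader.fetch(url)))`, closing the reader afterwards and handling the extra InterruptedException alongside FeedException and IOException. Creating one FeedDownloader and reusing it lets the single HttpClient share its connection pool across feeds.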