Unable to get image URL from RSS feed using jsoup

Posted 2019-08-03 06:50

Question:

I'm trying to fetch an image URL from an RSS feed (XML), and I'm using jsoup to parse it.

I am using the following code:

doc = Jsoup.connect(mainUrl).timeout(1000 * 1000).get();

final String title = doc.select("title").first().text();
final String description = doc.select("description").first().text();
final String link = doc.select("link").first().nextSibling().toString();

for (Element image : doc.select("image")) {
    Log.d(TAG, "inside for loop.................");
    final String titleImage = image.select("title").first().text();
    final String linkImage = image.select("link").first().text();
    final String urlImage = image.select("url").first().text();

    Log.d(TAG, "titleIm is : " + titleImage + " linkIm: "
            + linkImage + " urlIM: " + urlImage);
}
Log.d(TAG, "title is : " + title + " desc: " + description
        + " link: " + link + " url: " + url);

Control never enters the for loop, even though the <image> tag is present in the XML at that URL.

What mistake did I make? I couldn't find it.

Thank you in advance!

EDIT

These are my HTML tags:

<!--?xml version="1.0" encoding="UTF-8"?-->
<!--?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?-->
<!--?xml-stylesheet type="text/css" media="screen" href="http://feeds.hindustantimes.com/~d/styles/itemcontent.css"?-->
<html>
 <head></head>
 <body>
  <rss xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
   <channel>
    <title>News from India</title>
    <link />http://www.hindustantimes.com
    <description>
     Latest news updates from HindustanTimes.com for IndiaSectionPage-Topstories. For more updates hop on to HindustanTimes.com.
    </description>
    <language>
     en
    </language>
    <copyright>
     Copyright (C) 2013 HT Media Limited. All Rights Reserved.
    </copyright>
    <pubdate>
     Wed, 26 Jun 2013 09:24:57 GMT
    </pubdate>
    <lastbuilddate>
     Wed, 26 Jun 2013 09:24:57 GMT
    </lastbuilddate>
    <ttl>
     2
    </ttl>
    <img />
    <title>HindustanTimes.com - Top IndiaSectionPage-Topstories News Headlines</title>
    <url>
     http://www.hindustantimes.com/images/logo.gif
    </url>
    <link />http://www.hindustantimes.com
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.hindustantimes.com/HT-IndiaSectionPage-Topstories" />
    <feedburner:info uri="ht-indiasectionpage-topstories" />
    <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" />
    <media:copyright>
     Copyright (C) 2013 HT Media Limited. All Rights Reserved.
    </media:copyright>
    <itunes:explicit>
     no
    </itunes:explicit>
    <itunes:subtitle>
     Latest news updates from HindustanTimes.com for IndiaSectionPage-Topstories. For more updates hop on to HindustanTimes.com.
    </itunes:subtitle>
    <item>
     <title>Senior Congress leader Subhash Yadav passes away</title>
     <link />http://feeds.hindustantimes.com/~r/HT-IndiaSectionPage-Topstories/~3/HSWJIomgkPQ/story01.htm
     <description>
      Senior Congress leader and former Madhya Pradesh deputy chief minister Subhash Yadav died this morning at a hospital in New Delhi following prolonged illness, family sources said.&lt;img width=&apos;1&apos; height=&apos;1&apos; src=&apos;http://hindustantimes.com.feedsportal.com/c/33818/f/608451/s/2dcf34f1/mf.gif&apos; border=&apos;0&apos;/&gt;&lt;div class=&apos;mf-viral&apos;&gt;&lt;table border=&apos;0&apos;&gt;&lt;tr&gt;&lt;td valign=&apos;middle&apos;&gt;&lt;a href=&quot;http://share.feedsportal.com/share/twitter/?u=http%3A%2F%2Fwww.hindustantimes.com%2FIndia-news%2FMadhyaPradesh%2FSenior-Congress-leader-Subhash-Yadav-passes-away%2FArticle1-1082724.aspx&amp;t=Senior+Congress+leader+Subhash+Yadav+passes+away&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://res3.feedsportal.com/social/twitter.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&amp;nbsp;&lt;a href=&quot;http://share.feedsportal.com/share/facebook/?u=http%3A%2F%2Fwww.hindustantimes.com%2FIndia-news%2FMadhyaPradesh%2FSenior-Congress-leader-Subhash-Yadav-passes-away%2FArticle1-1082724.aspx&amp;t=Senior+Congress+leader+Subhash+Yadav+passes+away&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://res3.feedsportal.com/social/facebook.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&amp;nbsp;&lt;a href=&quot;http://share.feedsportal.com/share/linkedin/?u=http%3A%2F%2Fwww.hindustantimes.com%2FIndia-news%2FMadhyaPradesh%2FSenior-Congress-leader-Subhash-Yadav-passes-away%2FArticle1-1082724.aspx&amp;t=Senior+Congress+leader+Subhash+Yadav+passes+away&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://res3.feedsportal.com/social/linkedin.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&amp;nbsp;&lt;a href=&quot;http://share.feedsportal.com/share/gplus/?u=http%3A%2F%2Fwww.hindustantimes.com%2FIndia-news%2FMadhyaPradesh%2FSenior-Congress-leader-Subhash-Yadav-passes-away%2FArticle1-1082724.aspx&amp;t=Senior+Congress+leader+Subhash+Yadav+passes+away&quot; 
target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://res3.feedsportal.com/social/googleplus.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&amp;nbsp;&lt;a href=&quot;http://share.feedsportal.com/share/email/?u=http%3A%2F%2Fwww.hindustantimes.com%2FIndia-news%2FMadhyaPradesh%2FSenior-Congress-leader-Subhash-Yadav-passes-away%2FArticle1-1082724.aspx&amp;t=Senior+Congress+leader+Subhash+Yadav+passes+away&quot; target=&quot;_blank&quot;&gt;&lt;img src=&quot;http://res3.feedsportal.com/social/email.png&quot; border=&quot;0&quot; /&gt;&lt;/a&gt;&lt;/td&gt;&lt;td valign=&apos;middle&apos;&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;/div&gt;&lt;br/&gt;&lt;br/&gt;&lt;a href=&quot;http://da.feedsportal.com/r/165665396933/u/49/f/608451/c/33818/s/2dcf34f1/a2.htm&quot;&gt;&lt;img src=&quot;http://da.feedsportal.com/r/165665396933/u/49/f/608451/c/33818/s/2dcf34f1/a2.img&quot; border=&quot;0&quot;/&gt;&lt;/a&gt;&lt;img width=&quot;1&quot; height=&quot;1&quot; src=&quot;http://pi.feedsportal.com/r/165665396933/u/49/f/608451/c/33818/s/2dcf34f1/a2t.img&quot; border=&quot;0&quot;/&gt;&lt;div class=&quot;feedflare&quot;&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:yIl2AUoC8zA&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?d=yIl2AUoC8zA&quot; border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:-BTjWOF_DHI&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?i=HSWJIomgkPQ:FLtHY4O2U4k:-BTjWOF_DHI&quot; border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:F7zBnMyn0Lo&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?i=HSWJIomgkPQ:FLtHY4O2U4k:F7zBnMyn0Lo&quot; 
border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:7Q72WNTAKBA&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?d=7Q72WNTAKBA&quot; border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:V_sGLiPBpWU&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?i=HSWJIomgkPQ:FLtHY4O2U4k:V_sGLiPBpWU&quot; border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;a href=&quot;http://feeds.hindustantimes.com/~ff/HT-IndiaSectionPage-Topstories?a=HSWJIomgkPQ:FLtHY4O2U4k:qj6IDK7rITs&quot;&gt;&lt;img src=&quot;http://feeds.feedburner.com/~ff/HT-IndiaSectionPage-Topstories?d=qj6IDK7rITs&quot; border=&quot;0&quot;&gt;&lt;/img&gt;&lt;/a&gt; &lt;/div&gt;&lt;img src=&quot;http://feeds.feedburner.com/~r/HT-IndiaSectionPage-Topstories/~4/HSWJIomgkPQ&quot; height=&quot;1&quot; width=&quot;1&quot;/&gt;
     </description>
     <pubdate>
      Wed, 26 Jun 2013 08:41:18 GMT
     </pubdate>
     <comments>
      http://www.hindustantimes.com/India-news/MadhyaPradesh/Senior-Congress-leader-Subhash-Yadav-passes-away/Article1-1082724.aspx
     </comments>
     <guid ispermalink="false">
      1082724
     </guid>
     <feedburner:origlink>
      http://hindustantimes.com.feedsportal.com/c/33818/f/608451/s/2dcf34f1/l/0L0Shindustantimes0N0CIndia0Enews0CMadhyaPradesh0CSenior0ECongress0Eleader0ESubhash0EYadav0Epasses0Eaway0CArticle10E10A827240Baspx/story01.htm
     </feedburner:origlink>
    </item>
<media:rating>
     nonadult
    </media:rating>
   </channel>
  </rss> 
 </body>
</html>

Answer 1:

If you are looking to fetch the URL from the enclosure tag, this should do it:

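Elements items = doc.select("item");
for (Element item : items) {
    // first() returns null when an item has no <enclosure>, so check before reading the attribute
    Element enclosure = item.select("enclosure").first();
    String imageUrl = (enclosure != null) ? enclosure.attr("url") : null;
}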


Answer 2:

You can use an XmlPullParser to get the data from your RSS feed.

http://developer.android.com/training/basics/network-ops/xml.html

    try {
        URL url = new URL("http://feeds.hindustantimes.com/HT-IndiaSectionPage-Topstories");
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(false);
        XmlPullParser xpp = factory.newPullParser();
        xpp.setInput(url.openConnection().getInputStream(), "UTF-8");
        //xpp.setInput(getInputStream(url), "UTF-8");

        boolean insideItem = false;

        // Returns the type of the current event: START_TAG, END_TAG, etc.
        int eventType = xpp.getEventType();
        while (eventType != XmlPullParser.END_DOCUMENT) {
            if (eventType == XmlPullParser.START_TAG) {
                if (xpp.getName().equalsIgnoreCase("item")) {
                    insideItem = true;
                } else if (xpp.getName().equalsIgnoreCase("title")) {
                    if (insideItem)
                        Log.i("hi", xpp.nextText()); // title of the article
                } else if (xpp.getName().equalsIgnoreCase("link")) {
                    if (insideItem)
                        Log.i("hi", xpp.nextText()); // link of the article
                } else if (xpp.getName().equalsIgnoreCase("url")) {
                    if (insideItem)
                        Log.i("hi", xpp.nextText()); // url element, only when it appears inside an item
                }
            } else if (eventType == XmlPullParser.END_TAG
                    && xpp.getName().equalsIgnoreCase("item")) {
                insideItem = false;
            }

            eventType = xpp.next(); // move to the next event
        }

    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (XmlPullParserException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
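
The loop above only logs tags that appear inside an item, whereas the feed in the question keeps its logo in a channel-level image element (title, url, link). As a rough sketch, here is one way to read that URL with the JDK's DOM parser (javax.xml.parsers) rather than XmlPullParser. The class name RssImageReader, its helper methods, and the inline sample feed are made up for illustration, and the layout rss > channel > image > url is assumed from the RSS 2.0 format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or of the answers above.
final class RssImageReader {

    /** Returns the first direct child element with the given tag name, or null. */
    public static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Returns the trimmed text of channel > image > url, or null if any level is missing. */
    public static String channelImageUrl(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        Element channel = child(doc.getDocumentElement(), "channel");
        if (channel == null) return null;
        Element image = child(channel, "image");
        if (image == null) return null;
        Element url = child(image, "url");
        return url == null ? null : url.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real feed; only the parts this sketch reads.
        String sample =
                "<rss version=\"2.0\"><channel>"
              + "<title>News from India</title>"
              + "<image><title>Logo</title>"
              + "<url> http://www.hindustantimes.com/images/logo.gif </url>"
              + "<link>http://www.hindustantimes.com</link></image>"
              + "<item><title>Story</title></item>"
              + "</channel></rss>";
        System.out.println(channelImageUrl(sample));
        // prints http://www.hindustantimes.com/images/logo.gif
    }
}
```

Walking direct children, instead of searching the whole tree for "url" or "title", keeps the channel's image title from being confused with the channel or item titles. Returning null for a missing element leaves the decision to the caller rather than failing with a NullPointerException.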


Answer 3:

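Elements yourParsedText = doc.select("item");
for (Element media : yourParsedText) {
    // "media|thumbnail" matches the namespaced <media:thumbnail> tag; attr() returns "" if it is absent
    String mediaLinkThatYouGet = media.select("media|thumbnail").attr("url");
}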

That is the basic approach.
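
For comparison, the same per-item lookup can be sketched without jsoup, using the JDK's namespace-aware DOM parser. This is only an illustration under assumptions: the class name ItemImageReader and the sample feed are hypothetical, the media prefix is assumed to be bound to http://search.yahoo.com/mrss/ (as in the rss tag shown in the question), and falling back to an enclosure URL is a design choice of this sketch, not something the feed in the question is known to provide.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
final class ItemImageReader {

    // Namespace URI that the "media" prefix is bound to in the question's feed.
    static final String MEDIA_NS = "http://search.yahoo.com/mrss/";

    /** Returns the url attribute of the first element in the list, or null if none is usable. */
    static String firstUrlAttr(NodeList elements) {
        if (elements.getLength() == 0) return null;
        String url = ((Element) elements.item(0)).getAttribute("url");
        return url.isEmpty() ? null : url; // getAttribute returns "" when the attribute is absent
    }

    /** Returns one entry per item: media:thumbnail url, else enclosure url, else null. */
    public static List<String> itemImageUrls(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rssXml)));

        List<String> urls = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String url = firstUrlAttr(item.getElementsByTagNameNS(MEDIA_NS, "thumbnail"));
            if (url == null) {
                url = firstUrlAttr(item.getElementsByTagName("enclosure"));
            }
            urls.add(url); // null marks an item without any image
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String sample =
                "<rss version=\"2.0\" xmlns:media=\"http://search.yahoo.com/mrss/\"><channel>"
              + "<item><title>A</title><media:thumbnail url=\"http://example.com/a.jpg\"/></item>"
              + "<item><title>B</title><enclosure url=\"http://example.com/b.jpg\" type=\"image/jpeg\" length=\"0\"/></item>"
              + "<item><title>C</title></item>"
              + "</channel></rss>";
        System.out.println(itemImageUrls(sample));
        // prints [http://example.com/a.jpg, http://example.com/b.jpg, null]
    }
}
```

Matching on the namespace URI rather than on the literal prefix means the lookup still works if a feed binds the same namespace to a different prefix.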