Objective-C HTML parsing: get all text between tags

Posted 2019-07-24 16:27

Question:

I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:

NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement * element in nodes) {
    NSString *postid = [element content];
    if (postid) {
        [texts appendString:postid];
    }
}

This returns only the plain text, not any of the URLs for the screenshots. Is there any way to get all the links and other tags, not just the plain text? The Pirate Bay page is formatted like so:

<pre>
    <a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
    http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>

Answer 1:

That's an easy job and you did it almost correctly!

What you want is the content (or an attribute) of the a tag, so your XPath has to tell the parser to select that element.

Just change your XPath to

@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"

(You were missing the a at the very end, and you do not need node().)

Output:

http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png
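With the XPath ending at the a elements, the question's loop reads each link's visible text via content. If you want the href attribute itself (which can differ from the text a page shows), hpple's TFHppleElement also provides objectForKey: for attribute values. A minimal sketch, assuming hpple is already in the project and the page still uses the layout above; NfoLinkHrefs is a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // hpple, as used in the question

// Hypothetical helper: returns the href of every <a> inside the .nfo <pre> block.
static NSArray<NSString *> *NfoLinkHrefs(NSData *htmlData) {
    TFHpple *parser = [TFHpple hppleWithHTMLData:htmlData];
    // Same XPath as above: it stops at the <a> elements, no node().
    NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a";
    NSMutableArray<NSString *> *hrefs = [NSMutableArray array];
    for (TFHppleElement *element in [parser searchWithXPathQuery:path]) {
        // objectForKey: reads an attribute; [element content] would give the link text.
        NSString *href = [element objectForKey:@"href"];
        if (href) {
            [hrefs addObject:href];
        }
    }
    return hrefs;
}
```

You would call it with the downloaded page data, e.g. NfoLinkHrefs(pageData), and get back the URL strings in document order.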

If you only want the screenshot URLs, you can skip the first link (the IMDb page) with something like

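// Skip the first link (the IMDb page) and collect the text of the rest.
NSMutableArray *screenshotURLs = [NSMutableArray array];
for (NSUInteger i = 1; i < nodes.count; i++) {
    NSString *url = [(TFHppleElement *)nodes[i] content];
    if (url) [screenshotURLs addObject:url];
}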
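Skipping index 0 assumes the IMDb link is always the first one. An alternative sketch keeps only links whose path ends in an image extension; it assumes the screenshots are linked directly as image files, and both the extension list and the helper name ImageURLsFromLinks are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: keeps links whose URL path ends in an assumed image extension.
static NSArray<NSString *> *ImageURLsFromLinks(NSArray<NSString *> *links) {
    NSSet<NSString *> *imageExtensions =
        [NSSet setWithObjects:@"png", @"jpg", @"jpeg", @"gif", nil];
    NSMutableArray<NSString *> *result = [NSMutableArray array];
    for (NSString *link in links) {
        NSURL *url = [NSURL URLWithString:link];
        // Using the URL's path ignores any query string; invalid URLs yield nil here.
        NSString *ext = [[url.path pathExtension] lowercaseString];
        if (ext != nil && [imageExtensions containsObject:ext]) {
            [result addObject:link];
        }
    }
    return result;
}
```

Passing it the strings collected from the a elements drops the IMDb page (its path has no extension) regardless of where it appears in the list.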