How do I find non-length-specified substrings in a

Posted 2019-06-14 02:53

Question:

I'm trying, for the first time in my life, to contribute to open source software. Therefore I'm trying to help out on this ticket, as it seems to be a good "beginner ticket".

I have successfully got the string from the Twitter API: however, it's in this format:

<a href="http://twitter.com" rel="nofollow">Tweetie for Mac</a>

What I want to extract from this string is the URL (http://twitter.com) and the name of the Twitter client (Tweetie for Mac). How can I do this in Objective-C? Since the URLs differ from client to client, I can't rely on a fixed index, and the same applies to the client name.

Answer 1:

This assumes you already have the HTML link and aren't parsing an entire HTML page.

//Your HTML Link
NSString *link = [urlstring text];

//Length of the HTML link string (NSString lengths are NSUInteger, not int)
NSUInteger length = [link length];

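//Range of the first quote (location is NSNotFound if the string contains no quote,
//in which case the ranges computed below would be invalid)
NSRange firstQuote = [link rangeOfString:@"\""];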

//Subrange to search for another quote in the HTML href link
NSRange nextQuote = NSMakeRange(firstQuote.location+1, length-firstQuote.location-1);

//Range of the second quote after the first (a quote has no case, so no search options are needed)
NSRange secondQuote = [link rangeOfString:@"\"" options:0 range:nextQuote];

//Extracts the http://twitter.com
NSRange urlRange = NSMakeRange(firstQuote.location+1, (secondQuote.location-1) - (firstQuote.location));
NSString *url = [link substringWithRange:urlRange];

//Gets the > right before Tweetie for Mac
NSRange firstCaret = [link rangeOfString:@">"];

//This appears at the start of the href link, we want the next one
NSRange firstClosedCaret = [link rangeOfString:@"<"];
NSRange nextClosedCaret = NSMakeRange(firstClosedCaret.location+1, length-firstClosedCaret.location-1);

//Gets the < right after Tweetie for Mac
NSRange secondClosedCaret = [link rangeOfString:@"<" options:0 range:nextClosedCaret];

//Range of the twitter client
NSRange rangeOfTwitterClient = NSMakeRange(firstCaret.location+1, (secondClosedCaret.location-1)-(firstCaret.location));
NSString *twitterClient = [link substringWithRange:rangeOfTwitterClient];


Answer 2:

You know that this portion of the string will always be the same:

<a href="...">...</a>

So what you really want is to search for the first " and for the > that closes the opening a tag.

The easiest way to do this would be to find what is inside the quotes (see this link for how to search NSStrings) and then take the text after the second-to-last > as the actual name.
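One way to sketch that idea is with NSScanner rather than manual range arithmetic. The helper name ExtractLinkParts is made up for illustration, and the code assumes ARC and a string shaped like a single <a href="...">...</a> element; it is not taken from the Adium source.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): pulls out the first quoted
// value and the text between the opening tag's '>' and the next '<'.
// Returns NO if the string does not have the expected <a href="...">...</a> shape.
static BOOL ExtractLinkParts(NSString *link, NSString **outURL, NSString **outName) {
    NSScanner *scanner = [NSScanner scannerWithString:link];
    scanner.charactersToBeSkipped = nil; // keep whitespace inside the client name

    NSString *url = nil;
    NSString *name = nil;

    // Skip everything up to and including the first quote.
    [scanner scanUpToString:@"\"" intoString:NULL];
    if (![scanner scanString:@"\"" intoString:NULL]) return NO;

    // Everything up to the next quote is the URL.
    if (![scanner scanUpToString:@"\"" intoString:&url]) return NO;

    // Skip to the '>' that closes the opening tag, then read up to the next '<'.
    [scanner scanUpToString:@">" intoString:NULL];
    if (![scanner scanString:@">" intoString:NULL]) return NO;
    if (![scanner scanUpToString:@"<" intoString:&name]) return NO;

    if (outURL) *outURL = url;
    if (outName) *outName = name;
    return YES;
}
```

Calling ExtractLinkParts(link, &url, &name) on the string from the question should fill url and name; a NO return means the input did not look like a single anchor and both outputs are left untouched.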

You could also use an NSXMLParser as that works on XML specifically, but that may be overkill for this case.
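For comparison, here is roughly what the NSXMLParser route might look like. AnchorCollector and CollectAnchor are hypothetical names, the code assumes ARC, and it only works if the anchor happens to be well-formed XML (HTML entities such as &nbsp; would make the parse fail).

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that records the href attribute and the text of an <a> element.
@interface AnchorCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *href;
@property (nonatomic, strong) NSMutableString *text;
@end

@implementation AnchorCollector

- (void)parser:(NSXMLParser *)parser
didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI
 qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    if ([elementName isEqualToString:@"a"]) {
        self.href = attributeDict[@"href"];
        self.text = [NSMutableString string];
    }
}

// May be called several times for one text node, so the pieces are appended.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    [self.text appendString:string];
}

@end

// Parses a single anchor element; returns nil if the input is not well-formed XML.
static AnchorCollector *CollectAnchor(NSString *link) {
    NSData *data = [link dataUsingEncoding:NSUTF8StringEncoding];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
    AnchorCollector *collector = [[AnchorCollector alloc] init];
    parser.delegate = collector;
    return [parser parse] ? collector : nil;
}
```

The amount of delegate boilerplate needed for one attribute and one text node is a fair illustration of why this may be more machinery than the problem calls for.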



Answer 3:

I haven't looked at the Adium source, but you should check whether any categories are available that extend, for example, NSString with methods for parsing HTML/XML into more usable structures, such as a node tree. Then you could simply walk the tree and look for the required attributes.

If not, you could parse it yourself by dividing the string into tokens (tag open, tag close, tag attributes, quoted strings and so on) and then looking for the required attributes. Alternatively, you could even use a regular expression if the strings always consist of a single HTML anchor element.

I know it's been discussed many times that regular expressions simply don't work for HTML parsing, but this is a specific scenario where one is actually reasonable, and better than running a full-blown, generic HTML/XML parser. That would be, as slycrel said, overkill.
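A minimal sketch of the regular-expression route, assuming NSRegularExpression is available (macOS 10.7 or iOS 4 and later), ARC is enabled, and the input is always a single anchor whose href is double-quoted. MatchAnchor is a hypothetical helper name, not an existing API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns @[url, clientName] for a lone anchor element,
// or nil if the string does not match the expected shape.
static NSArray<NSString *> *MatchAnchor(NSString *link) {
    // Group 1: the href value. Group 2: the text between <a ...> and </a>.
    NSString *pattern = @"<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>([^<]*)</a>";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) return nil;

    NSTextCheckingResult *match =
        [regex firstMatchInString:link options:0 range:NSMakeRange(0, link.length)];
    if (match == nil) return nil;

    return @[[link substringWithRange:[match rangeAtIndex:1]],
             [link substringWithRange:[match rangeAtIndex:2]]];
}
```

The pattern deliberately refuses nested tags inside the link text ([^<]*), so anything more complicated than a plain anchor simply yields nil instead of a wrong answer.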