Remove HTML Tags from an NSString on the iPhone-第4页回答

There are a couple of different ways to remove HTML tags from an NSString in Cocoa.

One way is to render the string into an NSAttributedString and then grab the rendered text.

Another way is to use NSXMLDocument's -objectByApplyingXSLTString method to apply an XSLT transform that does it.

Unfortunately, the iPhone doesn't support NSAttributedString or NSXMLDocument. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or NSScanner. Does anyone have a solution to this?

One suggestion has been to simply look for opening and closing tag characters, this method won't work except for very trivial cases.

For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:

<IMG SRC = "foo.gif" ALT = "A > B">

<!-- <A comment> -->

<script>if (a<b && a>c)</script>

<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>

标签： iphone objective-c cocoa-touch

22条回答

谁念西风独自凉

2楼-- · 2018-12-31 09:40

following is the accepted answer, but instead of category, it is simple helper method with string passed into it. (thank you m.kocikowski)

-(NSString *) stringByStrippingHTML:(NSString*)originalString {
    NSRange r;
    NSString *s = [originalString copy];
    while ((r = [s rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
        s = [s stringByReplacingCharactersInRange:r withString:@""];
    return s;
}

0人赞添加讨论(0) 举报

零度萤火

3楼-- · 2018-12-31 09:41

Here's a more efficient solution than the accepted answer:

- (NSString*)hp_stringByRemovingTags
{
    static NSRegularExpression *regex = nil;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    // Use reverse enumerator to delete characters without affecting indexes
    NSArray *matches =[regex matchesInString:self options:kNilOptions range:NSMakeRange(0, self.length)];
    NSEnumerator *enumerator = matches.reverseObjectEnumerator;

    NSTextCheckingResult *match = nil;
    NSMutableString *modifiedString = self.mutableCopy;
    while ((match = [enumerator nextObject]))
    {
        [modifiedString deleteCharactersInRange:match.range];
    }
    return modifiedString;
}

The above NSString category uses a regular expression to find all the matching tags, makes a copy of the original string and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because:

The regular expression is initialised only once.
A single copy of the original string is used.

This performed well enough for me but a solution using NSScanner might be more efficient.

Like the accepted answer, this solution doesn't address all the border cases requested by @lfalin. Those would be require much more expensive parsing which the average use case most likely doesn't need.

0人赞添加讨论(0) 举报

公子世无双

4楼-- · 2018-12-31 09:41

Without a loop (at least on our side) :

- (NSString *)removeHTML {

    static NSRegularExpression *regexp;
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        regexp = [NSRegularExpression regularExpressionWithPattern:@"<[^>]+>" options:kNilOptions error:nil];
    });

    return [regexp stringByReplacingMatchesInString:self
                                            options:kNilOptions
                                              range:NSMakeRange(0, self.length)
                                       withTemplate:@""];
}

0人赞添加讨论(0) 举报

心情的温度

5楼-- · 2018-12-31 09:41

An updated answer for @m.kocikowski that works on recent iOS versions.

-(NSString *) stringByStrippingHTMLFromString:(NSString *)str {
NSRange range;
while ((range = [str rangeOfString:@"<[^>]+>" options:NSRegularExpressionSearch]).location != NSNotFound)
    str = [str stringByReplacingCharactersInRange:range withString:@""];
return str;

}

0人赞添加讨论(0) 举报

上一页 1 2 3 4

Remove HTML Tags from an NSString on the iPhone

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间