There are a couple of different ways to remove HTML tags from an `NSString` in Cocoa. One way is to render the string into an `NSAttributedString` and then grab the rendered text. Another way is to use `NSXMLDocument`'s `-objectByApplyingXSLTString:arguments:error:` method to apply an XSLT transform that does it.

Unfortunately, the iPhone doesn't support `NSAttributedString` or `NSXMLDocument`. There are too many edge cases and malformed HTML documents for me to feel comfortable using regex or `NSScanner`. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, but this method won't work except in very trivial cases.
For example, these cases (from the Perl Cookbook chapter on the same subject) would break it:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
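To see why, here is a minimal Swift sketch of that naive approach (the function name `naiveStripTags` is hypothetical): it drops everything from a `<` up to the next `>`. Fed two of the cases above, it leaks text that was inside the markup.

```swift
// Hypothetical naive stripper: skip everything from a "<" up to the next ">".
// Shown only to illustrate why this approach fails on the cases above.
func naiveStripTags(_ html: String) -> String {
    var result = ""
    var insideTag = false
    for ch in html {
        if ch == "<" {
            insideTag = true            // start skipping
        } else if ch == ">" && insideTag {
            insideTag = false           // stop skipping at the first ">"
        } else if !insideTag {
            result.append(ch)           // keep text outside tags
        }
    }
    return result
}

// The ">" inside the quoted ALT value ends the "tag" too early,
// leaving ' B">' (with a leading space) behind.
print(naiveStripTags("<IMG SRC = \"foo.gif\" ALT = \"A > B\">"))
// The "<b && a>" comparison inside the script is mistaken for a tag: prints "if (ac)".
print(naiveStripTags("<script>if (a<b && a>c)</script>"))
```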
Following is the accepted answer, but instead of a category, it is a simple helper method that takes the string as a parameter (thanks, m.kocikowski).
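A rough Swift rendering of that loop-based idea might look like this. It is a sketch, assuming the tag pattern `<[^>]+>` and a hypothetical function name `stringByStrippingHTML`: find the first tag-shaped match, remove it, and search again from the start until nothing matches.

```swift
import Foundation

// Hypothetical helper mirroring the loop-based approach.
// Assumed pattern: "<", one or more characters other than ">", then ">".
func stringByStrippingHTML(_ html: String) -> String {
    let tagPattern = try! NSRegularExpression(pattern: "<[^>]+>", options: [])
    var result = html
    // Remove the first match, then rescan the shortened string from the beginning.
    while let match = tagPattern.firstMatch(in: result, options: [],
                                            range: NSRange(location: 0, length: result.utf16.count)),
          let tagRange = Range(match.range, in: result) {
        result.removeSubrange(tagRange)
    }
    return result
}

print(stringByStrippingHTML("<p>Hello <b>world</b></p>"))  // Hello world
```

Every removal triggers a fresh scan from the start of the string, which is the cost the next answer tries to avoid.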
Here's a more efficient solution than the accepted answer:
The above `NSString` category uses a regular expression to find all the matching tags, makes a copy of the original string, and finally removes all the tags in place by iterating over them in reverse order. It's more efficient because the string is scanned for tags only once and every removal happens in place on that single copy. This performed well enough for me, but a solution using `NSScanner` might be more efficient.

Like the accepted answer, this solution doesn't address all the border cases requested by @lfalin. Those would require much more expensive parsing, which the average use case most likely doesn't need.
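The same reverse-order idea can be sketched in Swift as a `String` extension. The method name `strippingHTMLTags()` and the pattern `<[^>]+>` are assumptions, and caching the compiled regex in a static property is this sketch's own choice: one `matches(in:options:range:)` call collects every tag, then the tags are removed from a single copy, back to front, so the offsets of earlier matches stay valid.

```swift
import Foundation

extension String {
    // Hypothetical Swift counterpart of the category described above.
    // Compiled once (lazily, as a static) and reused across calls.
    private static let htmlTagPattern = try! NSRegularExpression(pattern: "<[^>]+>", options: [])

    func strippingHTMLTags() -> String {
        // One pass over the original string collects every tag match.
        let matches = String.htmlTagPattern.matches(
            in: self, options: [], range: NSRange(location: 0, length: utf16.count))
        var result = self  // the single working copy
        // Remove back to front so the UTF-16 offsets of earlier matches stay valid.
        for match in matches.reversed() {
            if let tagRange = Range(match.range, in: result) {
                result.removeSubrange(tagRange)
            }
        }
        return result
    }
}

print("<a href=\"x\">link</a> text".strippingHTMLTags())  // link text
```

Like the answer itself, the pattern still trips over the border cases from the question: `<IMG SRC = "foo.gif" ALT = "A > B">` still leaves ` B">` behind.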
Without a loop (at least on our side):
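A minimal Swift sketch of the no-loop idea, assuming the pattern `<[^>]+>` and a hypothetical function name: `NSRegularExpression`'s `stringByReplacingMatches(in:options:range:withTemplate:)` does the iteration internally. On Apple platforms, `replacingOccurrences(of:with:options:)` with the `.regularExpression` option is a similar one-liner.

```swift
import Foundation

// Hypothetical helper: one regex replace-all call; the looping happens inside Foundation.
func strippingHTMLWithoutLoop(_ html: String) -> String {
    let tagPattern = try! NSRegularExpression(pattern: "<[^>]+>", options: [])
    return tagPattern.stringByReplacingMatches(
        in: html, options: [],
        range: NSRange(location: 0, length: html.utf16.count),
        withTemplate: "")
}

print(strippingHTMLWithoutLoop("<ul><li>One</li><li>Two</li></ul>"))  // OneTwo
```

Tags are deleted rather than replaced with whitespace, so text from adjacent elements runs together, as `OneTwo` shows.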
An updated version of @m.kocikowski's answer that works on recent iOS versions: