There are a couple of different ways to remove HTML tags
from an NSString
in Cocoa
.
One way is to render the string into an NSAttributedString
and then grab the rendered text.
Another way is to use NSXMLDocument's
-objectByApplyingXSLTString
method to apply an XSLT
transform that does it.
Unfortunately, the iPhone doesn't support NSAttributedString
or NSXMLDocument
. There are too many edge cases and malformed HTML
documents for me to feel comfortable using regex or NSScanner
. Does anyone have a solution to this?
One suggestion has been to simply look for opening and closing tag characters, this method won't work except for very trivial cases.
For example these cases (from the Perl Cookbook chapter on the same subject) would break this method:
<IMG SRC = "foo.gif" ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
Here's the swift version :
Take a look at NSXMLParser. It's a SAX-style parser. You should be able to use it to detect tags or other unwanted elements in the XML document and ignore them, capturing only pure text.
I've extended the answer by m.kocikowski and tried to make it a bit more efficient by using an NSMutableString. I've also structured it for use in a static Utils class (I know a Category is probably the best design though), and removed the autorelease so it compiles in an ARC project.
Included here in case anybody finds it useful.
.h
.m
A quick and "dirty" (removes everything between < and >) solution, works with iOS >= 3.2:
I have this declared as a category os NSString.
This is the modernization of m.kocikowski answer which removes whitespaces: