Objective C HTML escape/unescape

Posted 2018-12-31 18:09

Question:

Wondering if there is an easy way to do a simple HTML escape/unescape in Objective-C. What I want is something like this pseudocode:

NSString *string = @"&lt;span&gt;Foo&lt;/span&gt;";
[string stringByUnescapingHTML];

Which returns

<span>Foo</span>

Hopefully unescaping all other HTML entities as well, and even numeric character references like &#1234; and the like.

Are there any methods in Cocoa Touch/UIKit to do this?

Answer 1:

This link contains the solution below. Core Foundation has the CFXMLCreateStringByUnescapingEntities function, but that's not available on the iPhone.

@interface MREntitiesConverter : NSObject <NSXMLParserDelegate>{
    NSMutableString* resultString;
}

@property (nonatomic, retain) NSMutableString* resultString;

- (NSString*)convertEntitiesInString:(NSString*)s;

@end


@implementation MREntitiesConverter

@synthesize resultString;

- (id)init
{
    // Assign the result of [super init] to self before touching instance variables.
    if ((self = [super init])) {
        resultString = [[NSMutableString alloc] init];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s {
        [self.resultString appendString:s];
}

- (NSString*)convertEntitiesInString:(NSString*)s {
    if (!s) {
        NSLog(@"ERROR : Parameter string is nil");
        return nil;
    }
    // Start from an empty buffer so repeated calls do not accumulate text.
    [resultString setString:@""];
    // Wrap the input in a root element so NSXMLParser accepts it.
    NSString* xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
    NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
    NSXMLParser* xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
    [xmlParse setDelegate:self];
    [xmlParse parse];
    return [NSString stringWithFormat:@"%@",resultString];
}

- (void)dealloc {
    [resultString release];
    [super dealloc];
}

@end


Answer 2:

Check out my NSString category for XMLEntities. There are methods to decode XML entities (including all HTML character references), encode XML entities, strip tags, and remove newlines and whitespace from a string:

- (NSString *)stringByStrippingTags;
- (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
- (NSString *)stringByEncodingXMLEntities;
- (NSString *)stringWithNewLinesAsBRs;
- (NSString *)stringByRemovingNewLinesAndWhitespace;


Answer 3:

Another HTML NSString category from Google Toolbox for Mac
Despite the name, this works on iOS too.

http://google-toolbox-for-mac.googlecode.com/svn/trunk/Foundation/GTMNSString+HTML.h

/// Get a string where internal characters that are escaped for HTML are unescaped 
//
///  For example, '&amp;' becomes '&'
///  Handles &#32; and &#x32; cases as well
///
//  Returns:
//    Autoreleased NSString
//
- (NSString *)gtm_stringByUnescapingFromHTML;

And I had to include only three files in the project: header, implementation and GTMDefines.h.



Answer 4:

This is an incredibly hacked together solution I did, but if you want to simply unescape a string without worrying about parsing, do this:

- (NSString *)htmlEntityDecode:(NSString *)string
{
    string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
    string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
    string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
    string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
    string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"]; // Do this last so that, e.g. @"&amp;lt;" goes to @"&lt;" not @"<"

    return string;
}

I know it's by no means elegant, but it gets the job done. You can then decode an element by calling:

string = [self htmlEntityDecode:string];

Like I said, it's hacky but it works. If you want to encode a string, reverse the stringByReplacingOccurrencesOfString parameters and replace the ampersand first instead of last, so that the ampersands of the newly produced entities are not escaped again.
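
The encoding direction described above could look roughly like the sketch below. The function name HTMLEntityEncode is an assumption made for illustration and is not part of the original answer; it covers only the same five entities as the decoder.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical counterpart to htmlEntityDecode: (name chosen for illustration).
// Covers only the five entities handled by the decoder.
static NSString *HTMLEntityEncode(NSString *string) {
    // Escape the ampersand first; otherwise the "&" of the entities
    // produced by the later replacements would be escaped a second time.
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    string = [string stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    string = [string stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    string = [string stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    string = [string stringByReplacingOccurrencesOfString:@"'" withString:@"&apos;"];
    return string;
}
```

Note that &apos; is defined for XML and HTML5 but not for HTML 4, so &#39; may be the safer choice if older HTML consumers matter.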



Answer 5:

In iOS 7 you can use NSAttributedString's ability to import HTML to convert HTML entities to an NSString.

E.g.:

@interface NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
@end

@implementation NSAttributedString (HTML)
+ (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
{
    NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                               NSCharacterEncodingDocumentAttribute :@(NSUTF8StringEncoding) };

    NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];

    return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
}

@end

Then in your code when you want to clean up the entities:

NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];

This is probably the simplest way, but I don't know how performant it is. You should probably be pretty damn sure the content you're "cleaning" doesn't contain any <img> tags or stuff like that, because this method will download those images during the HTML to NSAttributedString conversion. :)



Answer 6:

Here's a solution that neutralizes all characters (by turning each one into an HTML entity for its Unicode value)... I used this for my need (making sure a string that came from the user but was placed inside of a web view couldn't carry any XSS attacks):

Interface:

@interface NSString (escape)
- (NSString*)stringByEncodingHTMLEntities;
@end

Implementation:

@implementation NSString (escape)

- (NSString*)stringByEncodingHTMLEntities {
    // Rather than mapping each individual entity and checking if it needs to be replaced, we simply replace every character with the hex entity

    NSMutableString *resultString = [NSMutableString string];
    // characterAtIndex: returns UTF-16 code units, one entity is emitted per unit.
    for (NSUInteger pos = 0; pos < [self length]; pos++)
        [resultString appendFormat:@"&#x%x;", [self characterAtIndex:pos]];
    return [NSString stringWithString:resultString];
}

@end

Usage Example:

UIWebView *webView = [[UIWebView alloc] init];
NSString *userInput = @"<script>alert('This is an XSS ATTACK!');</script>";
NSString *safeInput = [userInput stringByEncodingHTMLEntities];
[webView loadHTMLString:safeInput baseURL:nil];

Your mileage will vary.
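
One place where mileage may vary: characterAtIndex: returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate entities, which are not valid character references. A minimal sketch of a variant that combines surrogate pairs first is shown below; the function name HexEntityEncode is an assumption for illustration and is not from the original answer.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the answer above): encodes every Unicode
// code point of `input` as a hexadecimal character reference.
static NSString *HexEntityEncode(NSString *input) {
    NSMutableString *result = [NSMutableString string];
    NSUInteger length = [input length];
    for (NSUInteger i = 0; i < length; i++) {
        unichar unit = [input characterAtIndex:i];
        uint32_t codePoint = unit;
        // Combine a high surrogate with the low surrogate that follows it.
        if (CFStringIsSurrogateHighCharacter(unit) && i + 1 < length) {
            unichar low = [input characterAtIndex:i + 1];
            if (CFStringIsSurrogateLowCharacter(low)) {
                codePoint = CFStringGetLongCharacterForSurrogatePair(unit, low);
                i++; // The low surrogate has been consumed.
            }
        }
        // An unpaired surrogate falls through and is emitted unchanged.
        [result appendFormat:@"&#x%x;", codePoint];
    }
    return result;
}
```

This sketch still passes unpaired surrogates through as they are, so input from untrusted sources may need an extra validation step.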



Answer 7:

The least invasive and most lightweight way to encode and decode HTML or XML strings is to use the GTMNSStringHTMLAdditions CocoaPod.

It is simply the Google Toolbox for Mac NSString category GTMNSString+HTML, stripped of the dependency on GTMDefines.h. So all you need to add is one .h and one .m, and you're good to go.

Example:

#import "GTMNSString+HTML.h"

// Encoding a string with XML / HTML elements
NSString *stringToEncode = @"<TheBeat>Goes On</TheBeat>";
NSString *encodedString = [stringToEncode gtm_stringByEscapingForHTML];

// encodedString looks like this now:
// &lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;

// Decoding a string with XML / HTML encoded elements
NSString *stringToDecode = @"&lt;TheBeat&gt;Goes On&lt;/TheBeat&gt;";
NSString *decodedString = [stringToDecode gtm_stringByUnescapingFromHTML];

// decodedString looks like this now:
// <TheBeat>Goes On</TheBeat>


Answer 8:

This is an easy-to-use NSString category implementation:

  • http://code.google.com/p/qrcode-scanner-live/source/browse/trunk/iphone/Classes/NSString%2BHTML.h
  • http://code.google.com/p/qrcode-scanner-live/source/browse/trunk/iphone/Classes/NSString%2BHTML.m

It is far from complete but you can add some missing entities from here: http://code.google.com/p/statz/source/browse/trunk/NSString%2BHTML.m

Usage:

#import "NSString+HTML.h"

NSString *raw = @"<div></div>";
NSString *escaped = [raw htmlEscapedString];


Answer 9:

The MREntitiesConverter above is an HTML stripper, not encoder.

If you need an encoder, go here: Encode NSString for XML/HTML



Answer 10:

MREntitiesConverter doesn't work on malformed XML. It will fail on a simple URL, because the bare ampersands are not well-formed XML:

http://www.google.com/search?client=safari&rls=en&q=fail&ie=UTF-8&oe=UTF-8



Answer 11:

If you need to generate a literal you might consider using a tool like this:

http://www.freeformatter.com/java-dotnet-escape.html#ad-output

to accomplish the work for you.

See also this answer.



Answer 12:

The easiest solution is to create a category as below:

Here's the category's header file:

#import <Foundation/Foundation.h>
@interface NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding;
@end

And here's the implementation:

#import "NSString+URLEncoding.h"
@implementation NSString (URLEncoding)
-(NSString *)urlEncodeUsingEncoding:(NSStringEncoding)encoding {
    // The Create function returns an owned reference; autorelease it (manual reference counting).
    return [(NSString *)CFURLCreateStringByAddingPercentEscapes(NULL,
               (CFStringRef)self,
               NULL,
               (CFStringRef)@"!*'\"();:@&=+$,/?%#[]% ",
               CFStringConvertNSStringEncodingToEncoding(encoding)) autorelease];
}
@end

And now we can simply do this:

NSString *raw = @"hell & brimstone + earthly/delight";
NSString *url = [NSString stringWithFormat:@"http://example.com/example?param=%@",
            [raw urlEncodeUsingEncoding:NSUTF8StringEncoding]];
NSLog(@"%@", url);

Credit for this answer goes to the website below:

http://madebymany.com/blog/url-encoding-an-nsstring-on-ios


Answer 13:

Why not just use this?

NSData *data = [s dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
NSString *result = [[[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding] autorelease];
return result;

Noob question but in my case it works...



Answer 14:

This is an old answer that I posted some years ago. My intention was not to provide a "good" and "respectable" solution, but a "hacky" one that might be useful under some circumstances. Please don't use this solution unless nothing else works.

Actually, it works perfectly fine in many situations where other answers don't, because the UIWebView is doing all the work. And you can even inject some JavaScript (which can be dangerous and/or useful). The performance should be horrible, but actually it is not that bad.

There is another solution that has to be mentioned. Just create a UIWebView, load the encoded string and get the text back. It escapes tags "<>", and also decodes all HTML entities (e.g. "&gt;"), and it might work where others don't (e.g. with Cyrillic text). I don't think it's the best solution, but it can be useful if the above solutions don't work.

Here is a small example using ARC:

@interface YourClass() <UIWebViewDelegate>

    @property UIWebView *webView;

@end

@implementation YourClass 

- (void)someMethodWhereYouGetTheHtmlString:(NSString *)htmlString {
    self.webView = [[UIWebView alloc] init];
    // Set the delegate before loading so the callbacks are delivered.
    self.webView.delegate = self;
    NSString *page = [NSString stringWithFormat:@"<html><body>%@</body></html>", htmlString];
    [self.webView loadHTMLString:page baseURL:nil];
}

- (void)webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    self.webView = nil;
}

- (void)webViewDidFinishLoad:(UIWebView *)webView {
    // Read the text before releasing the web view.
    NSString *escapedString = [webView stringByEvaluatingJavaScriptFromString:@"document.body.textContent;"];
    self.webView = nil;
    // Use escapedString here.
}

- (void)webViewDidStartLoad:(UIWebView *)webView {
    // Do Nothing
}

@end