NSString : easy way to remove UTF-8 accents from a

Posted 2019-01-10 09:59

Question:

I want to change a sentence, for example:

Être ou ne pas être. C'était là-bas.

Would become:

Etre ou ne pas etre. C'etait la-bas.

Is there any easy way to do this with NSString? Or do I have to develop this on my own by checking each char?

Answer 1:

NSString *str = @"Être ou ne pas être. C'était là-bas.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
NSLog(@"%@", newStr);

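... or try a different encoding from the list below. Note that NSUTF8StringEncoding will not strip anything here, because UTF-8 can represent the accented characters without loss.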

List of encoding types here:

https://developer.apple.com/documentation/foundation/nsstringencoding


Just for the record, here's a one-line way to write this great answer:

yourString = [[NSString alloc]
  initWithData:
    [yourString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES]
  encoding:NSASCIIStringEncoding];


Answer 2:

Mattt Thompson covered this in NSHipster and again at WWDC 2013 session 228.

TL;DR

NSMutableString *str = [@"Être ou ne pas être. C'était là-bas." mutableCopy];
// The transform mutates str in place; NULL means the whole string is processed.
CFStringTransform((__bridge CFMutableStringRef)str, NULL, kCFStringTransformStripCombiningMarks, NO);

Should do the trick, it worked great for me.

Caveat: Since a lot of people in the comments say this should be the accepted answer, I want to add a warning about this method. It is pretty damn slow and should be used with care if huge amounts of string data need to be transformed.



Answer 3:

Have you tried

[string stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale currentLocale]]

or

Boolean CFStringTransform (
   CFMutableStringRef string,
   CFRange *range,
   CFStringRef transform,
   Boolean reverse
);

?

CFStringTransform & Transform Identifiers

NSMutableString *string = ...;
CFMutableStringRef stringRef = (__bridge CFMutableStringRef)string;
CFStringTransform(stringRef, NULL, kCFStringTransformToLatin, NO);
NSLog(@"%@", string);


Answer 4:

Just an update to say that it can be done like this in Swift:

"Être ou ne pas être. C'était là-bas.".stringByFoldingWithOptions(NSStringCompareOptions.DiacriticInsensitiveSearch, locale: NSLocale.currentLocale())

--> "Etre ou ne pas etre. C'etait la-bas."
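The call above uses the pre-Swift 3 method names. As a minimal sketch of the same idea in current Swift, assuming Foundation's `folding(options:locale:)` is available on your platform: the helper name `removingDiacritics` is hypothetical, and passing `nil` instead of the current locale is a choice made for this example, not something taken from the answer.

```swift
import Foundation

// Hypothetical helper (not from the answers above): folds away diacritics
// with Foundation's String.folding(options:locale:).
func removingDiacritics(_ text: String) -> String {
    // Assumption: nil is passed instead of the current locale so the result
    // should not depend on the user's regional settings.
    return text.folding(options: .diacriticInsensitive, locale: nil)
}

let folded = removingDiacritics("Être ou ne pas être. C'était là-bas.")
print(folded) // Etre ou ne pas etre. C'etait la-bas.
```

If the result is going to be shown to the user rather than used for matching or sorting, passing `Locale.current` as in the original line may be the better fit.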



Answer 5:

Here is a performance test using Swift 2.0 on the iPhone 6 iOS 9.0 Simulator, comparing solutions using:

  • CFStringTransform (Task 1)
  • stringByFoldingWithOptions (Task 2)

Task 2 is consistently faster, e.g.:

Task 1 took 9.49736100435257 seconds.
Task 2 took 1.96649599075317 seconds.

Here is the test:

    let timer = ParkBenchTimer()
    for _ in 1...1000000 {
        let mStringRef = NSMutableString(string: "Être ou ne pas être. C'était là-bas.") as CFMutableStringRef
        CFStringTransform(mStringRef, nil, kCFStringTransformStripCombiningMarks, false)
        String(mStringRef)
    }
    print("Task 1 took \(timer.stop()) seconds.")

    let timer2 = ParkBenchTimer()
    for _ in 1...1000000 {
        "Être ou ne pas être. C'était là-bas.".stringByFoldingWithOptions(NSStringCompareOptions.DiacriticInsensitiveSearch, locale: NSLocale.currentLocale())
    }
    print("Task 2 took \(timer2.stop()) seconds.")

ParkBenchTimer by Klaas: https://stackoverflow.com/a/26578191/1097106
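For anyone who wants to repeat the comparison without the external timer, here is a rough, self-contained sketch in current Swift. It makes several assumptions that are not part of the test above: `measureSeconds` is a hypothetical helper built on `Date`, the iteration count is much smaller, and Foundation's `applyingTransform(_:reverse:)` stands in for the direct `CFStringTransform` call, which may change the timings. Treat the numbers it prints as indicative only.

```swift
import Foundation

// Hypothetical helper: runs `body` `iterations` times and returns the
// elapsed wall-clock time in seconds.
func measureSeconds(iterations: Int, _ body: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<iterations { body() }
    return Date().timeIntervalSince(start)
}

let sample = "Être ou ne pas être. C'était là-bas."
let iterations = 10_000 // assumption: far fewer runs than the 1,000,000 above

// Keep the last result of each task so the work is actually used.
var transformed = ""
var foldedResult = ""

// Task 1: strip combining marks via a string transform.
let task1 = measureSeconds(iterations: iterations) {
    transformed = sample.applyingTransform(.stripCombiningMarks, reverse: false) ?? sample
}

// Task 2: diacritic-insensitive folding.
let task2 = measureSeconds(iterations: iterations) {
    foldedResult = sample.folding(options: .diacriticInsensitive, locale: nil)
}

print("Task 1 took \(task1) seconds.")
print("Task 2 took \(task2) seconds.")
```

Running it in a release build on a device, rather than in a playground or simulator, should give more representative figures.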



Answer 6:

Swift 3 (tested in playground)

//String+StripCombiningMarks.swift

extension String {
    /// strip combining marks (accents or diacritics)
    var stripCombiningMarks: String {
        let mStringRef = NSMutableString(string: self) as CFMutableString
        CFStringTransform(mStringRef, nil, kCFStringTransformStripCombiningMarks, false)
        return mStringRef as String
    }
}

Usage:

let umlaut = "äöüÄÖÜ" //ÄÖÜ
let stripped = umlaut.stripCombiningMarks //aouAOU


Answer 7:

Here is the complete code. Use the method stringByFoldingWithOptions:locale:.

NSString *str1 = @"Être ou ne pas être C'était là-bas";
NSString *str2 = [str1 stringByFoldingWithOptions:NSDiacriticInsensitiveSearch locale:[NSLocale systemLocale]];
NSLog(@"%@", str2);



Answer 8:

For those who want a Swift version of the CFStringTransform solution:

let stripAccentAndDiacritics: (String) -> String = {
    var mStringRef = NSMutableString(string: $0) as CFMutableStringRef
    CFStringTransform(mStringRef, nil, kCFStringTransformStripCombiningMarks, Boolean(0))
    return String(mStringRef)
}