(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7

Posted 2019-04-08 09:26

Question:

(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7.

  1. Why?

  2. I'm using <UITextInput> to modify the text in a UITextField or UITextView. When I make a UITextRange that is 5 characters long, it covers (# ゚Д゚) exactly. So why does (# ゚Д゚) look like a 5-character word in UITextField and UITextView, but like a 7-character word in NSString?

  3. How can I get the correct length of a string in this case?

Answer 1:

1) As many in the comments have already stated, your string is made of 5 composed character sequences (or character clusters, if you prefer). NSString's length method counts unichars (UTF-16 code units) instead, so it returns 7, which is the number of unichars it takes to represent your string in memory.

2) Apparently UITextField and UITextView handle strings in a way that is aware of composed character sequences. Good news: so can you. See #3.

3) You can get the number of composed character sequences by using the parts of the NSString API that deal with them properly. A quick example I cooked up is a small NSString category:

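#import <Foundation/Foundation.h>

@interface NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences;
@end

@implementation NSString (ComposedCharacterSequences_helper)
// Counts user-visible characters (composed character sequences), not unichars.
- (NSUInteger)numberOfComposedCharacterSequences {
    __block NSUInteger count = 0;
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
                              NSLog(@"%@", substring); // Just for fun: log each sequence
                              count++;
                          }];
    return count;
}
@end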

Again, this is quick code, but it should get you started. If you use it like so:

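NSString *string = @"(# ゚Д゚)";
// NSUInteger is 64 bits wide on modern platforms, so cast and use %lu rather than %i.
NSLog(@"string length %lu", (unsigned long)string.length);
NSLog(@"composed character count %lu", (unsigned long)[string numberOfComposedCharacterSequences]);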

You will see a string length of 7 and a composed character count of 5, which is the desired result.

For an in-depth explanation of the NSString API check out the WWDC 2012 Session 215 Video "Text and Linguistic Analysis"
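For question 2, the practical problem is usually converting "the first N visible characters" into an NSRange measured in unichars. Here is a minimal sketch of that conversion, assuming a plain Foundation environment; the function name rangeOfFirstComposedCharacters is hypothetical and not part of any Apple API, and it relies only on -[NSString rangeOfComposedCharacterSequenceAtIndex:].

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not an Apple API): returns the NSRange, measured in
// unichars, that covers the first `count` composed character sequences of
// `string`. If the string has fewer sequences, the whole string is covered.
static NSRange rangeOfFirstComposedCharacters(NSString *string, NSUInteger count) {
    NSUInteger index = 0; // current position in unichars
    NSUInteger seen = 0;  // composed character sequences consumed so far
    while (index < string.length && seen < count) {
        // Range of the whole sequence that contains the unichar at `index`.
        NSRange sequence = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(sequence); // jump to the start of the next sequence
        seen++;
    }
    return NSMakeRange(0, index);
}
```

With the string from the question, written with escapes as @"(# \uFF9F\u0414\uFF9F)", asking for 5 composed characters should give a range of length 7, which matches the two numbers discussed above.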



Answer 2:

Both " ゚" and "Д゚" are represented by a sequence of two Unicode characters each (even though each is visually presented as one). -[NSString length] reports the number of unichars (UTF-16 code units). From the documentation:

The number returned includes the individual characters of composed character sequences, so you cannot use this method to determine if a string will be visible when printed or how long it will appear.

If you want to see the individual UTF-16 code units in hexadecimal:

#import <Foundation/Foundation.h>

// Returns the UTF-16 code units of str as space-separated 4-digit hex values.
NSString* describeUnicodeCharacters(NSString* str)
{
    NSMutableString* codePoints = [NSMutableString string];
    for(NSUInteger i = 0; i < [str length]; ++i){
        unsigned long ch = (unsigned long)[str characterAtIndex:i];
        [codePoints appendFormat:@"%04lX ", ch];
    }
    return codePoints;
}


int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *s = @" ゚Д゚";
        // [s length] is an NSUInteger, so cast it to match %lu.
        NSLog(@"%lu unicode chars. bytes: %@",
            (unsigned long)[s length], describeUnicodeCharacters(s));
    }
    return 0;
}

The output is: 4 unicode chars. bytes: 0020 FF9F 0414 FF9F.

For 2) and 3), see NJones's answer above.
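To tie the two answers together, the hex dump can be grouped by composed character sequence, which makes it visible which clusters occupy more than one unichar. This is only a sketch under the assumption of a plain Foundation environment; codeUnitsPerComposedCharacter is a hypothetical name introduced here for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns one entry per composed character sequence,
// each entry listing that sequence's UTF-16 code units as 4-digit hex values.
static NSArray<NSString *> *codeUnitsPerComposedCharacter(NSString *str) {
    NSMutableArray<NSString *> *clusters = [NSMutableArray array];
    [str enumerateSubstringsInRange:NSMakeRange(0, str.length)
                            options:NSStringEnumerationByComposedCharacterSequences
                         usingBlock:^(NSString *substring, NSRange substringRange,
                                      NSRange enclosingRange, BOOL *stop) {
        NSMutableArray<NSString *> *units = [NSMutableArray array];
        for (NSUInteger i = 0; i < substring.length; ++i) {
            // unichar is a 16-bit unsigned value; cast so it matches %X.
            unsigned int unit = (unsigned int)[substring characterAtIndex:i];
            [units addObject:[NSString stringWithFormat:@"%04X", unit]];
        }
        [clusters addObject:[units componentsJoinedByString:@" "]];
    }];
    return clusters;
}
```

For the four-unichar string dumped above, this should produce two entries, one per visible character, each holding two code units.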