(# ゚Д゚) is a 5-letter-word. But in iOS, [@"(# ゚Д゚)" length] is 7.
Why?
I'm using <UITextInput> to modify the text in a UITextField or UITextView. When I make a UITextRange 5 characters long, it exactly covers (# ゚Д゚). So why does (# ゚Д゚) look like a 5-character word in UITextField and UITextView, but like a 7-character word in NSString?
How can I get the correct length of a string in this case?
1) As many in the comments have already stated, your string is made of 5 composed character sequences (or character clusters, if you prefer). NSString's length method counts unichars (UTF-16 code units) instead, so it returns 7, which is the number of unichars it takes to represent your string in memory.
2) Apparently UITextField and UITextView handle strings in a way that is aware of composed character sequences. Good news: so can you. See #3.
3) You can get the number of composed character sequences by using the parts of the NSString API that properly deal with composed character sequences. A quick example I put together is a small NSString category:
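#import <Foundation/Foundation.h>

// Declare the category method so that callers can see it.
@interface NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences;
@end

@implementation NSString (ComposedCharacterSequences_helper)
- (NSUInteger)numberOfComposedCharacterSequences {
    __block NSUInteger count = 0;
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length)
                             options:NSStringEnumerationByComposedCharacterSequences
                          usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        NSLog(@"%@", substring); // Just for fun: log each composed character sequence
        count++;
    }];
    return count;
}
@end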
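Again, this is quick code, but it should get you started. If you use it like so:
NSString *string = @"(# ゚Д゚)";
// NSUInteger is cast to unsigned long so that %lu is correct on both 32-bit and 64-bit builds.
NSLog(@"string length %lu", (unsigned long)string.length);
NSLog(@"composed character count %lu", (unsigned long)[string numberOfComposedCharacterSequences]);
You will see that you get the desired result: a length of 7 and a composed character count of 5.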
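Since methods such as substringWithRange: still take ranges measured in unichars, it can also be handy to turn a count of composed character sequences into an NSRange. Here is a minimal sketch of that idea. RangeOfFirstComposedCharacters is a hypothetical helper name, not part of Foundation; it is built on the real rangeOfComposedCharacterSequenceAtIndex: method.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the range, measured in
// unichars, that covers the first `count` composed character sequences of
// `string`. If the string has fewer sequences, the whole string is covered.
static NSRange RangeOfFirstComposedCharacters(NSString *string, NSUInteger count) {
    NSUInteger index = 0; // position in unichars
    NSUInteger seen = 0;  // composed character sequences consumed so far
    while (index < string.length && seen < count) {
        // Jump to the end of the sequence that contains `index`.
        NSRange sequence = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(sequence);
        seen++;
    }
    return NSMakeRange(0, index);
}
```

For example, with @"e\u0301x" (an e followed by a combining acute accent, then x), asking for 1 sequence gives the range {0, 2}. For the string in the question, asking for 5 sequences should cover all 7 unichars, going by the counts above.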
For an in-depth explanation of the NSString API, check out the WWDC 2012 Session 215 video, "Text and Linguistic Analysis".
Both " ゚" and "Д゚" are represented by a sequence of two Unicode characters (even though each is visually presented as one). -[NSString length] reports the number of UTF-16 code units (unichars), not the number of visible characters. As the documentation puts it:
The number returned includes the individual characters of composed
character sequences, so you cannot use this method to determine if a
string will be visible when printed or how long it will appear.
If you want to see the individual UTF-16 code units:
#import <Foundation/Foundation.h>

// Returns the UTF-16 code units of str as space-separated 4-digit hex values.
NSString *describeUnicodeCharacters(NSString *str)
{
    NSMutableString *codeUnits = [NSMutableString string];
    for (NSUInteger i = 0; i < [str length]; ++i) {
        unsigned long ch = [str characterAtIndex:i];
        [codeUnits appendFormat:@"%04lX ", ch];
    }
    return codeUnits;
}

int main(int argc, char *argv[]) {
    @autoreleasepool {
        NSString *s = @" ゚Д゚";
        // Cast NSUInteger to unsigned long so that %lu is correct on every architecture.
        NSLog(@"%lu unicode chars. code units: %@",
              (unsigned long)[s length], describeUnicodeCharacters(s));
    }
    return 0;
}
The output is: 4 unicode chars. code units: 0020 FF9F 0414 FF9F
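The same effect shows up with ordinary accented letters: é stored as the single code point U+00E9 has length 1, while e followed by the combining acute accent U+0301 has length 2, even though both display as one character. Below is a small sketch for inspecting this. ComposedCharacters is a hypothetical helper name (not part of Foundation) that splits a string into its composed character sequences, so the unichar length of each one can be checked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the composed character
// sequences of `string`, in order, as separate substrings.
static NSArray<NSString *> *ComposedCharacters(NSString *string) {
    NSMutableArray<NSString *> *clusters = [NSMutableArray array];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Each substring is one user-visible character; its own length can exceed 1.
        [clusters addObject:substring];
    }];
    return clusters;
}
```

Calling ComposedCharacters(@"e\u0301") returns an array with one element whose length is 2, whereas ComposedCharacters(@"\u00E9") returns one element whose length is 1.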
2) and 3): what NJones said.