I'm interested in setting some text into a UILabel and, depending on the directionality of the language (e.g., Hebrew is right-to-left [RTL], English is left-to-right [LTR]), setting the alignment of the UILabel accordingly.
Note that using iOS 6's NSTextAlignmentNatural does not solve the problem: experiments show that it chooses the alignment according to the current locale rather than the text itself.
In HTML5 this can be done by applying dir="auto" to the element. It's implemented in WebKit, though I'm not completely sure that it's available in iOS.
dir="auto" is very simple, and you can probably implement it yourself - just search for the first character that has strong directionality, and apply its directionality to the whole thing.
If you can't find anything in iOS, you can try to take some smarter ideas from the way StatusNet implemented it: http://status.net/sites/default/files/issues/1346_jquery.directionDetector.js
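The "first strong character" idea can be sketched in plain C, which compiles as-is inside an Objective-C file. This is only a minimal sketch: the names base_dir, is_strong_rtl, is_strong_ltr and first_strong_direction are made up for illustration, and the two range checks cover just Hebrew letters, Arabic letters and ASCII letters. A real implementation would need ranges generated from the Unicode bidi data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DIR_NEUTRAL, DIR_LTR, DIR_RTL } base_dir;

/* Deliberately incomplete: Hebrew letters (bidi class R) and
   Arabic letters (bidi class AL) only. */
static int is_strong_rtl(uint32_t cp) {
    return (cp >= 0x05D0 && cp <= 0x05EA)
        || (cp >= 0x0620 && cp <= 0x064A);
}

/* Deliberately incomplete: ASCII letters (bidi class L) only. */
static int is_strong_ltr(uint32_t cp) {
    return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

/* Scans code points in order and returns the direction of the first
   one with strong directionality; digits, spaces and punctuation are
   skipped. Returns DIR_NEUTRAL if no strong character is found. */
base_dir first_strong_direction(const uint32_t *cps, size_t count) {
    assert(cps != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (is_strong_rtl(cps[i])) return DIR_RTL;
        if (is_strong_ltr(cps[i])) return DIR_LTR;
    }
    return DIR_NEUTRAL;
}
```

A caller could map DIR_RTL to right alignment, DIR_LTR to left alignment, and DIR_NEUTRAL to whatever default makes sense, for example NSTextAlignmentNatural.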
There is a solution that is based on language detection of the string. The method returns NSTextAlignmentNatural if the direction cannot be identified.
Short Answer
Ended up following the advice given in this SO answer: write a short script that parses the publicly available Unicode data and generates code to identify whether a code point has a strong R or AL directionality attribute. The string is then searched for the first such character. This is exactly what ubidi_getBaseDirection from the ICU package does.
Since the internal representation of NSString is UTF-16 (a variable-length encoding), the string is first converted to UTF-32 in order to simplify the scanning code. An alternative approach would be to decode the string on the fly, which requires dealing with the BOM and Unicode surrogates. Yet another approach is to simply ignore characters that are not representable by one unichar. For more details, see Wikipedia's UTF-16 article.
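For the "decode on the fly" alternative, the surrogate handling might look roughly like the C sketch below. The function name utf16_next is hypothetical, BOM handling is left out, and returning U+FFFD for an unpaired surrogate is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point from the UTF-16 code units starting at
   units[*i] and advances *i past the units consumed.
   Unpaired surrogates are reported as U+FFFD. */
uint32_t utf16_next(const uint16_t *units, size_t count, size_t *i) {
    assert(units != NULL && i != NULL && *i < count);
    uint16_t u = units[(*i)++];
    if (u >= 0xD800 && u <= 0xDBFF) {              /* high surrogate */
        if (*i < count) {
            uint16_t lo = units[*i];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {    /* valid pair */
                (*i)++;
                return 0x10000u
                     + (((uint32_t)(u - 0xD800) << 10)
                        | (uint32_t)(lo - 0xDC00));
            }
        }
        return 0xFFFD;                             /* missing low half */
    }
    if (u >= 0xDC00 && u <= 0xDFFF) return 0xFFFD; /* stray low surrogate */
    return u;                                      /* BMP code point */
}
```

Each decoded code point can then be fed straight into the strong-directionality check, so no UTF-32 copy of the string is needed.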
Longer answer: Generating the functions isCodePointStrong{RTL,LTR}
Create a script hex_numbers_to_dec_ranges_py (code shamelessly stolen from this excellent answer at Stack Exchange's Code Review).
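The script itself is not reproduced here. Judging by its name, it turns a list of code points into contiguous ranges that can be pasted into the generated functions. The C sketch below shows that range-collapsing idea only; cp_range, collapse_ranges and in_ranges are hypothetical names, and the input is assumed to be sorted in ascending order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t first, last; } cp_range;  /* inclusive bounds */

/* Collapses a sorted list of code points into inclusive ranges.
   Writes at most max_out ranges and returns how many were written. */
size_t collapse_ranges(const uint32_t *cps, size_t count,
                       cp_range *out, size_t max_out) {
    assert(out != NULL && (cps != NULL || count == 0));
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (n > 0 && cps[i] - out[n - 1].last <= 1) {
            out[n - 1].last = cps[i];       /* adjacent or duplicate */
        } else {
            if (n == max_out) break;        /* output buffer is full */
            out[n].first = out[n].last = cps[i];
            n++;
        }
    }
    return n;
}

/* Returns 1 if cp falls inside any of the n ranges, 0 otherwise. */
int in_ranges(const cp_range *ranges, size_t n, uint32_t cp) {
    for (size_t i = 0; i < n; i++) {
        if (cp >= ranges[i].first && cp <= ranges[i].last) return 1;
    }
    return 0;
}
```

With the ranges in hand, a generated isCodePointStrongRTL-style function boils down to a handful of comparisons instead of a per-character lookup table.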
Run from a terminal:
EDIT: As @masmor correctly noted, the for loop in getBaseDirection scans characters, not bytes. Therefore, it should terminate after the number of characters, not the number of bytes; in other words, after self.length iterations and not utf32data.length iterations. The code is now corrected.
I think the scan loop in -getBaseDirection should run self.length times instead of utf32data.length times in the accepted answer. utf32data.length is the size in bytes, while sizeof(UTF32Char) == 4, so the loop would overrun the buffer.
In action, the current code sporadically returns false positives for identical input, depending on what it overruns onto (maybe it'll segfault on a sufficiently bad day). Everything else seems to be working perfectly with the fix.
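To make the byte-versus-element distinction concrete, here is a small C sketch; find_code_point is a hypothetical helper, not the code from the accepted answer. It derives the element count from the byte length, which is what NSData's length reports. This also stays within bounds when the string contains surrogate pairs, where the NSString length (counted in UTF-16 units) is larger than the number of UTF-32 code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first element equal to target, or the
   element count if it is not present. byte_length is the size of
   the buffer in bytes. */
size_t find_code_point(const uint32_t *buf, size_t byte_length,
                       uint32_t target) {
    assert(buf != NULL || byte_length == 0);
    /* Convert bytes to elements; looping byte_length times would
       read four times past the end of the buffer. */
    size_t count = byte_length / sizeof buf[0];
    for (size_t i = 0; i < count; i++) {
        if (buf[i] == target) return i;
    }
    return count;
}
```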