I'm trying to concatenate several strings that contain both Arabic and Western characters (mixed within the same string). The problem is that the result is a String that is most likely semantically correct but differs from what I want, because the order of the characters is altered by the Unicode Bidirectional Algorithm. Basically, I just want to concatenate them as if they were all LTR, ignoring the fact that some are RTL: a sort of "agnostic" concatenation.
I'm not sure if I was clear in my explanation, but I don't think I can do it any better.
Hope someone can help me.
Kind regards,
Carlos Ferreira
BTW, the strings are being obtained from the database.
EDIT
The first 2 Strings are the strings I want to concatenate and the third is the result.
EDIT 2
Actually, the concatenated String is a little different from the one in the image; it got altered during the copy and paste. The 1 comes after the first A, not immediately before the second A.
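You can embed bidi regions using Unicode format control code points: LEFT-TO-RIGHT EMBEDDING (U+202A), RIGHT-TO-LEFT EMBEDDING (U+202B), and POP DIRECTIONAL FORMATTING (U+202C).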
So in Java, to embed an RTL language like Arabic in an LTR language like English, you would do
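A minimal sketch of what that could look like, assuming the Arabic segment is wrapped in RIGHT-TO-LEFT EMBEDDING (U+202B) and closed with POP DIRECTIONAL FORMATTING (U+202C). The class, method, and variable names are made up for illustration.

```java
final class RtlEmbeddingSketch {
    // RIGHT-TO-LEFT EMBEDDING: opens an embedded RTL region.
    static final char RLE = '\u202B';
    // POP DIRECTIONAL FORMATTING: closes the most recent embedding.
    static final char PDF = '\u202C';

    /** Places rtlText, wrapped in RLE ... PDF, between two LTR strings. */
    static String embedRtl(String ltrBefore, String rtlText, String ltrAfter) {
        return ltrBefore + RLE + rtlText + PDF + ltrAfter;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: an Arabic word between two English fragments.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String combined = embedRtl("Name: ", arabic, " (id 1)");
        // The stored code points keep their order; only two control characters are added.
        System.out.println(combined);
    }
}
```

The control characters are invisible but they are part of the string, so they count toward its length and affect equality checks.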
and to do the reverse
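Again as a sketch under the same assumptions, the reverse case could wrap the English segment in LEFT-TO-RIGHT EMBEDDING (U+202A) and POP DIRECTIONAL FORMATTING (U+202C). All names here are hypothetical.

```java
final class LtrEmbeddingSketch {
    // LEFT-TO-RIGHT EMBEDDING: opens an embedded LTR region.
    static final char LRE = '\u202A';
    // POP DIRECTIONAL FORMATTING: closes the most recent embedding.
    static final char PDF = '\u202C';

    /** Places ltrText, wrapped in LRE ... PDF, between two RTL strings. */
    static String embedLtr(String rtlBefore, String ltrText, String rtlAfter) {
        return rtlBefore + LRE + ltrText + PDF + rtlAfter;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: an English fragment between two Arabic fragments.
        String arabicBefore = "\u0645\u0631\u062D\u0628\u0627 ";
        String arabicAfter = " \u0627\u0628";
        System.out.println(embedLtr(arabicBefore, "item A1", arabicAfter));
    }
}
```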
See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.
It's not changing the order of the code points. What's happening is that when the string is displayed, the renderer sees that it starts with a right-to-left script, so it displays it right-to-left.
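One way to check this is to print the code points of the concatenated string instead of looking at how it is rendered. The sketch below uses made-up sample strings and a hypothetical helper name.

```java
final class LogicalOrderSketch {
    /** Lists the code points of s in stored (logical) order, e.g. "U+0041 U+0031". */
    static String describeCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String arabic = "\u0627\u0628"; // two Arabic letters
        String latin = "A1";
        // Concatenation appends in memory order; no reordering takes place here.
        System.out.println(describeCodePoints(arabic + latin));
        // prints "U+0627 U+0628 U+0041 U+0031"
    }
}
```

If the printed sequence matches the order in which the pieces were appended, any difference you see on screen comes from the display step, not from the concatenation.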
It's very likely that you need to insert Unicode directional formatting codes into your string to get it to display correctly. For details, see the Directional Formatting Codes section of the Unicode Bidirectional Algorithm specification.
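For the "treat everything as LTR" case from the question, one candidate among those formatting codes is LEFT-TO-RIGHT OVERRIDE (U+202D), which asks the renderer to treat the enclosed characters as strongly left-to-right until POP DIRECTIONAL FORMATTING (U+202C). The sketch below is only an assumption about what the asker wants, with hypothetical names. Note that Arabic letters displayed this way appear in stored order from left to right, which is usually not readable Arabic.

```java
final class LtrOverrideSketch {
    // LEFT-TO-RIGHT OVERRIDE: forces enclosed characters to be laid out left-to-right.
    static final char LRO = '\u202D';
    // POP DIRECTIONAL FORMATTING: ends the override.
    static final char PDF = '\u202C';

    /** Concatenates the parts in order and wraps the whole result in LRO ... PDF. */
    static String concatForcedLtr(String... parts) {
        StringBuilder sb = new StringBuilder();
        sb.append(LRO);
        for (String part : parts) {
            sb.append(part);
        }
        sb.append(PDF);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for the database strings.
        System.out.println(concatForcedLtr("\u0627A", "1B"));
    }
}
```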
Maybe the Bidi class can help you in determining the correct sequence, as it implements the Unicode Bidirectional Algorithm.
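As a rough illustration, `java.text.Bidi` can report the directional runs it finds in a string. The sketch below assumes a default left-to-right base direction and uses a made-up sample string; the helper name is hypothetical.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

final class BidiRunsSketch {
    /** Returns one entry per directional run, e.g. "RTL [4, 7)". */
    static List<String> describeRuns(String text) {
        // Base direction is taken from the first strong character, falling back to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        List<String> runs = new ArrayList<>();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // Even embedding levels are left-to-right, odd levels are right-to-left.
            String direction = (bidi.getRunLevel(i) % 2 == 0) ? "LTR" : "RTL";
            runs.add(direction + " [" + bidi.getRunStart(i) + ", " + bidi.getRunLimit(i) + ")");
        }
        return runs;
    }

    public static void main(String[] args) {
        // Latin text, three Arabic letters, then Latin text again.
        String mixed = "abc \u0627\u0628\u062A def";
        for (String run : describeRuns(mixed)) {
            System.out.println(run);
        }
    }
}
```

The run boundaries are indices into the stored string, so they can be used to see which parts a renderer would lay out right-to-left without modifying the string itself.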