How can i write arabic string in C++ using Codeblo

Posted 2019-09-20 19:20

Question:

Could someone tell me how I can write an Arabic string in C++? I am using Codeblocks.

I searched all over the internet for how to put an Arabic string in C++ code, but I didn't find anything.

Answer 1:

There are multiple types of string literals available for use in C++. While the most common type, the narrow multibyte string literal, looks like this:

auto myliteral = "";

there are others.

Specifically, the one you might be looking for is the UTF-8 string literal, written like this:

auto myutf8literal = u8"عربي";

All variations on the basic string literal are prefixed with some combination of characters:

// ex.     description                       storage

L"";    // wide string literal              wchar_t[]
u8"";   // utf-8 encoded string literal     char[] (char8_t[] since C++20)
u"";    // utf-16 encoded string literal    char16_t[]
U"";    // utf-32 encoded string literal    char32_t[]

R"delim()delim";
        // raw string literal, modifier to indicate no escapes in string
        // can be combined with any of the above string literal types

Most of them can hold a far greater variety of characters than the basic string literal. The u8, u and U literals can hold any character in the Unicode character set, which is a rather massive set of characters.
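To make the difference between the prefixes concrete, here is a minimal sketch that stores the same four-letter word in each Unicode literal type and counts the code units. The helper names are invented for illustration, and the letters are written as \u escapes only so that the result does not depend on how the source file is saved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The word "عربي" as four code points: U+0639 U+0631 U+0628 U+064A.
// Helper names are hypothetical; they are not part of any library.

// UTF-8: each of these letters takes two bytes. sizeof includes the
// terminating null, so subtract one. This works whether the element
// type is char (before C++20) or char8_t (C++20 and later).
constexpr std::size_t utf8_units() {
    return sizeof(u8"\u0639\u0631\u0628\u064A") - 1;
}

// UTF-16: one 16-bit code unit per letter for this word.
inline std::size_t utf16_units() {
    return std::u16string(u"\u0639\u0631\u0628\u064A").size();
}

// UTF-32: always one 32-bit code unit per code point.
inline std::size_t utf32_units() {
    return std::u32string(U"\u0639\u0631\u0628\u064A").size();
}
```

Calling these from main should show that the UTF-8 literal is the longest in code units even though all three hold the same word, which matters whenever you index or truncate the string.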

Something you should be aware of is that displaying this string in an environment with minimal display capabilities, rather than simply writing it to a file, introduces additional complications. For example, the Windows console is heavily limited in which characters it can display, and uses something called a code page to decide how to display certain characters. It does have a UTF-8 code page (65001), though there appear to be some issues with it.
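The "simply writing to a file" case can be sketched as follows. This is only an illustration under a few assumptions: the function names and the file path are made up, and the UTF-8 bytes are spelled out as hex escapes so the example compiles the same way regardless of source encoding or language standard.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// UTF-8 bytes of the word "عربي" (U+0639 U+0631 U+0628 U+064A),
// written as hex escapes in an ordinary narrow string.
const std::string kArabicUtf8 = "\xD8\xB9\xD8\xB1\xD8\xA8\xD9\x8A";

// Writes the bytes unchanged to `path` (hypothetical helper).
// Binary mode avoids any newline translation by the runtime.
bool write_utf8_file(const std::string& path) {
    std::ofstream out(path, std::ios::binary);
    out.write(kArabicUtf8.data(),
              static_cast<std::streamsize>(kArabicUtf8.size()));
    out.flush();
    return out.good();
}

// Reads the whole file back as raw bytes (hypothetical helper).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

A text editor that opens the resulting file as UTF-8 should show the Arabic word; no console or code page is involved at this stage.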

For details on dealing with Arabic console output on Linux, look at @cup's answer.



Answer 2:

The first thing you have to consider is where you are printing the text. If you are printing to the console on Linux, use acon. On Windows, switch to code page 1256 and change the font to Lucida Console.

Arabic is a cursive script and is written right to left. When you input the text as an assignment, the text you write is from left to right. So if you have

const wchar_t* monday = L"الإثنين";

It is made up of the individual letters ال إ ث ن ي ن (the equivalent of English "yadnom"). If you try to display it, you may get ال إ ث ن ي ن or ن ي ن ث إ ل ا, depending on how the system does the printing. Some systems understand right to left, some don't. This can be quite confusing, as you don't really want to input "yadnom si yadot" instead of "today is monday". If you put in "today is monday" and get back "yadnom si yadot", then you will need to reverse the string internally before printing it out.
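The internal reversal mentioned above could look roughly like this. It is a naive sketch with an invented function name: it assumes the text is held as UTF-32 so that each element is one code point, and it ignores combining marks and embedded left-to-right runs (Latin words, digits), which a real bidirectional algorithm would handle.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Returns the code points of `logical` in reverse order, for output
// devices that draw strictly left to right (hypothetical helper).
// UTF-32 is used on purpose: reversing UTF-8 bytes or UTF-16 code
// units could split a character in the middle.
std::u32string reverse_for_ltr_display(std::u32string logical) {
    std::reverse(logical.begin(), logical.end());
    return logical;
}
```

Applied to the English analogy, reverse_for_ltr_display(U"today is monday") gives U"yadnom si yadot".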

The next problem is that if you put in ال إ ث ن ي ن, you may not end up with الإثنين because the system does not know about joined-up writing. If you look further down in the Unicode character set, you will find that most letters have four presentation forms:

  1. Standalone
  2. With a letter on the right and nothing on the left
  3. With a letter on the left and nothing on the right
  4. With a letter on the left and on the right.

You need to pick the right form of each letter, depending on what it has on either side. Once you do that, you will get الإثنين. Check that the alef is on the right, not the left.
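Picking a form by looking at the neighbours can be sketched as below. This is a toy illustration, not a complete shaper: the type and function names are invented, the table covers only three letters that join on both sides (beh, teh, noon), and letters such as alef that join on one side only are not handled. In practice this job is usually left to the renderer or to a text shaping library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Presentation forms of one letter, in the order of the list above.
struct Forms {
    char32_t isolated;  // 1. standalone
    char32_t final_;    // 2. a letter on the right only
    char32_t initial;   // 3. a letter on the left only
    char32_t medial;    // 4. letters on both sides
};

// Deliberately tiny table (an assumption of this sketch).
const std::unordered_map<char32_t, Forms>& forms_table() {
    static const std::unordered_map<char32_t, Forms> table = {
        {U'\u0628', {U'\uFE8F', U'\uFE90', U'\uFE91', U'\uFE92'}},  // beh
        {U'\u062A', {U'\uFE95', U'\uFE96', U'\uFE97', U'\uFE98'}},  // teh
        {U'\u0646', {U'\uFEE5', U'\uFEE6', U'\uFEE7', U'\uFEE8'}},  // noon
    };
    return table;
}

// Replaces each known letter with the form that matches its neighbours.
// Characters not in the table pass through and break the joining.
std::u32string shape(const std::u32string& logical) {
    const auto& table = forms_table();
    std::u32string out;
    out.reserve(logical.size());
    for (std::size_t i = 0; i < logical.size(); ++i) {
        const auto it = table.find(logical[i]);
        if (it == table.end()) {
            out.push_back(logical[i]);
            continue;
        }
        // In logical order, the previous letter is drawn on the right
        // and the next letter on the left.
        const bool right = i > 0 && table.count(logical[i - 1]) != 0;
        const bool left =
            i + 1 < logical.size() && table.count(logical[i + 1]) != 0;
        const Forms& f = it->second;
        out.push_back(right && left ? f.medial
                      : right       ? f.final_
                      : left        ? f.initial
                                    : f.isolated);
    }
    return out;
}
```

For the three-letter input beh, noon, teh, this selects the initial, medial and final forms respectively, and a lone teh stays in its isolated form.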

The next problem is where you print it. If you're using a left to right system that does not know anything about right to left scripts, the text has to be measured and positioned correctly before printing.

If you plan to do any justification, remember that Arabic does not increase the space between the words: it increases the length of the words with a special character called the kashida.



Answer 3:

It is related to character encoding. Most implementations use UTF-8. See http://utf8everywhere.org/

Some software libraries (e.g. both GTK and Qt) are able to display UTF-8 strings that contain both Arabic and Latin sentences (e.g. changing directions).



Answer 4:

Your console has to be in Unicode. I may have some of this wrong because I don't have Codeblocks in front of me.

You can use wprintf: http://www.cplusplus.com/reference/cwchar/wprintf/

wprintf(L"Teh Isolated Form: %lc ", L'ﺕ');

Alternatively, you may have to use its numeric code point instead:

wprintf(L"Teh Isolated Form: %lc ", 65173);
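The two calls above can be tied together in a small sketch. It assumes the environment provides a locale that can encode Arabic (for example a UTF-8 locale), which setlocale(LC_ALL, "") picks up; without that, wprintf may fail or print a placeholder. The constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <clocale>
#include <cwchar>

// U+FE95 ARABIC LETTER TEH ISOLATED FORM. 0xFE95 is 65173 in decimal,
// so the character literal and the number name the same code point.
constexpr wchar_t kTehIsolated = L'\uFE95';
static_assert(kTehIsolated == 65173,
              "character literal and number must agree");

// Prints the letter and returns wprintf's result, which is negative
// on failure (hypothetical helper).
int print_teh_isolated() {
    // Assumption: the user's environment selects a suitable locale.
    std::setlocale(LC_ALL, "");
    return std::wprintf(L"Teh Isolated Form: %lc\n",
                        static_cast<std::wint_t>(kTehIsolated));
}
```

Checking the return value is a cheap way to notice that the current locale cannot represent the character.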