Numbers localization in desktop applications

Posted 2019-06-16 19:37

Question:

In the Unicode "Number, Decimal Digit" (Nd) category, 460 decimal characters are defined (see this page for some examples). Unfortunately, I could not find any character representing a digit regardless of its appearance. As a result, most software currently understands only Western Arabic numeral characters as digits. So, for example, you cannot enter other numeral characters in MS Excel.

If Unicode had (at least) 10 code points for the digits 0 to 9 as pure numbers rather than glyphs, we could use them in almost all ordinary usage, and the host environment could display localized number glyphs according to the user's locale. We could still use any of the 460 decimal Unicode characters when we want to work with specific number glyphs as a string.

On the other hand, if we accept the current characters U+0030 to U+0039 as pure digits, then we would need ten new characters for Western Arabic numerals. This approach also seems more backward compatible. Moreover, the names of the characters U+0030 to U+0039 (DIGIT ZERO to DIGIT NINE) do not refer to any specific script's appearance.

Obviously we could hard-code all 460 decimal numeral characters in the app and treat them internally as numbers, but I am looking for a more suitable solution. The issue becomes more complicated if we also consider the 224 + 464 other Unicode number characters (categories Nl and No), which include Roman and Old Persian numerals.

How can we solve this issue with an OS-wide solution?

See also Numbers localization in Web applications

Answer 1:

I'm not exactly sure what you are asking, but the nearest thing to a specific question seems to be, "in the current situation, how should we handle numbers in mathematical applications in a manner where users can see their local number glyphs?"

Very simple: write your own mathematical application. It will have a Model of its data, for instance, an integer number or a real number. It will also have a View of that data, for instance, a character string expressing the number in a notation the user knows how to read. (These terms refer to the Model-View-Controller architecture.) In your own application, write code for your View that displays the number using Arabic number characters, or Bengali number characters, or Chinese number characters, or whatever representation you desire.
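A minimal sketch of that Model/View split in Python, assuming the View only needs to substitute digits. `NumberModel` and `render` are hypothetical names, and the target script is chosen by passing its DIGIT ZERO character (for example U+0660 ARABIC-INDIC DIGIT ZERO or U+09E6 BENGALI DIGIT ZERO). Decimal separators, grouping and signs stay ASCII here; localizing those properly would need CLDR/ICU data.

```python
from decimal import Decimal


class NumberModel:
    """Model: holds only the abstract value; knows nothing about glyphs."""

    def __init__(self, value: Decimal) -> None:
        self.value = value


def render(model: NumberModel, zero: str) -> str:
    """View: show the value using the ten digits that start at `zero`.

    Assumes the target script's digits are contiguous (zero, one, ... nine).
    Only the digits are substituted; '.' and '-' are left as ASCII.
    """
    base = ord(zero)
    table = str.maketrans({str(d): chr(base + d) for d in range(10)})
    return str(model.value).translate(table)


price = NumberModel(Decimal("1234.50"))
print(render(price, "0"))       # 1234.50
print(render(price, "\u0660"))  # Arabic-Indic digits
print(render(price, "\u09e6"))  # Bengali digits
```

The stored value never changes; only the View decides which glyphs the user sees, which is why the number is kept out of the string in the first place.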

As Esailija writes, the Common Locale Data Repository (CLDR) and the International Classes for Unicode (ICU) libraries can help you write this application.

You write,

I could not find any character representing a digit regardless of its appearance. As a result, currently only Western Arabic numeral characters are understood by most (or perhaps all) software as numbers. So you can not enter other number characters in MS Excel.

I think these three sentences don't have a logical connection.

The reason you can't enter other number characters in Microsoft Excel is that Microsoft made a business decision: Excel was useful enough if it represented numbers only with Western digits, and it was not necessary for them to build the multilingual spreadsheet you seek.

The reason most (or perhaps all) software currently understands only Western Arabic numeral characters as numbers is that many other software developers have made the same business decision as Microsoft. It is not because of how digits are encoded in Unicode.

You are correct that the Unicode Standard has no character representing a digit regardless of its appearance. That is because the Unicode Standard deals with characters, using a very detailed model of what are and are not characters. It does not (usually) deal with other abstract data-model entities.

So: go and write the mathematical application which has the behaviour you want. The platform and APIs are open to you. The Unicode Standard and CLDR and ICU provide you with tools. Do great things!

You add:

Obviously we can hard-code all 460 decimal numeral characters in the app and internally treat with them as numbers, but I am looking for a more suitable solution.... How can we solve this issue with an OS wide solution?

What are your criteria for declaring a solution "suitable"? Hard-coding the decimal numeral characters, or more specifically, writing a set of language-specific routines to convert from abstract number data types to text representations in various languages, is the only approach I see that will work. By "an OS-wide solution", do you mean a solution that you can install into the OS and that will change the behaviour of existing applications? You can hope for that, but I don't think it will come to pass on current operating systems.
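If hard-coding is the route taken, the table does not have to be typed in by hand. A rough sketch, assuming Python's `unicodedata` module (which ships a copy of the Unicode Character Database) and relying on the Unicode policy that decimal digits are encoded in contiguous runs of ten starting at zero: it lists the DIGIT ZERO of every decimal digit set. `decimal_digit_zeros` is a hypothetical helper, and the number of sets it finds depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata
from typing import List


def decimal_digit_zeros() -> List[int]:
    """Return the code point of DIGIT ZERO for every decimal-digit set.

    unicodedata.decimal() only succeeds for Nd characters, and each Nd set
    is a contiguous run zero..nine, so its zero identifies the whole set.
    """
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.decimal(chr(cp), None) == 0
    ]


zeros = decimal_digit_zeros()
print(len(zeros), "digit sets in this Unicode version")
for cp in zeros[:3]:
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```

Each entry stands for ten characters (zero, zero + 1, ..., zero + 9), so the full 0-to-9 mapping for any script follows from a single code point.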

Note that the language-specific routines could perhaps be implemented with ICU's RuleBasedNumberFormat class. This class can format an abstract number as a string of text, e.g. 25,376 as "twenty-five thousand three hundred seventy-six", "vingt-cinq mille trois cent soixante-seize" or "fünfundzwanzigtausenddreihundertsechsundsiebzig". One could probably use this class to format numbers with any of the 46 sets of digits you identified. However, application software would still need to incorporate ICU and the number-formatting code.

Update: modified my answer to track wording changes in original poster's question. Added response to call for "OS wide solution". Repaired a link to Wikipedia at "Model-view-controller".

Update: deleted spurious word "the".



Answer 2:

You can find the numbering systems in CLDR. Descriptions of the id attribute values can be found in the BCP 47 file for numbers. A numbering system is either numeric or algorithmic, as specified in its type attribute. If it is "numeric", the digits attribute contains the digits of that system, starting from 0. If it is "algorithmic", the rules attribute refers to the rules used. See also: Reading numbering system files.
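A sketch of reading such entries with Python's standard `xml.etree.ElementTree`. The XML below is a hand-written excerpt modeled on CLDR's numbering-system data, not a copy of the real file, so the element and attribute values should be checked against an actual CLDR release; `load_numbering_systems` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written excerpt modeled on CLDR numbering-system data (not the real file).
SAMPLE = """
<numberingSystems>
  <numberingSystem id="latn" type="numeric" digits="0123456789"/>
  <numberingSystem id="arab" type="numeric"
      digits="\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"/>
  <numberingSystem id="roman" type="algorithmic" rules="%roman-upper"/>
</numberingSystems>
"""


def load_numbering_systems(xml_text: str) -> dict:
    """Map each numbering-system id to a small description dict."""
    registry = {}
    for node in ET.fromstring(xml_text.strip()).iter("numberingSystem"):
        kind = node.get("type")
        entry = {"type": kind}
        if kind == "numeric":
            digits = node.get("digits")
            entry["digits"] = digits
            # The digits attribute starts at 0, so a character's index is its value.
            entry["value_of"] = {ch: i for i, ch in enumerate(digits)}
        elif kind == "algorithmic":
            # Only the rule-set name is stored; the rules themselves live in RBNF data.
            entry["rules"] = node.get("rules")
        registry[node.get("id")] = entry
    return registry


systems = load_numbering_systems(SAMPLE)
print(systems["roman"])  # {'type': 'algorithmic', 'rules': '%roman-upper'}
```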

For the rules of algorithmic numbering systems, see the root.xml file in the rbnf (rule-based number formatting) folder. See also: More about reading RBNF files.
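The RBNF rule syntax itself is beyond a short example, but the idea of an algorithmic system can be illustrated with Roman numerals, which CLDR lists as an algorithmic numbering system. The following is a hand-written greedy conversion, not ICU's RBNF engine and not the rules in root.xml; `to_roman` is a hypothetical name and the 1 to 3999 range is an assumption of this sketch.

```python
# (value, symbol) pairs in descending order, including the subtractive forms.
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n: int) -> str:
    """Format n with classic Roman numerals by repeatedly taking the largest symbol."""
    if not 0 < n < 4000:
        raise ValueError("this sketch only covers 1..3999")
    out = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


print(to_roman(1994))  # MCMXCIV
print(to_roman(2019))  # MMXIX
```

Unlike a numeric system, there is no per-digit substitution here: the output depends on the whole value, which is why CLDR points such systems at rules instead of a digits string.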

The ICU libraries already implement this, but you can also roll your own based on the data from the links above, to convert from the characters of any numbering system to integers or vice versa.
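A possible roll-your-own parser for numeric systems, sketched in Python: instead of a per-script table it uses each character's Unicode decimal value via `unicodedata.decimal`, and it rejects strings that mix digits from different scripts. `parse_digits` is a hypothetical name, and the single-script rule is a design assumption, not something CLDR or ICU prescribes. Python's built-in `int()` also accepts non-ASCII decimal digits; the explicit loop exists to add the script check.

```python
import unicodedata


def parse_digits(text: str) -> int:
    """Convert a run of decimal digits from a single script to an int."""
    if not text:
        raise ValueError("empty string")
    value = 0
    zero = None
    for ch in text:
        d = unicodedata.decimal(ch, None)
        if d is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        # Digits of one script form a contiguous run, so ord(ch) - d is the
        # code point of that script's DIGIT ZERO.
        ch_zero = ord(ch) - d
        if zero is None:
            zero = ch_zero
        elif ch_zero != zero:
            raise ValueError("digits from more than one script")
        value = value * 10 + d
    return value


print(parse_digits("\u0662\u0660\u0661\u0669"))  # 2019 (Arabic-Indic digits)
```

Formatting in the other direction amounts to adding each digit value to the target script's DIGIT ZERO code point.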



Answer 3:

Unicode does not prescribe glyphs for characters. A character is considered to be an abstraction, independent of a specific shaping. So, in a sense, all characters are "regardless of appearance".

But to get to your question (I think): performing this kind of localization would require first identifying a sequence of code points that represents a number and converting it to an actual number. I don't think any Unicode publication covers how to do this (even UTR #25 assumes Latin digits), and it won't necessarily be easy. For example, as noted, some code points have values outside the range 0-9, and numbers can appear left-to-right within otherwise right-to-left text.
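For the identification step, a minimal sketch: in Python 3, `\d` in a `str` regular expression matches any Unicode decimal digit (category Nd), and `int()` accepts such digits, so finding plain digit runs in any script takes a few lines. Because Unicode text is stored in logical order, the digits of a number embedded in right-to-left text are still most-significant first in memory, whatever the display direction. The sketch deliberately ignores signs, separators, decimal points and non-decimal numerals such as U+216B ROMAN NUMERAL TWELVE; `find_numbers` is a hypothetical helper.

```python
import re

# For str patterns, \d matches any Unicode Nd character, not only ASCII 0-9.
DIGIT_RUN = re.compile(r"\d+")


def find_numbers(text: str) -> list:
    """Return (start, end, value) for each run of decimal digits in `text`."""
    return [(m.start(), m.end(), int(m.group())) for m in DIGIT_RUN.finditer(text)]


print(find_numbers("Page \u0663\u0665 of \u0661\u0662\u0660"))
# [(5, 7, 35), (11, 14, 120)]
```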

Assuming you want to attempt this, however, you will need the Numeric Type and the Numeric Value of each code point; these are normative properties whose values are listed in UnicodeData.txt. They define the abstract value for each code point that represents a number (a number that is not necessarily a digit, mind). Once you have the abstract number, you would need to perform the reverse process of converting it to a locale-dependent sequence of code points that represents the same value.
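Python's `unicodedata` module exposes these values from its bundled copy of the Unicode Character Database, so a rough classifier can be sketched as below. Mapping "which lookup succeeds" to a Numeric_Type name is an approximation made for this example (`numeric_type` is a hypothetical helper); it is intended to mirror the decimal, digit and numeric fields of UnicodeData.txt.

```python
import unicodedata


def numeric_type(ch: str) -> str:
    """Approximate the Unicode Numeric_Type property of a single character.

    decimal() succeeds only for decimal digits, digit() additionally for
    characters such as superscripts, and numeric() for anything with a
    numeric value (Roman numerals, fractions, ...).
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"


for ch in ["7", "\u0667", "\u00b2", "\u216b", "\u00bd", "A"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"{numeric_type(ch)}, value={unicodedata.numeric(ch, None)}")
```

Only the "Decimal" characters can be treated as positional digits; a value such as 12 for ROMAN NUMERAL TWELVE or 0.5 for VULGAR FRACTION ONE HALF has to be converted as a whole before it can be re-rendered in another locale.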