Background
The C99 standard, section 7.11, describes the <locale.h> header and its contents. In particular, it defines struct lconv and says that:
[...] In the "C" locale, the members shall have the values specified in the comments.
char *decimal_point;      // "."
char *thousands_sep;      // ""
char *grouping;           // ""
char *mon_decimal_point;  // ""
char *mon_thousands_sep;  // ""
char *mon_grouping;       // ""
char *positive_sign;      // ""
char *negative_sign;      // ""
char *currency_symbol;    // ""
char frac_digits;         // CHAR_MAX
char p_cs_precedes;       // CHAR_MAX
char n_cs_precedes;       // CHAR_MAX
char p_sep_by_space;      // CHAR_MAX
char n_sep_by_space;      // CHAR_MAX
char p_sign_posn;         // CHAR_MAX
char n_sign_posn;         // CHAR_MAX
char *int_curr_symbol;    // ""
char int_frac_digits;     // CHAR_MAX
char int_p_cs_precedes;   // CHAR_MAX
char int_n_cs_precedes;   // CHAR_MAX
char int_p_sep_by_space;  // CHAR_MAX
char int_n_sep_by_space;  // CHAR_MAX
char int_p_sign_posn;     // CHAR_MAX
char int_n_sign_posn;     // CHAR_MAX
Section 7.11.2.1 "The localeconv() function" goes on to say:
The members of the structure with type char * are pointers to strings, any of which (except decimal_point) can point to "", to indicate that the value is not available in the current locale or is of zero length. [...] The members with type char are nonnegative numbers, any of which can be CHAR_MAX to indicate that the value is not available in the current locale.
It goes on to discuss each of the members. You can see 4 groups of 3 members, one representative group being p_cs_precedes, p_sep_by_space and p_sign_posn.
char p_cs_precedes
Set to 1 or 0 if the currency_symbol respectively precedes or succeeds the value for a nonnegative locally formatted monetary quantity.
char p_sep_by_space
Set to a value indicating the separation of the currency_symbol, the sign string, and the value for a nonnegative locally formatted monetary quantity.
char p_sign_posn
Set to a value indicating the positioning of the positive_sign for a nonnegative locally formatted monetary quantity.
The details of the interpretation of p_sign_posn are given; they are not material to this question.
The standard also gives some examples of how to interpret these types.
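To make the "C" locale values listed above concrete, here is a minimal sketch that checks whether every monetary member of a struct lconv is at its "not available" value. The helper name monetary_members_unset is hypothetical; only setlocale() and localeconv() come from the standard.

```c
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every monetary member of *loc has the value
 * that C99 7.11 lists for the "C" locale ("" for strings, CHAR_MAX for chars). */
bool monetary_members_unset(const struct lconv *loc)
{
    const char *const strings[] = {
        loc->mon_decimal_point, loc->mon_thousands_sep, loc->mon_grouping,
        loc->positive_sign, loc->negative_sign,
        loc->currency_symbol, loc->int_curr_symbol,
    };
    const char numbers[] = {
        loc->frac_digits, loc->int_frac_digits,
        loc->p_cs_precedes, loc->p_sep_by_space, loc->p_sign_posn,
        loc->n_cs_precedes, loc->n_sep_by_space, loc->n_sign_posn,
        loc->int_p_cs_precedes, loc->int_p_sep_by_space, loc->int_p_sign_posn,
        loc->int_n_cs_precedes, loc->int_n_sep_by_space, loc->int_n_sign_posn,
    };

    /* Any non-empty string means some monetary information is available. */
    for (size_t i = 0; i < sizeof strings / sizeof strings[0]; i++)
        if (strings[i][0] != '\0')
            return false;

    /* Any value other than CHAR_MAX means the member is available. */
    for (size_t i = 0; i < sizeof numbers; i++)
        if (numbers[i] != CHAR_MAX)
            return false;

    return true;
}
```

After setlocale(LC_ALL, "C"), passing the result of localeconv() to this helper should report that nothing monetary is available, which is the all-CHAR_MAX case discussed in the questions below.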
If you find the original C99 standard (ISO/IEC 9899:1999), be aware that both TC1 (International Standard ISO/IEC 9899:1999 Technical Corrigendum 1, published 2001-09-01) and TC2 (International Standard ISO/IEC 9899:1999 Technical Corrigendum 2, published 2004-11-15) make changes to §7.11.2.1 (but TC3 does not). However, the changes neither address nor affect the answers to the questions I'm about to ask.
Questions
My first two questions are about the four triples (cs_precedes, sep_by_space, and sign_posn), and the others are more general questions about what constitutes a valid locale:
- Is it feasible or sensible to have one or two of the members of a triple with the CHAR_MAX designation while the other members have values in the normal range (0-1, 0-2, 0-4)?
- If it is sensible, how should the combinations be interpreted? Two combinations (all values set to CHAR_MAX, as in the "C" locale, and all values set validly) are defined; it is the other 6 hybrid settings that I'm curious about.
- Is a locale properly formed if the triples are defined but the relevant currency symbol is not?
- Is a locale properly formed if the monetary decimal point is not defined but the currency symbol is defined?
- If the sign position is not 0 (0 indicating that the value is surrounded by parentheses), is a locale properly formed if the currency symbol is set but both the positive and negative sign strings are empty?
- Does it make sense for the positive triple to be defined when the negative triple is not?
My inclination is to answer:
- No; either all or none of the members of a triple should be set to CHAR_MAX.
- Not applicable given the answer to (1).
- No.
- No (but there is a borderline case for the old Italian currency (lire) where there were no fractions and so no decimal point was needed; that could be handled with a condition that the monetary decimal point is only needed if frac_digits or int_frac_digits is greater than zero).
- No.
- No.
An implementation might then enforce these rules, but it is conceivable that another implementation would interpret the rules differently and come to a different conclusion.
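As an illustration of the first rule, the sketch below classifies a triple by how many of its members are CHAR_MAX and counts the hybrid patterns. The names triple_state, classify_triple and count_hybrid_patterns are hypothetical, and range checking of the available values is deliberately left out.

```c
#include <limits.h>

/* Hypothetical classification of one (cs_precedes, sep_by_space, sign_posn) triple. */
enum triple_state { TRIPLE_UNSET, TRIPLE_SET, TRIPLE_HYBRID };

/* Classify a triple purely by how many members are CHAR_MAX ("not available"). */
enum triple_state classify_triple(char cs_precedes, char sep_by_space, char sign_posn)
{
    int unavailable = (cs_precedes == CHAR_MAX)
                    + (sep_by_space == CHAR_MAX)
                    + (sign_posn == CHAR_MAX);

    if (unavailable == 3)
        return TRIPLE_UNSET;    /* as in the "C" locale */
    if (unavailable == 0)
        return TRIPLE_SET;      /* every member carries a value */
    return TRIPLE_HYBRID;       /* one or two members are CHAR_MAX */
}

/* Enumerate the 2 * 2 * 2 available/unavailable patterns and count the hybrids. */
int count_hybrid_patterns(void)
{
    const char choice[2] = { 0, CHAR_MAX };  /* 0 stands in for "some valid value" */
    int hybrids = 0;

    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            for (int c = 0; c < 2; c++)
                if (classify_triple(choice[a], choice[b], choice[c]) == TRIPLE_HYBRID)
                    hybrids++;

    return hybrids;
}
```

Of the eight patterns, two are the defined cases and the remaining six are the hybrids that the first answer rejects.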
What say you?
Formal Constraints
As far as I can tell, neither Standard C nor POSIX lays down any rules about what is and is not valid in a struct lconv. One plausible reason for this is that no function in Standard C or POSIX takes a struct lconv as an argument; only the localeconv() function returns the structure:
struct lconv *localeconv(void)
Therefore, since the implementation is nominally the only source of struct lconv values, whatever the implementation does must be OK in the eyes of the implementation. All in all, it is something of a still-born feature; it provides functionality that nothing uses directly. Behind the scenes, though, there is support for parts of this information (think printf() and scanf() et al, for starters). The monetary information is not used by any Standard C functions. The <locale.h> header and the localeconv() and setlocale() functions were added to C89 by the committee, in part to ensure that there could be a single ISO standard for C that would be the same as the ANSI standard for C.
Plauger's book 'The Standard C Library' (which implements a C89 standard library) provides a function called _Fmtval() which can be used to format international currency, national (local) currency, and numbers using the conventions of the current locale, but once again, the structure used is defined by the implementation and is not provided by the user.
POSIX does provide a pair of functions strfmon() and strfmon_l(), the latter of which takes a locale_t as one of the arguments.
However, POSIX says nothing at all about the contents of the type locale_t, though it does provide the following functions to manipulate them in limited ways:
locale_t duplocale(locale_t)
void freelocale(locale_t)
locale_t newlocale(int, const char *, locale_t)
locale_t uselocale(locale_t)
However, these provide a minimal and hands-off approach to manipulating locales, and definitely do not go into details about what might or might not be acceptable in a struct lconv. There are also the nl_langinfo() functions:
char *nl_langinfo(nl_item)
char *nl_langinfo_l(nl_item, locale_t)
These allow you to find out, one item at a time, the values of parts of the locale, using names such as ABDAY_1 to find out the abbreviated name of day 1, which is 'Sun' in English-speaking locales. There are some 55 such names in <langinfo.h>. Interestingly, the set is not complete; you can't find the international currency symbol this way.
Practical Constraints
Given that the two primary relevant standards say nothing about the constraints on the contents of struct lconv, we are left with trying to determine practical constraints.
(Aside: given the symmetry of the national and international formatting information in the C99 standard, it is a pity in some respects that a structure wasn't used to encode the information; it makes for fiddly code picking the right bits and pieces out into generic functions. Some of the fields (cs_precedes, sep_by_space) could be booleans, too, but <stdbool.h> wasn't in C89.)
The questions and my original outline answers are as stated above. Having spent some time implementing code to handle formatting like this, my original answers stand largely correct, in my view.
The code I ended up implementing to validate the locales was:
The standard says that loc->int_curr_symbol[3] is used as the 'space' character when formatting international currency, and it makes little sense to allow an alphabetic character as well as the ISO 4217 international currency code, which is three upper case letters from the basic alphabet. Allowing a digit there could lead to confusion if the sign is separate, too, so I think the !isalnum(loc->int_curr_symbol[3]) assertion is sensible. A strict check would validate that the international currency symbol is one of those listed in ISO 4217; that is a bit tricky to code, though!
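The rules discussed above can be sketched as follows. This is a minimal illustration, not the original validation code: the names money_triple, currency_ok, lconv_monetary_ok and assert_lconv_monetary are hypothetical, the value ranges are those C99 gives for the triple members, and the exact reading of the decimal-point and sign-string rules is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical grouping of one (cs_precedes, sep_by_space, sign_posn) triple. */
struct money_triple { char cs_precedes, sep_by_space, sign_posn; };

static bool triple_unset(struct money_triple t)
{
    return t.cs_precedes == CHAR_MAX && t.sep_by_space == CHAR_MAX
        && t.sign_posn == CHAR_MAX;
}

static bool triple_set(struct money_triple t)
{
    return (t.cs_precedes == 0 || t.cs_precedes == 1)
        && (t.sep_by_space >= 0 && t.sep_by_space <= 2)
        && (t.sign_posn >= 0 && t.sign_posn <= 4);
}

/* Check one currency (national or international) against the rules above. */
static bool currency_ok(const struct lconv *loc, const char *symbol, char frac_digits,
                        struct money_triple pos, struct money_triple neg)
{
    bool have_symbol = symbol[0] != '\0';
    bool have_sign = loc->positive_sign[0] != '\0' || loc->negative_sign[0] != '\0';

    /* Each triple is either entirely CHAR_MAX or entirely in range. */
    if (!(triple_unset(pos) || triple_set(pos)) || !(triple_unset(neg) || triple_set(neg)))
        return false;
    /* The positive and negative triples are defined together or not at all. */
    if (triple_set(pos) != triple_set(neg))
        return false;
    /* Defined triples need a currency symbol. */
    if (triple_set(pos) && !have_symbol)
        return false;
    /* Assumption: a decimal point is needed only when fraction digits are in use. */
    if (have_symbol && frac_digits > 0 && frac_digits != CHAR_MAX
        && loc->mon_decimal_point[0] == '\0')
        return false;
    /* Assumption: without parentheses, some sign string must mark negatives. */
    if (have_symbol && triple_set(neg) && neg.sign_posn != 0 && !have_sign)
        return false;
    return true;
}

bool lconv_monetary_ok(const struct lconv *loc)
{
    struct money_triple p  = { loc->p_cs_precedes, loc->p_sep_by_space, loc->p_sign_posn };
    struct money_triple n  = { loc->n_cs_precedes, loc->n_sep_by_space, loc->n_sign_posn };
    struct money_triple ip = { loc->int_p_cs_precedes, loc->int_p_sep_by_space, loc->int_p_sign_posn };
    struct money_triple in = { loc->int_n_cs_precedes, loc->int_n_sep_by_space, loc->int_n_sign_posn };
    const char *ics = loc->int_curr_symbol;

    if (ics[0] != '\0')
    {
        /* Three upper-case letters followed by a non-alphanumeric separator. */
        if (strlen(ics) != 4)
            return false;
        for (int i = 0; i < 3; i++)
            if (!isupper((unsigned char)ics[i]))
                return false;
        if (isalnum((unsigned char)ics[3]))
            return false;
    }
    return currency_ok(loc, loc->currency_symbol, loc->frac_digits, p, n)
        && currency_ok(loc, ics, loc->int_frac_digits, ip, in);
}

/* Assertion-style wrapper, for callers that prefer to abort on a bad locale. */
void assert_lconv_monetary(const struct lconv *loc)
{
    assert(lconv_monetary_ok(loc));
}
```

Returning a bool rather than asserting inside every check keeps the sketch usable both for diagnostics and for hard failures; the ISO 4217 membership test mentioned above is not attempted here.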