What are the formal and practical constraints on t

Posted 2019-02-16 09:19

Background

The C99 standard, section 7.11, describes the <locale.h> header and its contents. In particular, it defines struct lconv and says that:

[...] In the "C" locale, the members shall have the values specified in the comments.

char *decimal_point;     // "."
char *thousands_sep;     // ""
char *grouping;          // ""
char *mon_decimal_point; // ""
char *mon_thousands_sep; // ""
char *mon_grouping;      // ""
char *positive_sign;     // ""
char *negative_sign;     // ""
char *currency_symbol;   // ""
char frac_digits;        // CHAR_MAX
char p_cs_precedes;      // CHAR_MAX
char n_cs_precedes;      // CHAR_MAX
char p_sep_by_space;     // CHAR_MAX
char n_sep_by_space;     // CHAR_MAX
char p_sign_posn;        // CHAR_MAX
char n_sign_posn;        // CHAR_MAX
char *int_curr_symbol;   // ""
char int_frac_digits;    // CHAR_MAX
char int_p_cs_precedes;  // CHAR_MAX
char int_n_cs_precedes;  // CHAR_MAX
char int_p_sep_by_space; // CHAR_MAX
char int_n_sep_by_space; // CHAR_MAX
char int_p_sign_posn;    // CHAR_MAX
char int_n_sign_posn;    // CHAR_MAX

Section 7.11.2.1 "The localeconv() function" goes on to say:

The members of the structure with type char * are pointers to strings, any of which (except decimal_point) can point to "", to indicate that the value is not available in the current locale or is of zero length. [...] The members with type char are nonnegative numbers, any of which can be CHAR_MAX to indicate that the value is not available in the current locale.
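As a minimal sketch of how these two "not available" conventions might be tested in code (the helper names `lconv_str_unset`, `lconv_num_unset` and `count_unset_national` are hypothetical, not part of any standard):

```c
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a char * member is "" (unavailable or zero length). */
static bool lconv_str_unset(const char *s)
{
    return s != NULL && *s == '\0';
}

/* Hypothetical helper: true if a char member is CHAR_MAX (unavailable). */
static bool lconv_num_unset(char c)
{
    return c == CHAR_MAX;
}

/* Count the national monetary char members that are unavailable. */
static int count_unset_national(const struct lconv *lc)
{
    const char members[] = {
        lc->frac_digits,
        lc->p_cs_precedes, lc->p_sep_by_space, lc->p_sign_posn,
        lc->n_cs_precedes, lc->n_sep_by_space, lc->n_sign_posn,
    };
    int count = 0;
    for (size_t i = 0; i < sizeof members / sizeof members[0]; i++)
        count += lconv_num_unset(members[i]);
    return count;
}
```

After `setlocale(LC_ALL, "C")`, passing `localeconv()` to `count_unset_national()` should report all seven members as unavailable, because the standard requires CHAR_MAX for each of them in the "C" locale.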

It goes on to discuss each of the members. You can see 4 groups of 3 members, one representative group being p_cs_precedes, p_sep_by_space and p_sign_posn.

char p_cs_precedes
Set to 1 or 0 if the currency_symbol respectively precedes or succeeds the value for a nonnegative locally formatted monetary quantity.

char p_sep_by_space
Set to a value indicating the separation of the currency_symbol, the sign string, and the value for a nonnegative locally formatted monetary quantity.

char p_sign_posn
Set to a value indicating the positioning of the positive_sign for a nonnegative locally formatted monetary quantity.

The details of the interpretation of p_sign_posn are given; they are not material to this question.

The standard also gives some examples of how to interpret these types.
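To make the roles of p_cs_precedes and p_sep_by_space concrete, here is a deliberately simplified sketch. The function `format_nonneg` is hypothetical; it assumes an empty positive_sign (so p_sign_posn has no visible effect) and only handles p_sep_by_space values 0 and 1, rejecting everything else, including CHAR_MAX:

```c
#include <stdio.h>

/*
 * Hypothetical helper: lay out an already-formatted nonnegative amount
 * (digits) next to the currency symbol.  Assumes the positive sign is
 * empty; supports only sep_by_space 0 (no space) and 1 (one space).
 * Returns the snprintf() result, or -1 for unsupported settings.
 */
static int format_nonneg(char *buf, size_t size, const char *symbol,
                         char cs_precedes, char sep_by_space,
                         const char *digits)
{
    const char *gap;

    if (cs_precedes != 0 && cs_precedes != 1)
        return -1;
    if (sep_by_space != 0 && sep_by_space != 1)
        return -1;
    gap = (sep_by_space == 1) ? " " : "";
    if (cs_precedes == 1)
        return snprintf(buf, size, "%s%s%s", symbol, gap, digits);
    return snprintf(buf, size, "%s%s%s", digits, gap, symbol);
}
```

With symbol "$" and digits "1,234.56", the settings (1, 0) give "$1,234.56" and the settings (0, 1) give "1,234.56 $".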

If you find the original C99 standard (ISO/IEC 9899:1999) be aware that both TC1 (International Standard ISO/IEC 9899:1999 Technical Corrigendum 1, published 2001-09-01) and TC2 (International Standard ISO/IEC 9899:1999 Technical Corrigendum 2, published 2004-11-15) make changes to §7.11.2.1 (but TC3 does not). However, the changes neither address nor affect the answers to the questions I'm about to ask.


Questions

My first two questions are about the four triples (cs_precedes, sep_by_space, and sign_posn); the others are more general questions about what constitutes a valid locale:

  1. Is it feasible or sensible to have one or two of the members of a triple with the CHAR_MAX designation while the other members have values in the normal range (0-1, 0-2, 0-4)?
  2. If it is sensible, how should the combinations be interpreted?

    Two combinations (all values set to CHAR_MAX, as in the "C" locale, and all values set validly) are defined; it is the other 6 hybrid settings that I'm curious about.

  3. Is a locale properly formed if the triples are defined but the relevant currency symbol is not?

  4. Is a locale properly formed if the monetary decimal point is not defined but the currency symbol is defined?
  5. If the sign position is not 0 (indicating that a value is surrounded by parentheses), is a locale properly formed if the currency symbol is set but both the positive and negative sign strings are empty?
  6. Does it make sense for the positive triple to be defined when the negative triple is not?

My inclination is to answer:

  1. No; either all or none of the members of a triple should be set to CHAR_MAX.
  2. Not applicable given the answer to (1).
  3. No.
  4. No (but there is a borderline case for the old Italian currency (lire) where there were no fractions and so no decimal point was needed; that could be handled with a condition that the monetary decimal point is only needed if frac_digits or int_frac_digits is greater than zero).
  5. No.
  6. No.

An implementation might then enforce these rules, but it is conceivable that another implementation would interpret the rules differently and come to a different conclusion.
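One way an implementation might enforce the "all or none" rule for a triple is sketched below. The enum and function names are hypothetical, and the valid ranges (0-1, 0-2, 0-4) are those listed in C99 §7.11.2.1:

```c
#include <limits.h>

/* Hypothetical classification of one (cs_precedes, sep_by_space, sign_posn) triple. */
enum triple_state { TRIPLE_UNSET, TRIPLE_SET, TRIPLE_HYBRID, TRIPLE_INVALID };

static int in_range(char v, int lo, int hi)
{
    return v >= lo && v <= hi;
}

static enum triple_state classify_triple(char cs_precedes, char sep_by_space,
                                         char sign_posn)
{
    int unset = (cs_precedes == CHAR_MAX) + (sep_by_space == CHAR_MAX) +
                (sign_posn == CHAR_MAX);

    if (unset == 3)
        return TRIPLE_UNSET;    /* as in the "C" locale */
    if (unset != 0)
        return TRIPLE_HYBRID;   /* one of the 6 mixed combinations */
    if (in_range(cs_precedes, 0, 1) && in_range(sep_by_space, 0, 2) &&
        in_range(sign_posn, 0, 4))
        return TRIPLE_SET;
    return TRIPLE_INVALID;      /* no CHAR_MAX, but a value is out of range */
}
```

Under the answers proposed above, only TRIPLE_UNSET and TRIPLE_SET would be accepted; another implementation could choose to assign a meaning to TRIPLE_HYBRID instead.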

What say you?

Tags: c locale
1 answer
走好不送
#2 · 2019-02-16 09:39

Formal Constraints

As far as I can tell, neither Standard C nor POSIX lays down any rules about what is and is not valid in a struct lconv. One plausible reason for this is that no function in Standard C or POSIX takes a struct lconv as an argument; only the localeconv() function returns the structure:

 struct lconv *localeconv(void);

Therefore, since the implementation is nominally the only source of struct lconv values, whatever the implementation does must be OK in the eyes of the implementation. All in all, it is something of a still-born feature; it provides functionality that nothing uses directly. Behind the scenes, though, there is support for parts of this information (think printf() and scanf() et al, for starters). The monetary information is not used by any Standard C functions. The <locale.h> header and the localeconv() and setlocale() functions were added to C89 by the committee, in part to ensure that there could be a single ISO standard for C that would be the same as the ANSI standard for C.

Plauger's book 'The Standard C Library' (which implements a C89 standard library) provides a function called _Fmtval() which can be used to format international currency, national (local) currency, and numbers using the conventions of the current locale, but once again, the structure used is defined by the implementation and is not provided by the user.

POSIX does provide a pair of functions strfmon() and strfmon_l(), the latter of which takes a locale_t as one of the arguments.

ssize_t strfmon(char *restrict s, size_t maxsize, const char *restrict format, ...);
ssize_t strfmon_l(char *restrict s, size_t maxsize, locale_t locale,
                  const char *restrict format, ...);
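A minimal usage sketch for strfmon(), assuming a POSIX system that provides <monetary.h>. The wrapper `format_in_locale` is hypothetical, and the exact text produced depends on the locale data installed, so no particular output is assumed:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <locale.h>
#include <monetary.h>
#include <sys/types.h>

/*
 * Hypothetical wrapper: switch LC_MONETARY to the named locale and format
 * amount with the national conventions ("%n").  Returns the number of
 * bytes written, or -1 if the locale is unknown or the buffer is too small.
 * Note that this changes the global locale as a side effect.
 */
static ssize_t format_in_locale(const char *name, char *buf, size_t size,
                                double amount)
{
    if (setlocale(LC_MONETARY, name) == NULL)
        return -1;
    return strfmon(buf, size, "%n", amount);
}
```

The caller never sees a struct lconv here; strfmon() consults the locale's monetary data internally, which is consistent with the point that the structure only ever flows out of the implementation.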

However, POSIX says nothing at all about the contents of the type locale_t, though it does provide the duplocale(), freelocale(), newlocale() and uselocale() functions to manipulate locale objects in limited ways.

These provide a minimal and hands-off approach to manipulating locales, and definitely do not go into details about what might or might not be acceptable in a struct lconv. There are also the nl_langinfo() functions:

#include <langinfo.h>

char *nl_langinfo(nl_item item);
char *nl_langinfo_l(nl_item item, locale_t locale);

These allow you to find out, one item at a time, the values of parts of the locale, using names such as ABDAY_1 to find out the abbreviated name of day 1, which is 'Sun' in English-speaking locales. There are some 55 such names in <langinfo.h>. Interestingly, the set is not complete; you can't find the international currency symbol this way.
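A short sketch of querying one item through a locale object, assuming a POSIX.1-2008 system. The wrapper `langinfo_in_locale` is hypothetical; it copies the result because the string returned by nl_langinfo_l() should not be relied on after the locale object is freed:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: create a locale object for name, look up item in
 * it, copy the answer into buf, and release the locale object again.
 * Returns the string length, or -1 on failure or truncation.
 */
static int langinfo_in_locale(const char *name, nl_item item,
                              char *buf, size_t size)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, (locale_t)0);
    int n;

    if (loc == (locale_t)0)
        return -1;
    n = snprintf(buf, size, "%s", nl_langinfo_l(item, loc));
    freelocale(loc);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For example, `langinfo_in_locale("C", ABDAY_1, buf, sizeof buf)` fills buf with "Sun". The global locale is left untouched, which is the main attraction of the locale_t interfaces.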

Practical Constraints

Given that the two primary relevant standards say nothing about the constraints on the contents of struct lconv, we are left with trying to determine practical constraints.

(Aside: given the symmetry of the national and international formatting information in the C99 standard, it is a pity in some respects that a structure wasn't used to encode the information; it makes for fiddly code picking the right bits and pieces out into generic functions. Some of the fields (cs_precedes, sep_by_space) could be booleans, too, but <stdbool.h> wasn't in C89.)
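To illustrate the aside, here is one possible shape for such a structure. Everything named `money_triple`, `money_style` and `money_style_from_lconv` is hypothetical; only the struct lconv members come from the standard:

```c
#include <locale.h>
#include <stdbool.h>

/* Hypothetical grouping of one sign-specific triple. */
struct money_triple
{
    char cs_precedes;
    char sep_by_space;
    char sign_posn;
};

/* Hypothetical grouping of what differs between national and international formats. */
struct money_style
{
    const char         *symbol;
    char                frac_digits;
    struct money_triple positive;
    struct money_triple negative;
};

/* Do the fiddly member picking once so generic code can work on money_style. */
static struct money_style money_style_from_lconv(const struct lconv *lc,
                                                 bool international)
{
    struct money_style s;

    if (international)
    {
        s.symbol      = lc->int_curr_symbol;
        s.frac_digits = lc->int_frac_digits;
        s.positive = (struct money_triple){ lc->int_p_cs_precedes,
                         lc->int_p_sep_by_space, lc->int_p_sign_posn };
        s.negative = (struct money_triple){ lc->int_n_cs_precedes,
                         lc->int_n_sep_by_space, lc->int_n_sign_posn };
    }
    else
    {
        s.symbol      = lc->currency_symbol;
        s.frac_digits = lc->frac_digits;
        s.positive = (struct money_triple){ lc->p_cs_precedes,
                         lc->p_sep_by_space, lc->p_sign_posn };
        s.negative = (struct money_triple){ lc->n_cs_precedes,
                         lc->n_sep_by_space, lc->n_sign_posn };
    }
    return s;
}
```

With a wrapper like this, a single formatting or validation routine can take a `struct money_style` and serve both the national and the international case.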

Restating the questions:

My first two questions are about the four triples (cs_precedes, sep_by_space, and sign_posn); the others are more general questions about what constitutes a valid locale:

  1. Is it feasible or sensible to have one or two of the members of a triple with the CHAR_MAX designation while the other members have values in the normal range (0-1, 0-2, 0-4)?
  2. If it is sensible, how should the combinations be interpreted?
  3. Is a locale properly formed if the triples are defined but the relevant currency symbol is not?
  4. Is a locale properly formed if the monetary decimal point is not defined but the currency symbol is defined?
  5. If the sign position is not 0 (indicating that a value is surrounded by parentheses), is a locale properly formed if the currency symbol is set but both the positive and negative sign strings are empty?
  6. Does it make sense for the positive triple to be defined when the negative triple is not?

The original, outline answers were:

  1. No; either all or none of the members of a triple should be set to CHAR_MAX.
  2. Not applicable given the answer to (1).
  3. No.
  4. No (but there is a borderline case for the old Italian currency (lire) where there were no fractions and so no decimal point was needed; that could be handled with a condition that the monetary decimal point is only needed if frac_digits or int_frac_digits is greater than zero).
  5. No.
  6. No.

Having spent some time implementing code to handle formatting like this, I think my original answers were largely correct.

The code I ended up implementing to validate the locales was:

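/* Locale validation */
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>
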
#define VALUE_IN_RANGE(v, mn, mx) ((v) >= (mn) && (v) <= (mx))
#define ASSERT(condition)           do { assert(condition); \
                                         if (!(condition)) \
                                             return false; \
                                       } while (0)
#define ASSERT_RANGE(v, mn, mx)     ASSERT(VALUE_IN_RANGE(v, mn, mx))

static bool check_decpt_thous_group(bool decpt_is_opt, const char *decpt,
                                    const char *thous, const char *group)
{
    /* Decimal point must be defined; monetary decimal point might not be */
    ASSERT(decpt != 0);
    ASSERT(decpt_is_opt || *decpt != '\0');
    /* Thousands separator and grouping must be valid (non-null) pointers */
    ASSERT(thous != 0 && group != 0);
    /* Thousands separator should be set iff grouping is set and vice versa */
    ASSERT((*thous != '\0' && *group != '\0') ||
           (*thous == '\0' && *group == '\0'));
    /* Thousands separator, if set, should be different from decimal point */
    ASSERT(*thous == '\0' || decpt_is_opt ||
          (*decpt != '\0' && strcmp(thous, decpt) != 0));
    return true;
}

static bool currency_valid(const char *currency_symbol, char frac_digits,
                           char p_cs_precedes, char p_sep_by_space, char p_sign_posn,
                           char n_cs_precedes, char n_sep_by_space, char n_sign_posn)
{
    ASSERT(currency_symbol != 0);
    if (*currency_symbol == '\0')
    {
        ASSERT(frac_digits    == CHAR_MAX);
        ASSERT(p_cs_precedes  == CHAR_MAX);
        ASSERT(p_sep_by_space == CHAR_MAX);
        ASSERT(p_sign_posn    == CHAR_MAX);
        ASSERT(n_cs_precedes  == CHAR_MAX);
        ASSERT(n_sep_by_space == CHAR_MAX);
        ASSERT(n_sign_posn    == CHAR_MAX);
    }
    else
    {
        ASSERT_RANGE(frac_digits,    0, 9);     // 9 dp of currency is a lot!
        ASSERT_RANGE(p_cs_precedes,  0, 1);
        ASSERT_RANGE(p_sep_by_space, 0, 2);
        ASSERT_RANGE(p_sign_posn,    0, 4);
        ASSERT_RANGE(n_cs_precedes,  0, 1);
        ASSERT_RANGE(n_sep_by_space, 0, 2);
        ASSERT_RANGE(n_sign_posn,    0, 4);
    }
    return true;
}

static bool locale_is_consistent(const struct lconv *loc)
{
    if (!check_decpt_thous_group(false, loc->decimal_point, loc->thousands_sep, loc->grouping))
        return false;
    if (!check_decpt_thous_group((loc->frac_digits == 0 || loc->frac_digits == CHAR_MAX),
                    loc->mon_decimal_point, loc->mon_thousands_sep, loc->mon_grouping))
        return false;
    /* Signs must be valid (non-null) strings */
    ASSERT(loc->positive_sign != 0 && loc->negative_sign != 0);
    /* Signs must be different or both must be empty string (and probably n_sign_posn == 0) */
    ASSERT(strcmp(loc->positive_sign, loc->negative_sign) != 0 || *loc->negative_sign == '\0');
    if (!currency_valid(loc->currency_symbol, loc->frac_digits,
                        loc->p_cs_precedes, loc->p_sep_by_space, loc->p_sign_posn,
                        loc->n_cs_precedes, loc->n_sep_by_space, loc->n_sign_posn))
        return false;
    if (!currency_valid(loc->int_curr_symbol, loc->int_frac_digits,
                        loc->int_p_cs_precedes, loc->int_p_sep_by_space, loc->int_p_sign_posn,
                        loc->int_n_cs_precedes, loc->int_n_sep_by_space, loc->int_n_sign_posn))
        return false;
    /*
    ** If set, international currency symbol must be 3 (upper-case)
    ** alphabetic characters plus non-alphanum separator
    */
    if (*loc->int_curr_symbol != '\0')
    {
        ASSERT(strlen(loc->int_curr_symbol) == 4);
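        /* Cast to unsigned char: a negative char is undefined for <ctype.h> */
        ASSERT(isupper((unsigned char)loc->int_curr_symbol[0]));
        ASSERT(isupper((unsigned char)loc->int_curr_symbol[1]));
        ASSERT(isupper((unsigned char)loc->int_curr_symbol[2]));
        ASSERT(!isalnum((unsigned char)loc->int_curr_symbol[3]));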
    }
    return true;
}

The standard says that loc->int_curr_symbol[3] is used as the 'space' character when formatting international currency, and it makes little sense to allow an alphabetic character as well as the ISO 4217 international currency code, which is three upper case letters from the basic alphabet. Allowing a digit there could lead to confusion if the sign is separate, too, so I think the !isalnum(loc->int_curr_symbol[3]) assertion is sensible. A strict check would validate that the international currency symbol is one of those listed in ISO 4217; that is a bit tricky to code, though!
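A sketch of what the stricter check might look like, using bsearch() over a sorted table. The table here is a tiny illustrative subset of ISO 4217 codes, and the names `iso4217_sample`, `cmp_code` and `int_curr_symbol_is_known` are hypothetical; a real check would need the complete, current list, which is the tricky part:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only; must be kept sorted for bsearch(). */
static const char iso4217_sample[][4] = { "CHF", "EUR", "GBP", "JPY", "USD" };

/* Compare the first three characters of the key with a table entry. */
static int cmp_code(const void *key, const void *elem)
{
    return strncmp((const char *)key, (const char *)elem, 3);
}

/* True if int_curr_symbol is 3 known letters followed by one separator. */
static bool int_curr_symbol_is_known(const char *int_curr_symbol)
{
    if (int_curr_symbol == NULL || strlen(int_curr_symbol) != 4)
        return false;
    return bsearch(int_curr_symbol, iso4217_sample,
                   sizeof iso4217_sample / sizeof iso4217_sample[0],
                   sizeof iso4217_sample[0], cmp_code) != NULL;
}
```

This only checks the code itself; the separator in position 3 would still need the !isalnum() test from the validation code.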
