fixed CHAR_BIT on various systems?

Posted 2020-04-11 18:03

I am confused about CHAR_BIT in limits.h. I have read articles saying the macro CHAR_BIT exists for portability, and using the macro instead of a magic number like 8 in code is reasonable. But limits.h comes from glibc-headers and its value is hard-coded as 8. If glibc-headers is installed on a system where a byte has more than 8 bits (say 16 bits), is that value wrong when compiling? Is a char then 8 bits or 16 bits?

And when I modify CHAR_BIT to 9 in limits.h, the following code still prints 8. How?

#include <stdio.h>
#include <limits.h>

int
main(int argc, char **argv)
{
    printf("%d\n", CHAR_BIT);
    return 0;
}

The following is supplementary: I've read all the replies so far, but it's still not clear to me. In practice I can follow the advice to #include <limits.h> and use CHAR_BIT, but that's a separate matter. Here I want to know why it appears this way. First, the value is hard-coded as 8 in glibc's /usr/include/limits.h, so what happens when glibc is installed on a system where a byte is not 8 bits? Then I found that 8 is not even the value my code actually uses, so does the 8 mean nothing there? Why put 8 there if the value is not used at all?

Thanks,

Tags: char bit glibc
3 Answers
干净又极端 · 2020-04-11 18:40

Diving into system header files can be a daunting and unpleasant experience. glibc header files can easily create a lot of confusion in your head, because they include other system header files under certain circumstances that override what has been defined so far.

In the case of limits.h, if you read the header file carefully, you will find that the definition for CHAR_BIT is only used when you compile code without gcc, since this line:

#define CHAR_BIT 8

is inside an #if condition a few lines above:

/* If we are not using GNU CC we have to define all the symbols ourself.
   Otherwise use gcc's definitions (see below).  */
#if !defined __GNUC__ || __GNUC__ < 2

Thus, if you compile your code with gcc, which is most likely the case, this definition of CHAR_BIT is not used. That's why you changed it and your code still printed the old value. Scrolling down a bit in the header file, you can find what happens when you are using GCC:

 /* Get the compiler's limits.h, which defines almost all the ISO constants.

    We put this #include_next outside the double inclusion check because
    it should be possible to include this file more than once and still get
    the definitions from gcc's header.  */
#if defined __GNUC__ && !defined _GCC_LIMITS_H_
/* `_GCC_LIMITS_H_' is what GCC's file defines.  */
# include_next <limits.h>

include_next is a GCC extension. You can read about what it does in this question: Why would one use #include_next in a project?

Short answer: it will search for the next header file with the name you specify (limits.h in this case), and it will include GCC's generated limits.h. In my system, it happens to be /usr/lib/gcc/i486-linux-gnu/4.7/include-fixed/limits.h.
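
As a rough sketch of the mechanism (the directory, file and macro names below are invented for illustration; this is not glibc's actual code), a wrapper header that chains to the "next" limits.h could look like this:

/* mywrap/limits.h -- invented wrapper header, for illustration only.
   If this directory is put first on the include path (e.g. gcc -Imywrap),
   #include <limits.h> finds this file first; #include_next then resumes
   the search after this directory and pulls in the next limits.h found,
   which is how glibc's header ends up chaining to gcc's own limits.h. */
#ifndef MYWRAP_LIMITS_H
#define MYWRAP_LIMITS_H

#include_next <limits.h>   /* GCC extension: include the next limits.h */

/* anything defined here sees the real definitions, e.g. CHAR_BIT */

#endif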

Consider the following program:

#include <stdio.h>
#include <limits.h>

int main(void) {
  printf("%d\n", CHAR_BIT);
  return 0;
}

With this program, you can find the path for your system with the help of gcc -E, which outputs a special line marker for each file it includes (see http://gcc.gnu.org/onlinedocs/cpp/Preprocessor-Output.html).

Because #include <limits.h> is on line 2 of this program, which I named test.c, running gcc -E test.c allows me to find the real file that is being included:

# 2 "test.c" 2
# 1 "/usr/lib/gcc/i486-linux-gnu/4.7/include-fixed/limits.h" 1 3 4

You can find this in that file:

/* Number of bits in a `char'.  */
#undef CHAR_BIT
#define CHAR_BIT __CHAR_BIT__

Note the undef directive: it is needed to override any possible previous definitions. It is saying: "Forget whatever CHAR_BIT was, this is the real thing". __CHAR_BIT__ is a gcc predefined constant. GCC's online documentation describes it in the following way:

__CHAR_BIT__ Defined to the number of bits used in the representation of the char data type. It exists to make the standard header given numerical limits work correctly. You should not use this macro directly; instead, include the appropriate headers.

You can read its value with a simple program:

#include <stdio.h>
#include <limits.h>

int main(void) {
  printf("%d\n", __CHAR_BIT__);
  return 0;
}

Compile and run it to see the value, or run gcc -E on it and look at the preprocessed output. Note that you shouldn't use __CHAR_BIT__ directly, as gcc's documentation says.
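
For instance, on a system whose bytes are 8 bits wide, the interesting part of the preprocessed output (the surrounding lines vary by gcc version) is simply the printf call with the macro already expanded:

printf("%d\n", 8);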

Obviously, if you change the CHAR_BIT definition inside /usr/lib/gcc/i486-linux-gnu/4.7/include-fixed/limits.h, or whatever the equivalent path is on your system, you will be able to see the change in your code. Consider this simple program:

#include <stdio.h>
#include <limits.h>

int main(void) {
  printf("%d\n", CHAR_BIT);
  return 0;
}

Changing CHAR_BIT definition in gcc's limits.h (that is, the file in /usr/lib/gcc/i486-linux-gnu/4.7/include-fixed/limits.h) from __CHAR_BIT__ to 9 will make this code print 9. Again, you can stop the compilation process after preprocessing takes place; you can test it with gcc -E.

What if you're compiling code with a compiler other than gcc?

Well, in that case glibc's own fallback definitions apply, and the default ANSI limits are assumed. From paragraph 5.2.4.2.1 of the C standard (Sizes of integer types <limits.h>):

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. [...] Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

  • number of bits for smallest object that is not a bit-field (byte)

    CHAR_BIT 8

POSIX mandates that a compliant platform have CHAR_BIT == 8.

Of course, glibc's assumptions can go wrong for machines that do not have CHAR_BIT == 8, but note that for this to matter you must be on an unusual architecture AND not be using gcc AND your platform must not be POSIX compliant. Not very likely.
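
If your own code genuinely relies on 8-bit bytes, a safer approach than editing headers is to state that assumption once at compile time. A minimal sketch, assuming a C11 compiler:

#include <limits.h>

/* Compilation fails on any target where bytes are not 8 bits wide
   (_Static_assert is standard since C11). */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

int main(void) { return 0; }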

Remember, however, that "implementation defined" means that the compiler writer chooses what happens. Thus, even if you're not compiling with gcc, there is a chance that your compiler has some sort of __CHAR_BIT__ equivalent defined. Even though glibc will not use it, you can do a little research and use your compiler's definition directly. This is generally bad practice - you will be writing code that is geared towards a specific compiler.
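
Purely as an illustration of what such compiler-specific code might look like (MY_CHAR_BIT is an invented name, and as noted above this is usually unnecessary because <limits.h> already provides CHAR_BIT):

#include <limits.h>

#ifdef __CHAR_BIT__              /* predefined by gcc and clang */
# define MY_CHAR_BIT __CHAR_BIT__
#else
# define MY_CHAR_BIT CHAR_BIT    /* portable fallback */
#endif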

Keep in mind that you should never be messing with system header files. Very weird things can happen when you compile code with wrong values for important constants like CHAR_BIT. Do this for educational purposes only, and always restore the original file afterwards.

对你真心纯属浪费 · 2020-04-11 18:42

CHAR_BIT should never be changed for a given system. The value of CHAR_BIT specifies size in bits of the smallest addressable unit of storage (a "byte") -- so even a system that uses 16-bit characters (UCS-2 or UTF-16) will most likely have CHAR_BIT == 8.

Almost all modern systems have CHAR_BIT == 8; C implementations for some DSPs might set it to 16 or 32.

The value of CHAR_BIT doesn't control the number of bits in a byte, it documents it, and allows user code to refer to it. For example, the number of bits in an object is sizeof object * CHAR_BIT.
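
For example, a small sketch (uint16_t is used here only to show a 16-bit type built on top of 8-bit bytes, where such a type is available):

#include <stdio.h>
#include <limits.h>
#include <stdint.h>

int main(void) {
    printf("CHAR_BIT           = %d\n", CHAR_BIT);
    printf("bits in a uint16_t = %zu\n", sizeof(uint16_t) * CHAR_BIT); /* 16 when CHAR_BIT == 8 */
    printf("bits in a double   = %zu\n", sizeof(double) * CHAR_BIT);   /* typically 64 */
    return 0;
}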

If you edit your system's <limits.h> file, that doesn't change the actual characteristics of the system; it just gives you an inconsistent system. It's like hacking your compiler so it defines the symbol _WIN32 rather than __linux__; that doesn't magically change your system from Linux to Windows, it just breaks it.

CHAR_BIT is a read-only constant for each system. It's defined by the developers of the system. You don't get to change it; don't even try.

As far as I know, glibc only works on systems with 8-bit bytes. It's theoretically possible to modify it so it works on other systems, but without a lot of development work you probably wouldn't even be able to install it on a system with 16-bit bytes.

As for why hacking the limits.h file didn't change the value you got for CHAR_BIT, system headers are complicated, and not intended to be edited in place. When I compile a small file that just has #include <limits.h> on my system, it directly or indirectly includes:

/usr/include/features.h
/usr/include/limits.h
/usr/include/linux/limits.h
/usr/include/x86_64-linux-gnu/bits/local_lim.h
/usr/include/x86_64-linux-gnu/bits/posix1_lim.h
/usr/include/x86_64-linux-gnu/bits/posix2_lim.h
/usr/include/x86_64-linux-gnu/bits/predefs.h
/usr/include/x86_64-linux-gnu/bits/wordsize.h
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
/usr/include/x86_64-linux-gnu/gnu/stubs.h
/usr/include/x86_64-linux-gnu/sys/cdefs.h
/usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed/limits.h
/usr/lib/gcc/x86_64-linux-gnu/4.7/include-fixed/syslimits.h

Two of these files have #define directives for CHAR_BIT, one setting it to 8 and another to __CHAR_BIT__. I don't know (and don't need to care) which of those definitions actually takes effect. All I need to know is that #include <limits.h> will give me a correct definition of CHAR_BIT -- as long as I don't do anything that corrupts the system.

够拽才男人 · 2020-04-11 18:52

The whole point is that when compiling for a system with a different byte size, CHAR_BIT is defined to the correct value for that system.
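
A small sketch of why that matters: code written in terms of CHAR_BIT adapts automatically to whatever value the target defines, for example when computing how many bytes are needed to hold a given number of bits:

#include <stdio.h>
#include <stddef.h>
#include <limits.h>

/* bytes needed to store 'bits' bits, whatever CHAR_BIT is on the target */
static size_t bytes_for_bits(size_t bits) {
    return (bits + CHAR_BIT - 1) / CHAR_BIT;
}

int main(void) {
    printf("%zu\n", bytes_for_bits(20)); /* prints 3 on an 8-bit-byte system */
    return 0;
}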
