C encoding of character constants

Posted 2019-04-07 08:39

My programmer's instinct says that a character constant in C (e.g. 'x') is encoded using the character set of the machine on which it is compiled. However, the following excerpt is from "The C Programming Language: ANSI C Edition":

"A character constant is a sequence of one or more characters enclosed in single quotes, as in 'x'. The value of a character constant with only one character is the numeric value of the character in the machine's character set at execution time."

Note the last three words: "at execution time".

Can anyone explain why they would say "at execution time"? Surely the character value is encoded in the compiled binary (ELF, a.out, ...)?

I have been wondering about this but couldn't come up with a logical explanation. Surely K&R knew what they were doing!

4 answers
SAY GOODBYE
Reply #2 · 2019-04-07 08:58

You will have to tell the compiler what system you are going to run the program on. It will then choose the proper encoding for the characters.

Of course, the default is to target a system similar to the one running the compiler. In that case the compile-time and run-time character sets are identical.

chillily
Reply #3 · 2019-04-07 08:58

C distinguishes the source character set from the execution character set, because your compiler could be a cross compiler, e.g. one running on a PC and generating code for a mobile platform. The character set on the build machine and the one on the target machine then need not agree. The simplest example is the end-of-line (EOL) encoding, which differs between the common platforms on the market today. The execution character set may also depend on "locales" and other settings chosen at run time by the user running the program.

成全新的幸福
Reply #4 · 2019-04-07 09:03

In C language terms, data is encoded for a particular locale, and locales declare character sets. Programs have an execution character set. Text (string and character constants) compiled into the program is represented in that execution character set. The program itself may convert text it reads in the character set of any locale into its own execution character set, and may format the text it generates according to the character set of any locale.

"The machine's character set at execution time" is badly worded; it implies things that don't exist or aren't true.

smile是对你的礼貌
Reply #5 · 2019-04-07 09:12

Your problem seems to be that you are confusing the machine's character set with the character encoding it uses.

Read http://www.microsoft.com/typography/unicode/cs.htm to understand what a character set actually means. The problem at the time of K&R (2nd Edition) was that there were just too many kinds of computers, some manufactured for local governments and the local public. As a result, character sets differed from one computer to the next, so the code for 'A' on a US machine could be a Cyrillic character (say, Foo) on a Russian machine.

Hence character constants couldn't be TRUSTED. Thanks to modern computer manufacturers, most machines now use the same character set, and exchanging information is simpler.
