I recently read an interview with Lua co-creators Luiz H. de Figueiredo and Roberto Ierusalimschy, where they discussed the design and implementation of Lua. It was very intriguing, to say the least. However, one part of the discussion raised a question for me. Roberto spoke of Lua as a "freestanding application" (that is, it's pure ANSI C that uses nothing from the OS). He said that the core of Lua is completely portable and, because of its purity, has been ported much more easily and to platforms never even considered (such as robots and embedded devices).
Now this makes me wonder. C in general is a very portable language. So, which parts of C (namely those in the standard library) are the least portable, and which can be expected to work on most platforms? Should only a limited set of data types be used (e.g. avoiding `short` and maybe `float`)? What about `FILE` and the `stdio` system? `malloc` and `free`? It seems that Lua avoids all of these. Is that taking things to the extreme? Or are they the root of portability issues? Beyond this, what other things can be done to make code extremely portable?
The reason I'm asking all of this is that I'm currently writing an application in pure C89, and ideally it should be as portable as possible. I'm willing to take a middle road in implementing it (portable enough, but not so much that I have to write everything from scratch). Anyway, I just wanted to see what, in general, is key to writing the best C code.
As a final note, all of this discussion is related to C89 only.
C was designed so that a compiler may be written to generate code for any platform and call the language it compiles, "C". Such freedom acts in opposition to C being a language for writing code that can be used on any platform.
Anyone writing code in C must decide (either deliberately or by default) what sizes of `int` they will support; while it is possible to write C code which will work with any legal size of `int`, it requires considerable effort and the resulting code will often be far less readable than code which is designed for a particular integer size. For example, if one has a variable `x` of type `uint32_t`, and one wishes to multiply it by another, `y`, computing the result mod 4294967296, the statement `x*=y;` will work on platforms where `int` is 32 bits or smaller, or where `int` is 65 bits or larger, but will invoke Undefined Behavior in cases where `int` is 33 to 64 bits and the product, if the operands were regarded as whole numbers rather than members of an algebraic ring that wraps mod 4294967296, would exceed `INT_MAX`. One could make the statement work independent of the size of `int` by rewriting it as `x*=1u*y;`, but doing so makes the code less clear, and accidentally omitting the `1u*` from one of the multiplications could be disastrous.
Under the present rules, C is reasonably portable if code is only used on machines whose integer size matches expectations. On machines where the size of `int` does not match expectations, code is not likely to be portable unless it includes enough type coercions to render most of the language's typing rules irrelevant.
This is a very broad question. I'm not going to give the definitive answer; instead I'll raise some issues.
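One such issue is the `x*=y` promotion hazard described earlier in this thread. A minimal sketch of how it could be confined to one place follows. The helper name `mul_mod32` is hypothetical, and `<stdint.h>` is a C99 header, so this assumes a compiler that provides `uint32_t` rather than strict C89.

```c
#include <stdint.h>  /* C99 header; assumed available, not part of C89 */

/* Multiply two 32-bit values modulo 2^32 without depending on the width
   of int. Starting the expression with 1u forces the arithmetic into an
   unsigned type at least as wide as unsigned int, so the multiplication
   wraps instead of overflowing a signed int. */
static uint32_t mul_mod32(uint32_t x, uint32_t y)
{
    return (uint32_t)(1u * x * y);
}
```

Wrapping the idiom in a function means the `1u*` cannot be forgotten at an individual call site, at the cost of writing `x = mul_mod32(x, y);` instead of `x *= y;`.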
Note that the C standard specifies certain things as "implementation-defined"; a conforming program will always compile and run on any conforming platform, but it may behave differently depending on the platform. Specifically:
- Word size: `sizeof(long)` may be four bytes on one platform, eight on another. The sizes of `short`, `int`, `long` etc. each have some minimum (often relative to each other), but otherwise there are no guarantees.
- Endianness: `int a = 0xff00; int b = ((char *)&a)[0];` may assign `0` to `b` on one platform, `-1` on another.
- Character encoding: `\0` is always the null byte, but how the other characters show up depends on the OS and other factors.
- Newlines: `putchar('\n')` may produce a line-feed character on one platform, a carriage return on the next, and a combination of each on yet another.
- Signedness of `char`: on some platforms a plain `char` is signed, so it is possible for `char` to take on negative values.
Various word sizes and endiannesses are common. Character encoding issues are likely to come up in any text-processing application. Machines with 9-bit bytes are most likely to be found in museums. This is by no means an exhaustive list.
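Several of these properties can be queried from within a program instead of being assumed. The following is only a sketch; the function names are hypothetical, and it uses nothing beyond `<limits.h>` and `<stddef.h>`, so it should compile as C89.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if plain char is a signed type on this implementation. */
static int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns 1 if the lowest-addressed byte of an unsigned int holds its
   least significant bits (little-endian layout), 0 otherwise. */
static int is_little_endian(void)
{
    unsigned int probe = 1u;
    return *(unsigned char *)&probe == 1;
}

/* Storage size of long in bits (CHAR_BIT is the number of bits per byte). */
static size_t long_storage_bits(void)
{
    return sizeof(long) * CHAR_BIT;
}
```

Code that depends on one particular answer can check it once at startup and fail loudly, which is usually easier to debug than silently wrong output on an unexpected platform.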
(And please don't write C89; that's an outdated standard. C99 added some pretty useful stuff for portability, such as the fixed-width integers `int32_t` etc.)
C89 allows two types of compilers: hosted and freestanding. The basic difference is that a hosted compiler provides all of the C89 library, while a freestanding compiler need only provide `<float.h>`, `<limits.h>`, `<stdarg.h>`, and `<stddef.h>`. If you limit yourself to these headers, your code will be portable to any C89 compiler.
"Freestanding" has a particular meaning in the context of C. Roughly, freestanding implementations are not required to provide any of the standard libraries, including the library functions `malloc`/`free`, `printf`, etc. Certain standard headers are still required, but they only define types and macros (for example `stddef.h`).
In the case of Lua, we don't have much to complain about in the C language itself, but we have found that the C standard library contains many functions that seem harmless and straightforward to use, until you consider that they do not check their input for validity (which is fine, if inconvenient). The C standard says that handling bad input is undefined behavior, allowing those functions to do whatever they want, even crash the host program. Consider, for instance, `strftime`. Some libcs simply ignore invalid format specifiers, but other libcs (e.g., on Windows) crash! Now, `strftime` is not a crucial function. Why crash instead of doing something sensible? So Lua has to do its own validation of input before calling `strftime`, and exporting `strftime` to Lua programs becomes a chore. Hence, we have tried to steer clear of these problems in the Lua core by aiming at freestanding for the core. But the Lua standard libraries cannot do that, because their goal is to export facilities to Lua programs, including what is available in the C standard library.
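To make the `strftime` point concrete, here is a minimal sketch of the kind of check a caller might run before passing a format string through. It is an illustration only, not Lua's actual code; the names `c89_strftime_convs` and `strftime_format_ok` are hypothetical, and the accepted set is assumed to be just the conversion characters that C89 defines.

```c
#include <string.h>

/* Conversion characters that C89 defines for strftime. */
static const char c89_strftime_convs[] = "aAbBcdHIjmMpSUwWxXyYZ%";

/* Returns 1 if every '%' in fmt is followed by a C89 conversion
   character, 0 otherwise (including a lone '%' at the end). */
static int strftime_format_ok(const char *fmt)
{
    for (; *fmt != '\0'; fmt++) {
        if (*fmt != '%')
            continue;
        fmt++;  /* step to the conversion character */
        /* Test for the terminator first: strchr would otherwise match
           the '\0' that ends the table itself. */
        if (*fmt == '\0' || strchr(c89_strftime_convs, *fmt) == NULL)
            return 0;
    }
    return 1;
}
```

A caller would invoke `strftime` only when `strftime_format_ok` returns 1 and report an error otherwise, so that a bad format never reaches the C library.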
Anything that is a part of the C89 standard should be portable to any compiler that conforms to that standard. If you stick to pure C89, you should be able to port it fairly easily. Any portability problems would then be due to compiler bugs or places where the code invokes implementation-specific behavior.