As far as I know, the C library provides no help in serializing numeric values into a non-text byte stream. Correct me if I'm wrong.
The most standard tools in use are htonl et al. from POSIX. These functions have shortcomings:
- There is no 64-bit support.
- There is no floating-point support.
- There are no versions for signed types. When deserializing, the unsigned-to-signed conversion of out-of-range values is implementation-defined, so the result is not portable.
- Their names do not state the size of the datatype.
- They depend on 8-bit bytes and the presence of the exact-width uintN_t types.
- The input types are the same as the output types, instead of referring to a byte stream.
- This requires the user to perform a pointer typecast, which is potentially misaligned and therefore unsafe.
- Having performed that typecast, the user is likely to attempt to convert and output a structure in its native memory layout, a poor practice which results in unexpected errors.
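The alignment hazard in the last two bullets is worth making concrete. A minimal sketch of the byte-wise alternative (the helper name `read_be32` is my own, not part of any standard library):

```c
#include <stdint.h>

/* Hypothetical helper (not in any standard library): read a 32-bit
   big-endian value one byte at a time, so the source pointer needs
   no particular alignment -- unlike the pointer-cast idiom
   ntohl(*(const uint32_t *)p), which is undefined when p is
   misaligned. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
           (uint32_t)p[3];
}
```

With this shape, `read_be32(buf + 3)` is fine even though `buf + 3` would be a misaligned `uint32_t *`.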
An interface for serializing from arbitrary-width char to standard 8-bit bytes would fall in between the C standard, which doesn't really acknowledge 8-bit bytes, and whichever standards (ITU?) set the octet as the fundamental unit of transmission. But the older standards aren't getting revised.
Now that C11 has many optional components, a binary serialization extension could be added alongside things like threads without placing demands on existing implementations.
Would such an extension be useful, or is worrying about non-two's-complement machines just that pointless?
See the xdr library and the XDR standards, RFC 1014 and RFC 4506.
In my opinion the main drawback of functions like htonl() is that they do only half the work of serialization: they merely flip the bytes of a multi-byte integer if your machine is little-endian. The other important part of serializing is handling alignment, and these functions don't do that.

Many CPUs cannot (efficiently) access a multi-byte integer that is not stored at a memory address that is a multiple of the integer's size in bytes. This is the reason to never, ever use struct overlays to (de)serialize network packets. I'm not sure if this is what you mean by 'in-place conversion'.
I work a lot with embedded systems, and I have functions in my own library which I always use when generating or parsing network packets (or any other I/O: disk, RS-232, etc.):
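The answer's actual code is not reproduced here; the following is a sketch of what such byte-by-byte helpers typically look like (the names and signatures are my assumption, not the author's library). Each function advances and returns the buffer cursor, and writes big-endian (network order) one byte at a time:

```c
#include <stdint.h>

/* Illustrative byte-by-byte (de)serialization helpers. Because every
   access goes through uint8_t, the buffer needs no particular
   alignment, and the host's endianness is irrelevant. */

uint8_t *serialize_u16(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);
    buf[1] = (uint8_t)v;
    return buf + 2;                       /* return advanced cursor */
}

uint8_t *serialize_u32(uint8_t *buf, uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
    return buf + 4;
}

const uint8_t *deserialize_u16(const uint8_t *buf, uint16_t *v)
{
    *v = (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
    return buf + 2;
}

const uint8_t *deserialize_u32(const uint8_t *buf, uint32_t *v)
{
    *v = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16)
       | ((uint32_t)buf[2] << 8)  | buf[3];
    return buf + 4;
}
```

Returning the advanced cursor lets calls be chained when packing several fields into one packet.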
Along with these functions there are a bunch of macros defined, such as:
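The original macros are not shown; as one plausible example (all names invented for this sketch), companion macros often define packed field sizes so a packet layout can be written as a sum of sizes instead of magic numbers:

```c
/* Illustrative companion macros (names invented for this sketch):
   sizes of packed fields on the wire, independent of the host's
   struct padding and alignment. */
#define PACKED_SIZE_U8   1u
#define PACKED_SIZE_U16  2u
#define PACKED_SIZE_U32  4u
#define PACKED_SIZE_U64  8u

/* e.g. a header with an 8-bit version, a 16-bit type and a 32-bit
   length field occupies exactly 7 bytes on the wire: */
#define HEADER_SIZE  (PACKED_SIZE_U8 + PACKED_SIZE_U16 + PACKED_SIZE_U32)
```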
The (de)serialize functions read or write the values byte by byte, so alignment problems cannot occur. You don't need to worry about signedness either. In the first place, practically all systems these days use two's complement (besides a few ADCs maybe, but then you wouldn't use these functions). However, it should work even on a system using one's complement, because (as far as I know) a signed integer is converted to its two's-complement representation when cast to unsigned (and the functions accept/return unsigned integers).
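The signed round trip described above can be sketched as follows (function names invented for illustration). The cast to unsigned is fully defined by the standard (reduction modulo 2^16); the cast back is implementation-defined, but on the two's-complement machines the answer assumes, it restores the original value:

```c
#include <stdint.h>

/* Write a signed 16-bit value big-endian by going through unsigned. */
void put_s16_be(uint8_t *buf, int16_t v)
{
    uint16_t u = (uint16_t)v;        /* well defined, even for negatives */
    buf[0] = (uint8_t)(u >> 8);
    buf[1] = (uint8_t)u;
}

/* Read it back: the final cast to int16_t is implementation-defined,
   but value-preserving on two's-complement systems. */
int16_t get_s16_be(const uint8_t *buf)
{
    uint16_t u = (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
    return (int16_t)u;
}
```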
Another of your arguments is that they depend on 8-bit bytes and the presence of the exact-width uintN_t types. This also holds for my functions, but in my opinion it is not a problem (those types are always defined for the systems and compilers I work with). You could tweak the function prototypes to use unsigned char instead of uint8_t, and something like long long or uint_least64_t instead of uint64_t, if you like.

I've never used them, but I think Google's Protocol Buffers satisfy your requirements.
This tutorial seems like a pretty good introduction, and you can read about the actual binary storage format here.
From their web page:
There's no official implementation in pure C (only C++), but there are two C ports that might fit your needs:
Nanopb, at http://koti.kapsi.fi/jpa/nanopb/
Protobuf-c at http://code.google.com/p/protobuf-c/
I don't know how they fare in the presence of non-8-bit bytes, but it should be relatively easy to find out.