I have written a C++ library that saves my data (a collection of custom structs etc) into a binary file. I currently use (i.e. create and consume) the files locally, on my Windows (XP) machine. For simplicity, lets think of the library in two parts: a writer (Creates the files) and a reader or consumer (simply reads data from the files).
Recently though, I would like to also consume (i.e. read) the data files I have created on my XP machine, on my Linux machine. I must point out at this stage that both machines are PCs (so have the same endianess etc).
I can build a reader (and compile for Linux [Ubuntu 9.10 to be precise]), since I am the library creator. My question, before I embark down this road (of building the reader etc) is:
Assuming I have succesfully built the reader for Linux,
Can I simply copy accross, files that were created on the windows (XP) machine to the Linux (Ubuntu 9.10) machine and use the Linux reader to successfully read the copied over file?
For the files to be binary compatible:
- endianness must match (as it does for you)
- bitfield packing order must be the same
- sizes and signedness of types must be the same
- the compiler must make the same decisions about padding and alignment
It's certainly possible for all of these conditions to be fulfilled, or for you to not happen to be hitting any cases for which they are not. At the very least, though, I'd add some sanity checks and/or sentinel members to detect problems.
Binary files should be compatible across machines with the same endianess.
The issue you may have in your code is the size of ints, you can't necessarily assume that the compiler on different OS's has the same size int. So either copy blocks of bytes and cast them, or use int16, int32 etc.
Structs are not a file format, and you shouldn't try to use them as such.
When attempting to make structs work with fread
and fwrite
, there's a huge number of hacks to make it work. You byte-swap integers so that you can share files between little-endian and big-endian machines. You change your structs to use fixed-width integer types, so you can share between machines with different word sizes (such as between x86 and x64 machines). You add compiler-specific pragmas to control the padding of structs to share between compiler versions.
It works, but it's ugly. Not to mention, easy to get wrong.
Much like the recommendation in The byte order fallacy, a much better idea is to write code to read/write the fields individually. By writing your own code, you can ensure there's no padding, and you can choose integer sizes independently of the local size of integers, and you can support both endiannesses without byte-swapping (by reading/writing the bytes of an integer separately).
Unlike the hacky approach, this is hard to get wrong. Further, because you don't rely on any compiler or architecture specific behaviors, either your code will work on all compilers and architectures, or none. If you do it right, you shouldn't have any platform-specific bugs.
There is one downside; individually reading/writing the fields will be slower than just using fread/fwrite directly. You can set up a buffer (uint8_t buffer[]
) and write the entirety of the data into it, and then write everything out at once, which might help, but it'll still be slower (because you'd still have to move the fields into the buffer one at a time), but for most purposes it'll still be fast enough (exceptions being embedded / real-time systems or extremely high performance computing).
If:
- the machines have the same endianess (as you stated they have) and
- you do open the streams in binary mode, as text mode might do funny things e.g. with line-ends and
- you have programmed cleanly so you don't stumble over implementation-defined stuff like alignments, data type sizes, and struct packing,
then yes, your files should be portable.
The third bullet point is what makes a file format a "portable" one. Depending on what kind of data you have in your structs, it can be very easy or a bit tricky. Bitfields, or data being reinterpreted from a different type are especially tricky.
You might consider taking a look at the Boost Serialization Library.
A lot of thought has been put into it, and it will handle many of the potential cross-platform incompatibilities for you.
Of course, it's possible that it's overkill for your particular use case, especially if you've already got your writers & readers implemented.