I'm designing a binary file format to store strings (without a terminating null, to save space) and binary data.
i. What is the best way to deal with little-/big-endian systems? Would converting everything to network byte order and back with ntohl()/htonl() work?
ii. Will the packed structures be the same size on x86, x64 and arm?
iii. Are there any inherent weaknesses with this approach?
struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};
struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};
Test code I'm using to develop the format:
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <strings.h>
#include <stdint.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};
struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};
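int main(void)
{
    /* The permission argument must be octal: 0644, not decimal 444. */
    int fd = open("test.dat", O_RDWR|O_APPEND|O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    struct Header header = {1, 0};
    write(fd, &header, sizeof(header));
    char msg[] = {"BINARY"};
    /* The string is stored without its terminating null. */
    struct Record record = {strlen(msg), 0, 0, 0, 0, 0, 0, 0};
    write(fd, &record, sizeof(record));
    write(fd, msg, record.length);
    close(fd);

    /* Reopen read-only and read everything back from the start. */
    fd = open("test.dat", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    read(fd, &header, sizeof(struct Header));
    read(fd, &record, sizeof(struct Record));
    uint64_t len = record.length;   /* keep the full 64-bit length */
    char c;
    while (len != 0) {
        if (read(fd, &c, 1) != 1)   /* stop on EOF or error */
            break;
        len--;
        printf("%c", c);
    }
    close(fd);
    return 0;
}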
i. Defining the file to be in one order and converting to and from "internal" order, if necessary, when reading/writing (perhaps with ntohl and the like) is, in my opinion, the best approach.
ii. I do not trust packed structures. They might work on those platforms, but there are no guarantees.
iii. Reading and writing binary files using fread and fwrite on whole structs is (again in my opinion) an inherently weak approach. You maximize the likelihood that you will be bitten by word size problems, padding and alignment problems, and byte order problems.
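If packed structs are used anyway, the "no guarantees" concern in point ii can at least be turned into a compile-time check. The following is only a sketch, assuming a C11 compiler that supports the GCC/Clang `__attribute__((packed))` extension used in the question. The expected sizes (2 and 20 bytes) and offsets are simply the sums of the field widths. It checks layout only and says nothing about byte order.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>

struct __attribute__((packed)) Header {
    uint8_t magic;
    uint8_t flags;
};

struct __attribute__((packed)) Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day;
    uint8_t month;
    uint8_t hour;
    uint8_t minute;
    uint8_t second;
    uint8_t type;
};

/* Compilation fails on any platform where the compiler lays the
   structs out differently from the intended on-disk format. */
_Static_assert(sizeof(struct Header) == 2, "Header must be 2 bytes");
_Static_assert(sizeof(struct Record) == 20, "Record must be 20 bytes");
_Static_assert(offsetof(struct Record, crc) == 8, "crc must be at offset 8");
_Static_assert(offsetof(struct Record, year) == 12, "year must be at offset 12");
_Static_assert(offsetof(struct Record, type) == 19, "type must be at offset 19");
```

Even when these assertions pass, the multi-byte fields are still written in the host's byte order, so the file is not portable between little- and big-endian machines without conversion.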
What I like to do is write little functions like get16() and put32() that read and write a byte at a time and so are inherently insensitive to word size and byte order difficulties. Then I write straightforward putHeader and getRecord functions (and the like) in terms of these.
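A minimal sketch of what such helpers might look like is shown here. It assumes the file is defined as big-endian (the same order htonl() produces) and uses stdio's getc()/putc(). The names put64, get64, putRecord and the unpacked struct Record are illustrative choices, not something fixed by the format, and EOF and error handling are deliberately left out.

```c
#include <stdint.h>
#include <stdio.h>

/* In-memory record: an ordinary struct, no packing needed, because
   it is never written to the file as a block. */
struct Record {
    uint64_t length;
    uint32_t crc;
    uint16_t year;
    uint8_t day, month, hour, minute, second, type;
};

/* Writers: emit one byte at a time, most significant byte first. */
void put16(uint16_t v, FILE *fp)
{
    putc((v >> 8) & 0xFF, fp);
    putc(v & 0xFF, fp);
}

void put32(uint32_t v, FILE *fp)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        putc((int)((v >> shift) & 0xFF), fp);
}

void put64(uint64_t v, FILE *fp)
{
    for (int shift = 56; shift >= 0; shift -= 8)
        putc((int)((v >> shift) & 0xFF), fp);
}

/* Readers: rebuild the value from bytes, most significant first.
   No EOF/error checks here; real code needs them. */
uint16_t get16(FILE *fp)
{
    unsigned hi = (unsigned)getc(fp) & 0xFF;
    unsigned lo = (unsigned)getc(fp) & 0xFF;
    return (uint16_t)((hi << 8) | lo);
}

uint32_t get32(FILE *fp)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | ((uint32_t)getc(fp) & 0xFF);
    return v;
}

uint64_t get64(FILE *fp)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | ((uint64_t)getc(fp) & 0xFF);
    return v;
}

/* Record I/O in terms of the helpers: always 20 bytes in the file,
   whatever the host's word size, padding rules or byte order. */
void putRecord(const struct Record *r, FILE *fp)
{
    put64(r->length, fp);
    put32(r->crc, fp);
    put16(r->year, fp);
    putc(r->day, fp);
    putc(r->month, fp);
    putc(r->hour, fp);
    putc(r->minute, fp);
    putc(r->second, fp);
    putc(r->type, fp);
}

void getRecord(struct Record *r, FILE *fp)
{
    r->length = get64(fp);
    r->crc = get32(fp);
    r->year = get16(fp);
    r->day = (uint8_t)getc(fp);
    r->month = (uint8_t)getc(fp);
    r->hour = (uint8_t)getc(fp);
    r->minute = (uint8_t)getc(fp);
    r->second = (uint8_t)getc(fp);
    r->type = (uint8_t)getc(fp);
}
```

Because only shifts and masks on values are used, the same source produces the same file bytes on little- and big-endian hosts, and the in-memory struct can have whatever padding the compiler likes.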
[P.S. As @Olaf correctly points out in one of the comments, in production code you'd need handling for EOF and error in these functions. I've left those out for simplicity of presentation.]