My database engine writes records of 64 bytes by issuing write() syscall of the entire disk block. The device is opened with O_DIRECT mode. For example third record within a block starts at byte 128 and ends at position 192, when I do an UPDATE the entire disk block (which is by default 512 bytes) is written.
My question is, can I calim ACID compliance if I am writing the record over itself every time UPDATE occurs? Usually database engines do this in 2 steps by writing modified disk block to another (free) place and then updating an index to new block with one (atomic) write immediately after first write returned success. But I am not doing this, I am overwriting current data with new one expecting the write to be successful. Does my method has any potential problems? Is it ACID compliant? What if the hardware writes only half of the block and my record is exactly in the middle? Or does the hardware already does the 2 step write process I described , but at block level, so I don't need to repeat the same in software?
(note: no record is larger than physical disk block (512 bytes by default) and fsync goes after each write(), this is for Linux only)
ACID anticipates failures, and suggests ways to deal with them. Two-phase commits and three-phase commits are two fairly common and well-understood approaches.
Although I'm a database guy, the dbms frees me from having to think about this kind of thing very much. But I'd say overwriting a record without taking any other precautions is liable to fail the "C" and "D" properties ("consistent" and "durable").
To build really good code, imagine that your dbms server has no battery-backed cache, only one power supply, and that during a transaction there's a catastrophic failure in that one power supply. If your dbms can cope with that kind of failure fairly cleanly, I think you can call it ACID compliant.
Later . . .
I read Tweedie's transcript. He's not talking about database direct disk access; he's talking about a journaling filesystem. A journaling filesystem also does a two-phase commit.
It sounds like you're trying to reach ACID compliance (in the database sense) with a single-phase commit. I don't think you can get away with that.
Opening with O_DIRECT means "Try to minimize cache effects of the I/O to and from this file" (emphasis added). I think you'll also need O_SYNC. (But the linked kernel docs caution that most Linux filesystems don't implement POSIX semantics of O_SYNC. And both filesystems and disks have been known to lie about whether a write has hit a platter.)
There are two more cautions in the kernel docs. First, "It is recommended that applications treat use of O_DIRECT as a performance option which is disabled by default." You're not doing that. You're trying to use it to achieve ACID compliance.
Second,
SQLite has a readable paper on how they handle atomic commits. Atomic Commit in SQLite
No.
You cannot assume the disk write will be successful. And you cannot assume that the disk will leave the existing data in place. Here is some QNX documentation also stating this.
If you got really, really unlucky the disk power will fail while it is writing, leaving the block with corrupt checksums and half-written data.
This is why ACID systems use at least two copies of the data.