Life is full of interesting puzzles, and wrestling with them makes me cackle ...
Recently I got an interesting segmentation fault core dump from an x86-64 Linux instance running in a VM (VMware).
mov 0x18(%rdi),%rax // load a pointer into %rax; the tricky part happens here
It seems %rax did not get the expected value at all.
... // two more instructions, then:
mov %r8,0x10(%rax) // store a value at offset 0x10 from that pointer
Details below.
Segmentation fault
Dump of assembler code for function timer_delink:
// Function: boolean timer_delink(timer_t *timer), where timer is in a circular linked list (prev/next are never NULL)
   0x42e0f0 <+0>:  mov (%rdi),%rcx      // rdi <= timer; rcx <= timer->parent
   0x42e0f3 <+3>:  xor %eax,%eax        // eax <= update_parent <= 0; eax stores the return value
   0x42e0f5 <+5>:  test %rcx,%rcx       // if (!timer->parent) return(FALSE);
   0x42e0f8 <+8>:  je 0x42e138 <timer_delink+72>  // return eax (update_parent);
   0x42e0fa <+10>: mov 0x18(%rdi),%rax  // rax <= timer->prev; rax should contain timer->prev, which is not 0
   0x42e0fe <+14>: mov 0x10(%rdi),%r8   // r8 <= timer->next
   0x42e102 <+18>: mov 0x8(%rcx),%rdx   // rdx <= timer->parent->down
=> 0x42e106 <+22>: mov %r8,0x10(%rax)   // timer->prev->next = timer->next; info register shows rax = 0
   0x42e10a <+26>: mov 0x10(%rdi),%rsi  // rsi <= timer->next
   0x42e10e <+30>: mov %rax,0x18(%rsi)  // timer->next->prev = timer->prev;
   0x42e112 <+34>: xor %eax,%eax        // eax <= update_parent <= 0
The offending instruction (0x42e106) tries to store the contents of %r8 at offset 16 from the address held in %rax, which caused the segmentation fault.
info register shows rax = 0, so the segmentation fault is no surprise :). But ...
(gdb) info register
rax 0x0 0
..
rdi 0x20103ff0 ==> stores timer pointer
But per instruction 0x42e0fa, rax should contain timer->prev, which is not 0 according to the memory dump below:
(gdb) p *timer
$8 = {parent = 0x2f379e0 <root_timer>, down = 0x0, next = 0x201027c0, prev = 0x20103b28 ...}
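For reference, here is a minimal C sketch of what the annotated disassembly appears to do. It is a reconstruction under assumptions, not the real source: the type name timer_node and the helpers timer_delink_sketch and store_address are hypothetical, only the four fields printed by p *timer are modeled, and the update_parent logic (which is not visible in the dump) is left out. The field offsets 0x00/0x08/0x10/0x18 assume 8-byte pointers, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the first four fields are known from the
 * gdb output; the real timer_t has more members ("..."). */
typedef struct timer_node {
    struct timer_node *parent; /* offset 0x00: mov (%rdi),%rcx     */
    struct timer_node *down;   /* offset 0x08: mov 0x8(%rcx),%rdx  */
    struct timer_node *next;   /* offset 0x10: mov 0x10(%rdi),%r8  */
    struct timer_node *prev;   /* offset 0x18: mov 0x18(%rdi),%rax */
} timer_node;

/* Address written by "mov %r8,0x10(%rax)" for a given value of %rax.
 * With %rax == 0 this is address 0x10, which is normally unmapped. */
static uintptr_t store_address(const timer_node *prev)
{
    return (uintptr_t)prev + offsetof(timer_node, next);
}

/* Unlinks timer from its circular list. Returns 0 if the timer has no
 * parent, 1 otherwise. The return value is an assumption of this
 * sketch; the original returns update_parent, whose logic is not shown. */
static int timer_delink_sketch(timer_node *timer)
{
    if (timer->parent == NULL)
        return 0;

    timer_node *prev = timer->prev; /* 0x42e0fa */
    timer_node *next = timer->next; /* 0x42e0fe */

    prev->next = next;              /* 0x42e106: faults if prev is NULL */
    timer->next->prev = prev;       /* 0x42e10a and 0x42e10e */
    return 1;
}
```

In this sketch the only way the store at 0x42e106 can fault is if the value loaded into prev is invalid, which is exactly what makes rax = 0 surprising given the non-zero prev in the dump.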
So the puzzle is: how could the content of %rax differ from memory only three instructions after the mov instruction (0x42e0fa)?
Could it be a cache issue? Could it be a race condition?
For context, this function is called in a ukernel running on top of Linux, and the segmentation fault happens while the ukernel is rescheduling threads. Only one hardware CPU thread is available.