This is a big project, actually a virtual machine of my custom design.
Under certain circumstances, the program crashes with a segmentation fault every time I run it on its own, but within GDB, under those same circumstances, it runs perfectly and never crashes!
I am giving it the exact same parameters and input when running inside and outside GDB.
So basically, I can't find the bug with GDB because it never has any problem when I use GDB.
The binary has been compiled with the gcc -g option.
When I invoke
$ gdb ./main ./memdump
(where main is the compiled program binary)
and give the bt command, I get "no stack". I read this means that the stack has been completely destroyed?
What could be causing this and how can I actually find the bug?
Edit: last few lines of instruction log
This output normally prints to the screen; I redirected it to a file.
cmp at address 313
je at address 314
jmp at address 316
inc at address 306
div at address 307
mult at address 308
sub at address 309
cmp at address 310
ecall at ad
It crashes at a random place each time, and usually fails to finish the printf() call, as you can see here. What does this mean?
I'm sorry, I actually had the wrong core dump file.
Now I have the right one... Core backtrace shows:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000040414e in int_call_internal_f (arg=14) at ./opcode_func.c:1503
1503 if (memory[int_config[0] + memory[ip + 1]] != INTERRUPT_BLOCKING_VALUE)
(gdb)
This makes no sense because these are all globals and this line executes thousands of times after the values at those indices last change.
Generally, when debugging C programs, local variables (and other memory) are often initialized to some well-known pattern, depending on the toolchain. When running in release mode, your memory will have whatever bits were there when it was allocated.
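One way to take this difference out of the picture is to fill fresh allocations with a known pattern yourself, so that a read of memory you forgot to initialize gives the same recognizable value in every run. The following is only a sketch: poisoned_malloc and POISON_BYTE are hypothetical names, and the fill value is arbitrary.

```c
#include <stdlib.h>
#include <string.h>

/* Arbitrary but recognizable fill value (an assumption, pick any). */
#define POISON_BYTE 0xCD

/*
 * Hypothetical helper: allocate n bytes and fill them with POISON_BYTE,
 * so "uninitialized" reads are reproducible instead of depending on
 * whatever happened to be in that memory before.
 */
void *poisoned_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        memset(p, POISON_BYTE, n);
    return p;
}
```

If a value made of repeated 0xCD bytes then shows up in a log or in a core dump, that points at memory that was allocated but never written.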
The other gotcha is optimizations. If you have a concurrency bug, running in the debugger will change the timing, obscuring things. Optimizations can also change the layout of things subtly so that pointer errors (offsets in particular) that blow up in release mode harmlessly overwrite unused bytes in debug mode (or vice versa).
Segmentation faults are caused by the program accessing memory that's not within its legal address space. The proximal cause of the error often has little relation to the actual cause: the actual error might store an invalid pointer, which is then dereferenced from unrelated code.
One approach, as another person commented, is to add extensive logging. However, this most often shows you the proximal cause, not the actual cause: when the logging stops you have a fairly good idea of what the program was doing at that time.
A better solution is to use a memory checker, such as Valgrind. This tool instruments your code, and will catch and log illegal memory accesses before they turn into segmentation violations. It may also get you closer to finding the actual cause rather than the proximal cause.
As a side note: most of the illegal memory accesses that I've seen have their root in pointer-based access to an on-stack array or struct.
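Since the faulting line in the question indexes a global array with a computed value, another way to get closer to the actual cause is to route those accesses through a checked accessor that reports the bad index instead of reading out of bounds. This is a rough sketch, not the asker's code: MEM_SIZE, the int32_t element type, and the names mem_index_ok, mem_read and MEM_READ are all assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MEM_SIZE 4096             /* hypothetical size of the VM's memory */

static int32_t memory[MEM_SIZE];  /* stands in for the VM's global array */

/* Returns 1 if idx is a valid index into memory, 0 otherwise. */
static int mem_index_ok(long idx)
{
    return idx >= 0 && idx < MEM_SIZE;
}

/*
 * Checked read: on a bad index, print where the access came from and
 * abort, rather than touching memory outside the array.
 */
static int32_t mem_read(long idx, const char *file, int line)
{
    if (!mem_index_ok(idx)) {
        fprintf(stderr, "%s:%d: bad memory index %ld\n", file, line, idx);
        abort();
    }
    return memory[idx];
}

/* Convenience wrapper that records the call site automatically. */
#define MEM_READ(idx) mem_read((idx), __FILE__, __LINE__)
```

With something like this in place, the condition from the backtrace could be written as MEM_READ((long)int_config[0] + MEM_READ(ip + 1)), and the message would name the first index that went out of range.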
Maybe "no stack" could mean not enough stack. This typically happens when large arrays or structures are placed on the stack instead of being allocated on the heap (with malloc or new). You can check this in a GNU compiler environment with -fstack-usage, which writes a summary .su file (for example main.su) for each source file when compiling.
Pointers that are not set up correctly, or writes past array boundaries, are another possible cause. In a debug build such a pointer may happen to point somewhere harmless, or a write past a boundary may not crash the program, while in a release build it does. I'm not sure how this is done with gdb, but with Microsoft's toolchain you can detect this with CrtDebugHeap. Maybe the GNU toolchain has similar options or libraries.
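To illustrate the stack-versus-heap point: a large buffer declared as a local array lives on the stack, while the same buffer obtained from calloc lives on the heap and does not count against the stack limit. A small sketch, where BIG_COUNT and make_big_buffer are hypothetical names and the size is just an example:

```c
#include <stdlib.h>

/* Hypothetical element count: 4 Mi elements. */
#define BIG_COUNT (4u * 1024u * 1024u)

/*
 * Risky as a local variable inside a function:
 *     int big[BIG_COUNT];
 * With 4-byte ints that is 16 MiB of stack, more than the default
 * stack limit on many systems (often 8 MiB on Linux).
 */

/* Safer: take the same storage from the heap, zero-initialized.
 * Returns NULL on failure; the caller must free() the result. */
int *make_big_buffer(void)
{
    return calloc(BIG_COUNT, sizeof(int));
}
```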
Invoking gdb ./main ./memdump doesn't do what you think it does.
The second argument to GDB is interpreted as a core file, not the argument to the program.
You want: