A customer reported an error in one of our programs caused by a division by zero.
All we have is this one line from the system log:
kernel: myprog[16122] trap divide error rip:79dd99 rsp:2b6d2ea40450 error:0
I do not believe there is a core file for it.
I have searched the Internet for a way to find the source line that caused this division by zero, but so far without success.
I understand that 16122 is the PID of the program, so that will not help.
I suspect that rsp:2b6d2ea40450 is related to the address of the line that caused the error (0x2b6d2ea40450), but is that true?
If it is, how can I translate it into an approximate location in the source? I can load a debug build of myprog into gdb and ask it to show the context around this address.
Any help will be greatly appreciated!
rip is the instruction pointer and rsp is the stack pointer. The stack pointer is not very useful unless you have a core image or a running process; it is the instruction pointer (rip:79dd99 in your log) that identifies the code that failed.
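You can use either addr2line or the disassemble command in gdb to find the line that caused the error from the ip. For the result to be meaningful, the binary you examine must be the very build that crashed and must contain debug information (compiled with -g); a debug build compiled separately can have different addresses.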
$ cat divtest.c
int main(void)
{
int a, b;

a = 1; b = a/0;
}
$ ./divtest
Floating point exception (core dumped)
$ dmesg|tail -1
[ 6827.463256] traps: divtest[3255] trap divide error ip:400504 sp:7fff54e81330 error:0 in divtest[400000+1000]
$ addr2line -e divtest 400504
./divtest.c:5
$ gdb divtest
(gdb) disass /m 0x400504
Dump of assembler code for function main:
2 {
0x00000000004004f0 <+0>: push %rbp
0x00000000004004f1 <+1>: mov %rsp,%rbp
3 int a, b;
4
5 a = 1; b = a/0;
0x00000000004004f4 <+4>: movl $0x1,-0x4(%rbp)
0x00000000004004fb <+11>: mov -0x4(%rbp),%eax
0x00000000004004fe <+14>: mov $0x0,%ecx
0x0000000000400503 <+19>: cltd
0x0000000000400504 <+20>: idiv %ecx
0x0000000000400506 <+22>: mov %eax,-0x8(%rbp)
6 }
0x0000000000400509 <+25>: pop %rbp
0x000000000040050a <+26>: retq
End of assembler dump.