What do these Linux Kernel Oops fields mean?

2019-07-17 06:15发布

问题:

I have already encountered some Oops in my developer's life and whereas I am familiar with some information that I can retrieve from these Oops, there are still pieces of information I can't understand and therefore, can't use to solve problems.

Below you will find an Oops example and I will describe what I can deduce from it. Then, I will ask what the remaining info can teach me about the problem.

[  716.485951] BUG: unable to handle kernel paging request at fc132158
[  716.485973] IP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[  716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000 
[  716.485997] Oops: 0002 [#1] SMP 
[  716.486004] Modules linked in: ubi(O) mtdchar nandsim nand mtd nand_ids nand_bch bch nand_ecc bnep rfcomm bluetooth parport_pc ppdev lp parport nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc binfmt_misc dm_crypt snd_hda_codec_hdmi snd_hda_codec_analog kvm_intel snd_hda_intel snd_hda_codec snd_hwdep kvm snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event hid_generic snd_seq cdc_acm snd_timer snd_seq_device mei tpm_tis snd mac_hid serio_raw soundcore lpc_ich snd_page_alloc microcode coretemp usbhid hid nouveau usb_storage ttm drm_kms_helper drm floppy e1000e i2c_algo_bit mxm_wmi video wmi
[  716.486128] Pid: 3994, comm: ubimkvol Tainted: G           O 3.8.0-rc3+ #3 LENOVO 6239AS8/LENOVO
[  716.486136] EIP: 0060:[<fc1936e7>] EFLAGS: 00010246 CPU: 0
[  716.486144] EIP is at ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[  716.486151] EAX: 000000ac EBX: eb5ea000 ECX: 0000002b EDX: 00000000
[  716.486157] ESI: eb4d1d74 EDI: fc132158 EBP: eb4d1d40 ESP: eb4d1d20
[  716.486164]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  716.486170] CR0: 8005003b CR2: fc132158 CR3: 27542000 CR4: 000407f0
[  716.486176] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  716.486183] DR6: ffff0ff0 DR7: 00000400
[  716.486188] Process ubimkvol (pid: 3994, ti=eb4d0000 task=ec01d9b0 task.ti=eb4d0000)
[  716.486195] Stack:
[  716.486199]  e755f000 eb4d1d2c c11cad11 eb4d1d34 eb543c00 eb5ea000 00000000 eb4d1e20
[  716.486215]  eb4d1e30 fc195412 e755f000 fc1adf01 eb5ea26c 00000002 0000009e eb5ea480
[  716.486232]  00000002 e755f22c e755f2ac e755f000 eb4d1d74 2a000000 01000000 00000000
[  716.486248] Call Trace:
[  716.486257]  [<c11cad11>] ? sysfs_create_file+0x21/0x30
[  716.486266]  [<fc195412>] ubi_create_volume+0x4b2/0x790 [ubi]
[  716.486277]  [<fc19967a>] ubi_cdev_ioctl+0x5da/0xac0 [ubi]
[  716.486285]  [<c117202a>] ? link_path_walk+0x5a/0x7d0
[  716.486294]  [<fc1990a0>] ? vol_cdev_ioctl+0x440/0x440 [ubi]
[  716.486842]  [<c1177e12>] do_vfs_ioctl+0x82/0x5b0
[  716.487703]  [<c1171ced>] ? final_putname+0x1d/0x40
[  716.488564]  [<c1171ced>] ? final_putname+0x1d/0x40
[  716.489422]  [<c1171ced>] ? final_putname+0x1d/0x40
[  716.489891]  [<c1171eb4>] ? putname+0x24/0x40
[  716.489891]  [<c1167239>] ? do_sys_open+0x169/0x1d0
[  716.489891]  [<c11783b0>] sys_ioctl+0x70/0x80
[  716.489891]  [<c16205cd>] sysenter_do_call+0x12/0x38
[  716.489891] Code: ac 00 00 00 03 bb c8 04 00 00 f7 c7 01 00 00 00 0f 85 ee 00 00 00 f7 c7 02 00 00 00 0f 85 ca 00 00 00 89 c1 31 d2 c1 e9 02 a8 02 <f3> a5 74 0b 0f b7 16 66 89 17 ba 02 00 00 00 a8 01 74 07 0f b6
[  716.489891] EIP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi] SS:ESP 0068:eb4d1d20
[  716.489891] CR2: 00000000fc132158
[  716.516453] ---[ end trace 473b15a7780e19ea ]---

It seems that the kernel wanted to access a wrong page. Now,

  • The Oops code 0002 tells me that it occurred while trying to read something in user-mode.
  • The Instruction Pointer is at ubi_change_vtbl_record, which means the offending instruction is located in this function.
  • I can deduce the path that lead to the faulting function from the call trace (an ioctl launched from process ubimkvol)

From there, Is the "stack" a dump of the raw stack of the task ? I can see that some values mentioned are also function addresses found in the call trace. Then, I got fancy looking values like EAX, EBX ... DR7. I think they are CPU registers but still, I don't know what they really are.

Finally, the following line gets me lost :

[  716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000

What are pdpt, pde and pte ? I feel they are information about the page fault but I could not retrieve further information after some googling around.

回答1:

Yes, EAX, etc. are 32-bit x86 processor registers. pdpt (page directory pointer table), pde (page directory entry), and pte (page table entry) are all paging structures.

IP (also EIP for 32-bit or RIP for 64-bit processors) is the instruction pointer at the time of the Oops.

The stack is the raw stack for this processor. Each processor will have its own stack. Note that on this architecture the stack grows down (addresses start with 0xfxxxxxx).



回答2:

Correct me if I am wrong but, OOPS 0002 means no page found when writing in kernel mode:

bit 0 == 0 means no page found, 1 means a protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode