Unable to handle kernel NULL pointer dereference

Posted 2019-08-10 11:53

I was learning some basics of kernel modules and threads, so I tried to write an example module and test it. It loads successfully.

Module code:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/version.h>


static struct task_struct *thread_st;

// Function called by thread
static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
    return 0;
}



// Module initialisation
static int __init init_thread(void)
{
    printk(KERN_INFO "Creating Thread\n");

    thread_st = kthread_run(thread_fun, NULL, "mythread");
    if(thread_st)
        printk(KERN_INFO "Thread created successfully\n");
    else
        printk(KERN_INFO "Thread creation failed\n");
    return 0;

}




// Module exit
static void __exit cleanup_thread(void)
{
    printk(KERN_INFO "Cleaning up\n");
    if(thread_st)
    {
        kthread_stop(current);
        printk(KERN_INFO "Thread Stopped\n");
    }
}

module_init(init_thread);
module_exit(cleanup_thread);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pinkesh Badjatiya");
MODULE_DESCRIPTION("Simple Kernel Module");

Once the module is loaded, the procedure I follow to unload it is:

  1. Send a SIGKILL signal, sudo kill -9 [PID]
  2. Wait for the dmesg to show 'Thread Stopping', which simply means that the kthread_should_stop() has returned true.
  3. Remove the module, sudo rmmod [MODULE_NAME]

dmesg log:

[  492.979030] Creating Thread
[  492.979753] Thread created successfully
[  492.979776] Thread Running
[  497.985420] Thread Running
[  502.992223] Thread Running
[  507.999007] Thread Running
[  513.005837] Thread Running
[  518.012585] Thread Running
[  523.019354] Thread Running
[  528.026195] Thread Running
[  533.032919] Thread Running
[  538.039795] Thread Running
[  543.046588] Thread Running
[  548.053383] Thread Stopping
[  556.317200] Cleaning up
[  556.317212] Thread Stopped

Now, when I replace the variable current with thread_st, the struct pointer I originally created, then load the module and follow the same procedure as above to remove it, the kernel oopses and fills up the dmesg log.

I also get a Report Error popup on Ubuntu.

dmesg log:

[ 1269.832922] Creating Thread
[ 1269.833888] Thread created successfully
[ 1269.834217] Thread Running
[ 1274.839425] Thread Running
[ 1279.846211] Thread Running
[ 1284.853017] Thread Running
[ 1289.859819] Thread Running
[ 1294.866589] Thread Running
[ 1299.873353] Thread Stopping
[ 1305.758783] Cleaning up
[ 1305.758853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1305.762603] IP: [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.766266] PGD 0 
[ 1305.769967] Oops: 0000 [#3] SMP 
[ 1305.774675] Modules linked in: kernel_thread_example(OE-) vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) cmac rmd160 crypto_null camellia_generic camellia_x86_64 cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common deflate cts ctr gcm ccm serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_avx_x86_64 twofish_x86_64_3way xts twofish_x86_64 twofish_common xcbc sha256_ssse3 sha512_ssse3 des_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp af_key xfrm_algo bnep rfcomm bluetooth 6lowpan_iphc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ath9k ath9k_common ath9k_hw crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer ath ghash_clmulni_intel cryptd mac80211 joydev serio_raw snd cfg80211 i915 lpc_ich shpchp soundcore drm_kms_helper drm mei_me mei i2c_algo_bit mac_hid video wmi parport_pc ppdev lp parport hid_generic usbhid hid psmouse ahci libahci atl1c [last unloaded: kernel_thread_example]
[ 1305.817666] CPU: 3 PID: 4038 Comm: rmmod Tainted: G      D    OE 3.16.0-50-generic #66~14.04.1-Ubuntu
[ 1305.822078] Hardware name: HCL Infosystems Limited HCL ME LAPTOP/HCL Infosystems Limited, BIOS 203.T01 03/19/2011
[ 1305.826447] task: ffff8800a6221e90 ti: ffff880119700000 task.ti: ffff880119700000
[ 1305.830740] RIP: 0010:[<ffffffff81096d6b>]  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.834968] RSP: 0018:ffff880119703e90  EFLAGS: 00010246
[ 1305.839081] RAX: 0000000000000000 RBX: ffff8800b6e065e0 RCX: 0000000000000000
[ 1305.843133] RDX: ffffffff81c8ea00 RSI: ffff8800b6e065e0 RDI: 0000000000000000
[ 1305.847062] RBP: ffff880119703e98 R08: 0000000000000086 R09: 0000000000000431
[ 1305.850897] R10: 0000000000000000 R11: ffff880119703c0e R12: ffff8800b6e065e0
[ 1305.854697] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f0325bb6240
[ 1305.858456] FS:  00007f0325595740(0000) GS:ffff88011fa60000(0000) knlGS:0000000000000000
[ 1305.862225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1305.866197] CR2: 0000000000000000 CR3: 00000000b6e23000 CR4: 00000000000407e0
[ 1305.866199] Stack:
[ 1305.866206]  ffff8800b6e065e0 ffff880119703eb8 ffffffff8106abf2 0000000000000000
[ 1305.866211]  ffff8800b6e065e0 ffff880119703ee0 ffffffff81091868 0000000000000000
[ 1305.866216]  ffffffffc0a61000 0000000000000800 ffff880119703ef0 ffffffffc0a5f086
[ 1305.866217] Call Trace:
[ 1305.866232]  [<ffffffff8106abf2>] __put_task_struct+0x52/0x140
[ 1305.866241]  [<ffffffff81091868>] kthread_stop+0xd8/0xe0
[ 1305.866249]  [<ffffffffc0a5f086>] cleanup_thread+0x23/0xf9d [kernel_thread_example]
[ 1305.866259]  [<ffffffff810ebbb2>] SyS_delete_module+0x162/0x200
[ 1305.866268]  [<ffffffff8176edcd>] system_call_fastpath+0x1a/0x1f
[ 1305.866318] Code: ff ff 85 c0 0f 84 33 fe ff ff e9 0c fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 48 8b 87 c0 05 00 00 48 89 fb 48 8b bf b8 05 00 00 <8b> 00 48 c7 83 b8 05 00 00 00 00 00 00 f0 ff 0f 74 23 48 8b bb 
[ 1305.866324] RIP  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.866326]  RSP <ffff880119703e90>
[ 1305.866328] CR2: 0000000000000000
[ 1305.866378] ---[ end trace 0bd516c6629996c7 ]---

I am not able to figure out why this is happening.
I searched the internet but could not find any explanation.

Also, is the variable current already declared in one of the above headers, and what is the problem with using thread_st, which I created above?

2 Answers
女痞
#2 · 2019-08-10 12:06

From the description of kthread_stop function:

If threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away.

This means that you cannot simply exit from the kthread on your own if it is going to be terminated by kthread_stop() elsewhere. You should either exit only when kthread_should_stop() is found to be true, or grab a reference to the task_struct (in some way) before the thread exits.

Wait for the dmesg to show 'Thread Stopping', which simply means that the kthread_should_stop() has returned true.

In the case of signal_pending(current), this would be true without the allow_signal() call. kthread_should_stop() is true only when someone calls kthread_stop() for the given thread. For signals sent explicitly from user space (possible because of allow_signal()), signal_pending(current) does not reflect the kthread_should_stop() state.

So both of your implementations are incorrect, because they exit the thread when a signal is sent explicitly from user space.

Additionally, using thread_st in the kthread function introduces a race condition: the thread function may start before kthread_run() returns (and before its result is assigned to thread_st).

Update:

After printing "Thread Stopping", you may wait until kthread_stop() is called:

static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");

    // Wait until kthread will be actually stopped.
    while(!kthread_should_stop())
    {
        /* 
         * Flush any pending signal.
         *
         * Otherwise interruptible wait will not wait actually.
         */
        flush_signals(current);
        /* Stopping thread is some sort of interrupt. That's why we need interruptible wait. */        
        set_current_state(TASK_INTERRUPTIBLE);
        if(!kthread_should_stop()) schedule();
        set_current_state(TASK_RUNNING);
    }

    return 0;
}
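
The other option mentioned above, holding a reference to the task_struct so that it cannot go away, could look roughly like the sketch below. This is only an illustration under some assumptions: using kthread_create() plus wake_up_process() instead of kthread_run() is a choice made here (so the reference is taken before the thread can run and exit), and the header locations match older kernels such as the 3.16 one in the log. The thread function returns instead of calling do_exit().

```c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/sched.h>   /* newer kernels may also need <linux/sched/signal.h>
                            * and <linux/sched/task.h> for the calls below */
#include <linux/delay.h>
#include <linux/err.h>

static struct task_struct *thread_st;

static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while (!kthread_should_stop()) {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        /* The thread may still leave early on SIGKILL. */
        if (signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    return 0;
}

static int __init init_thread(void)
{
    struct task_struct *t;

    /* Create the thread without starting it. */
    t = kthread_create(thread_fun, NULL, "mythread");
    if (IS_ERR(t))
        return PTR_ERR(t);

    /* Pin the task_struct before the thread gets a chance to run. */
    get_task_struct(t);
    thread_st = t;
    wake_up_process(t);
    return 0;
}

static void __exit cleanup_thread(void)
{
    /* The extra reference keeps thread_st valid even if the thread
     * has already exited because of a signal. */
    kthread_stop(thread_st);
    put_task_struct(thread_st);
    printk(KERN_INFO "Thread Stopped\n");
}

module_init(init_thread);
module_exit(cleanup_thread);

MODULE_LICENSE("GPL");
```

With the reference held, the pointer passed to kthread_stop() still refers to a live task_struct regardless of whether the thread was killed first; the reference is dropped only after kthread_stop() returns.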
倾城 Initia
#3 · 2019-08-10 12:15
  1. current always points to the currently running task and is made available through kernel headers, so it needs to be used carefully. In the function below you are therefore trying to stop the task that called cleanup_thread(), i.e. the rmmod process, because cleanup_thread() is the module exit function:

    static void __exit cleanup_thread(void)
    {
        printk(KERN_INFO "Cleaning up\n");
        if(thread_st)
        {
            kthread_stop(current);
            printk(KERN_INFO "Thread Stopped\n");
        }
    }
    
  2. The probable cause of the issue is that you first kill the thread with kill -9. This causes the thread to die and its task_struct to be freed. But since thread_st is not reset to NULL, it becomes a dangling pointer, i.e. it points to an already freed location.

Then, if you call kthread_stop(thread_st) in cleanup_thread(), you are actually passing an invalid memory location, and hence the kernel crashes.

Try setting thread_st to NULL before you call do_exit().
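
To check point 1 for yourself, a tiny module like the sketch below can print which task current refers to in the init and exit functions. The module and function names are made up for this illustration. The exit message would be expected to name the rmmod process, consistent with the "Comm: rmmod" line in the oops above.

```c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/sched.h>

/* Hypothetical demo module: logs the task that runs each entry point. */
static int __init show_current_init(void)
{
    /* Runs in the context of the process that loads the module. */
    printk(KERN_INFO "init called by: %s (pid %d)\n",
           current->comm, current->pid);
    return 0;
}

static void __exit show_current_exit(void)
{
    /* Runs in the context of the process that unloads the module. */
    printk(KERN_INFO "exit called by: %s (pid %d)\n",
           current->comm, current->pid);
}

module_init(show_current_init);
module_exit(show_current_exit);

MODULE_LICENSE("GPL");
```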
