One of the Linux kernel drivers I am developing uses network communication in the kernel (sock_create(), sock->ops->bind(), and so on).
The problem is that there will be multiple sockets to receive data from, so I need something that simulates select() or poll() in kernel space. Since those functions use file descriptors, I cannot use the system calls unless I also use system calls to create the sockets, which seems unnecessary since I am working in the kernel.
So I was thinking of wrapping the default sock->sk_data_ready handler in my own handler (custom_sk_data_ready()), which would unlock a semaphore. Then I can write my own kernel_select() function that tries to lock the semaphore and does a blocking wait until it is open; that way the kernel thread sleeps until the semaphore is unlocked by custom_sk_data_ready(). Once kernel_select() gets the lock, it unlocks it and calls custom_sk_data_ready() to relock it. So the only additional initialization is to run custom_sk_data_ready() before binding a socket, so that the first call to kernel_select() does not falsely trigger.
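A minimal sketch of that idea, assuming a recent kernel where sk_data_ready takes only a struct sock * (older kernels pass an extra int argument); the handler and kernel_select() names are from the description above:

    #include <linux/semaphore.h>
    #include <net/sock.h>

    static struct semaphore data_ready_sem;                /* starts locked */
    static void (*default_sk_data_ready)(struct sock *sk); /* saved original */

    /* Runs in softirq context, so it must not sleep; up() is safe here. */
    static void custom_sk_data_ready(struct sock *sk)
    {
        default_sk_data_ready(sk); /* keep the normal wakeup behaviour */
        up(&data_ready_sem);       /* wake anyone blocked in kernel_select() */
    }

    /* Block until some socket's data-ready callback has fired. */
    static int kernel_select(void)
    {
        return down_interruptible(&data_ready_sem);
    }

    /* Setup, before binding each socket:
     *     sema_init(&data_ready_sem, 0);   // initialized locked
     *     default_sk_data_ready = sock->sk->sk_data_ready;
     *     sock->sk->sk_data_ready = custom_sk_data_ready;
     */

Note that a counting semaphore makes the unlock-then-relock step implicit: down_interruptible() consumes exactly the count that up() released, and initializing the count to zero avoids the false first trigger.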
I see one possible problem: if multiple receives occur, multiple calls to custom_sk_data_ready() will try to unlock the semaphore. So, to avoid losing those calls and to track which sock is being used, there will have to be a table or list of pointers to the sockets in use, and custom_sk_data_ready() will have to flag in that table/list which socket it was passed.
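A sketch of such a table, assuming a module-level list guarded by a spinlock (a mutex will not do here, since the callback cannot sleep); the sock_entry name is illustrative:

    #include <linux/list.h>
    #include <linux/spinlock.h>
    #include <net/sock.h>

    struct sock_entry {
        struct list_head list;
        struct socket   *sock;
        bool             ready;   /* set by custom_sk_data_ready() */
    };

    static LIST_HEAD(sock_list);
    static DEFINE_SPINLOCK(sock_list_lock);

    /* Called from the data-ready callback: atomic context, so spin_lock(). */
    static void mark_socket_ready(struct sock *sk)
    {
        struct sock_entry *e;

        spin_lock(&sock_list_lock);
        list_for_each_entry(e, &sock_list, list) {
            if (e->sock->sk == sk) {
                e->ready = true;
                break;
            }
        }
        spin_unlock(&sock_list_lock);
    }

Process-context code walking the same list would take the lock with spin_lock_bh() so the softirq callback cannot deadlock against it.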
Is this method sound? Or should I just struggle with the user/kernel space issue when using the standard system calls?
Initial Finding:
All callback functions in the sock structure are called in interrupt context, which means they cannot sleep. To allow the main kernel thread to sleep on a list of ready sockets, mutexes are used, but custom_sk_data_ready() must act like a spinlock on the mutexes (calling mutex_trylock() repeatedly). This also means that any dynamic allocation must use the GFP_ATOMIC flag.
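For example, any per-event record allocated from inside the callback has to be requested atomically (the msg_event structure and helper below are hypothetical):

    #include <linux/slab.h>
    #include <net/sock.h>

    struct msg_event {            /* hypothetical per-message record */
        struct sock *sk;
    };

    /* Safe to call from custom_sk_data_ready(): GFP_ATOMIC never sleeps,
     * but it can fail, so callers must tolerate a NULL return. */
    static struct msg_event *alloc_event_atomic(struct sock *sk)
    {
        struct msg_event *ev = kzalloc(sizeof(*ev), GFP_ATOMIC);

        if (ev)
            ev->sk = sk;
        return ev;
    }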
Additional possibility:
For every open socket, replace the socket's sk_data_ready() handler with a custom one (custom_sk_data_ready()) and create a worker (struct work_struct) and work queue (struct workqueue_struct). A common process_msg() function will be used for each worker. Create a kernel-module-level global list where each element holds a pointer to the socket and contains the worker structure. When data is ready on a socket, custom_sk_data_ready() will execute, find the matching list element for that socket, and then call queue_work() with the list element's work queue and worker. The process_msg() function will then be called, and it can find the matching list element either through the contents of its struct work_struct * parameter (an address) or by using the container_of() macro to get the address of the list structure that holds the worker structure.
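A sketch of that arrangement, using a single shared work queue for brevity (the sock_worker and msg_wq names are illustrative, and list locking is omitted):

    #include <linux/workqueue.h>
    #include <linux/list.h>
    #include <net/sock.h>

    struct sock_worker {
        struct list_head   list;   /* links into worker_list */
        struct socket     *sock;
        struct work_struct work;   /* set up with INIT_WORK(&w->work, process_msg) */
    };

    static LIST_HEAD(worker_list);
    static struct workqueue_struct *msg_wq;  /* e.g. create_workqueue("msg_wq") */

    /* Runs in process context, so it may sleep (e.g. in kernel_recvmsg()). */
    static void process_msg(struct work_struct *work)
    {
        struct sock_worker *w = container_of(work, struct sock_worker, work);

        /* ... read from w->sock, e.g. with kernel_recvmsg() ... */
    }

    /* Data-ready callback: atomic context, so just hand off to the queue. */
    static void custom_sk_data_ready(struct sock *sk)
    {
        struct sock_worker *w;

        list_for_each_entry(w, &worker_list, list) {
            if (w->sock->sk == sk) {
                queue_work(msg_wq, &w->work);
                break;
            }
        }
    }

queue_work() is safe to call in atomic context, and it is a no-op if the item is already queued, which naturally coalesces bursts of data-ready events.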
Which technique is the most sound?
Your second idea sounds more like it will work. The CEPH code looks like it does something similar; see net/ceph/messenger.c.