recv() is not interrupted by a signal in multithre

Posted 2020-02-08 06:59

Question:

I have a thread that sits in a blocking recv() loop, and I want to terminate it (assume this can't be changed to select() or any other asynchronous approach).

I also have a signal handler that catches SIGINT. In theory, it should make recv() return with an error and errno set to EINTR.

But it doesn't, which I assume has something to do with the fact that the application is multi-threaded. There is also another thread, which is meanwhile waiting on a pthread_join() call.

What's happening here?

EDIT:

OK, I now deliver the signal explicitly to every thread blocked in recv() by calling pthread_kill() from the main thread (this runs the same global SIGINT handler in each of them, but multiple invocations are benign). The recv() call is still not unblocked.

EDIT:

I've written a code sample that reproduces the problem.

  1. Main thread connects a socket to a misbehaving remote host that won't let the connection go.
  2. All signals blocked.
  3. Read thread is started.
  4. Main unblocks and installs handler for SIGINT.
  5. Read thread unblocks and installs handler for SIGUSR1.
  6. Main thread's signal handler sends a SIGUSR1 to the read thread.

Interestingly, if I replace recv() with sleep() it is interrupted just fine.

PS

Alternatively you can just open a UDP socket instead of using a server.

client

#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <errno.h>

static void
err(const char *msg)
{
    perror(msg);
    abort();
}

static void
blockall()
{
    sigset_t ss;
    sigfillset(&ss);
    if (pthread_sigmask(SIG_BLOCK, &ss, NULL))
        err("pthread_sigmask");
}

static void
unblock(int signum)
{
    sigset_t ss;
    sigemptyset(&ss);
    sigaddset(&ss, signum);
    if (pthread_sigmask(SIG_UNBLOCK, &ss, NULL))
        err("pthread_sigmask");
}

void
sigusr1(int signum)
{
    (void)signum;
    printf("%lu: SIGUSR1\n", pthread_self());
}

void*
read_thread(void *arg)
{
    int sock, r;
    char buf[100];

    unblock(SIGUSR1);
    signal(SIGUSR1, &sigusr1);
    sock = *(int*)arg;
    printf("Thread (self=%lu, sock=%d)\n", pthread_self(), sock);
    r = 1;
    while (r > 0)
    {
        r = recv(sock, buf, sizeof buf, 0);
        printf("recv=%d\n", r);
    }
    if (r < 0)
        perror("recv");
    return NULL;
}

int sock;
pthread_t t;

void
sigint(int signum)
{
    int r;
    (void)signum;
    printf("%lu: SIGINT\n", pthread_self());
    printf("Killing %lu\n", t);
    r = pthread_kill(t, SIGUSR1);
    if (r)
    {
        printf("%s\n", strerror(r));
        abort();
    }
}

int
main()
{
    pthread_attr_t attr;
    struct sockaddr_in addr;

    printf("main thread: %lu\n", pthread_self());
    memset(&addr, 0, sizeof addr);
    sock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (sock < 0)
        err("socket");
    addr.sin_family = AF_INET;
    addr.sin_port = htons(8888);
    if (inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr) <= 0)
        err("inet_pton");
    if (connect(sock, (struct sockaddr *)&addr, sizeof addr))
        err("connect");

    blockall();
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    if (pthread_create(&t, &attr, &read_thread, &sock))
        err("pthread_create");
    pthread_attr_destroy(&attr);
    unblock(SIGINT);
    signal(SIGINT, &sigint);

    if (sleep(1000))
        perror("sleep");
    if (pthread_join(t, NULL))
        err("pthread_join");
    if (close(sock))
        err("close");

    return 0;
}

server

import socket
import time

s = socket.socket(socket.AF_INET)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind(('127.0.0.1',8888))
s.listen(1)
c = []
while True:
    (conn, addr) =  s.accept()
    c.append(conn)

Answer 1:

Normally signals do not interrupt system calls with EINTR. Historically there were two possible signal delivery behaviors: the BSD behavior (syscalls are automatically restarted when interrupted by a signal) and the Unix System V behavior (syscalls return -1 with errno set to EINTR when interrupted by a signal). Linux (the kernel) adopted the latter, but the GNU C library developers (correctly) deemed the BSD behavior to be much more sane, and so on modern Linux systems, calling signal (which is a library function) results in the BSD behavior.

POSIX allows either behavior, so it's advisable to always use sigaction where you can choose to set the SA_RESTART flag or omit it depending on the behavior you want. See the documentation for sigaction here:

http://www.opengroup.org/onlinepubs/9699919799/functions/sigaction.html



Answer 2:

In a multi-threaded application, normal signals can be delivered to any thread arbitrarily. Use pthread_kill to send the signal to the specific thread of interest.



Answer 3:

Is the signal handler invoked in the same thread that is waiting in recv()? You may need to explicitly block SIGINT in all other threads via pthread_sigmask().



Answer 4:

As alluded to in the post by <R..>, it is indeed possible to change how signals behave. I often write my own "signal" function on top of sigaction. Here's what I use:

typedef void Sigfunc(int);

static Sigfunc* 
_signal(int signum, Sigfunc* func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;

    if (signum != SIGALRM)
        act.sa_flags |= SA_NODEFER; /* SA_RESTART is deliberately not set, so blocking calls fail with EINTR */

    if (sigaction(signum, &act, &oact) < 0)
        return (SIG_ERR);
    return oact.sa_handler;
}

The part that matters is the value OR'ed into sa_flags. From the sigaction man page: SA_RESTART provides BSD-like behavior by making certain system calls restartable across signals, and SA_NODEFER allows the signal to be received from within its own handler. Because SA_RESTART is not set here, a blocking call such as recv() fails with EINTR instead of being restarted.

When the signal() calls in the client are replaced with _signal(), the thread is interrupted: when SIGUSR1 was sent, recv() returned -1 and the program printed "interrupted system call". When SIGINT was sent, the program stopped altogether with the same output, but abort was called at the end.

I did not write the server portion of the code; I just changed the socket type to "DGRAM, UDP" so that the client could start.



Answer 5:

You can set a timeout on recv() on Linux; see "Linux: is there a read or recv from socket with timeout?"
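
For reference, a minimal sketch of such a timeout using the SO_RCVTIMEO socket option. The helper names and the 50 ms value are arbitrary choices for the example, and a local socket pair with no traffic stands in for the real connection.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: make recv() on `sock` give up after `millis` ms. */
static int set_recv_timeout(int sock, long millis)
{
    struct timeval tv;

    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Demo on a socket pair that never carries data. Returns the errno left by
 * the timed-out recv(), 0 if recv() did not fail, or -1 if setup failed. */
static int recv_with_timeout_demo(void)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int saved_errno;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (set_recv_timeout(sv[0], 50) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    n = recv(sv[0], buf, sizeof buf, 0); /* returns after roughly 50 ms */
    saved_errno = (n < 0) ? errno : 0;

    close(sv[0]);
    close(sv[1]);
    return saved_errno;
}
```

When the timeout expires, recv() returns -1 with errno set to EAGAIN or EWOULDBLOCK, which gives the receive loop a regular chance to check a "done" flag.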

When the signal arrives, call setDone() on the object doing the receive; the receive loop sees the flag the next time recv() times out.

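#include <atomic>
#include <cerrno>
#include <csignal>
#include <cstdint>
#include <iostream>
#include <sys/socket.h>
#include <sys/types.h>

class CapturePkts
{
public:
    // sockid: a socket that already has a receive timeout set.
    // sigSet: the signals the signal thread waits for; they must be
    //         blocked in every thread for sigwait() to work reliably.
    CapturePkts(int sockid, const sigset_t &sigSet)
        : _sockid(sockid), _sigSet(sigSet), _done(false) {}

    sigset_t getSigSet() const { return _sigSet; }

    void setDone() { _done = true; }

    // Returns true with nbytes > 0 when data arrives. Returns false on
    // error, when the peer closes, or once setDone() has been called.
    bool receive(uint8_t *buffer, int32_t bufSz, int32_t &nbytes)
    {
        while (!_done) {
            nbytes = ::recv(_sockid, buffer, bufSz, 0);
            if (nbytes > 0)
                return true;
            if (nbytes < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                nbytes = 0; // timeout expired: re-check _done, then wait again
                continue;
            }
            return false; // error or connection closed
        }
        nbytes = 0;
        return false;
    }

private:
    int _sockid;
    sigset_t _sigSet;
    std::atomic<bool> _done; // written by the signal thread, read by the receiver
};

// Dedicated thread that waits synchronously for one of the signals in the set.
void *signalThread(void *ptr)
{
    CapturePkts *cap = static_cast<CapturePkts *>(ptr);
    sigset_t sigSet = cap->getSigSet();
    int sig = -1;
    sigwait(&sigSet, &sig); // blocks until a signal from the set is pending
    std::cout << "signal=" << sig << " caught, ending process" << std::endl;
    cap->setDone();
    return nullptr;
}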