Skip to content

Incorrect usage of pthread_kill() leads to crashes #366

@iskunk

Description

@iskunk

In further testing of Motion (Git master), I've observed crashes that occur shortly after the watchdog mechanism kills a thread. Valgrind reports an invalid read in pthread_kill() as called from main() in motion.c...

                    if (cnt_list[i]->watchdog == WATCHDOG_KILL) {
                        /* if 0 then it finally did clean up (and will restart without any further action here)
                         * kill(, 0) == ESRCH means the thread is no longer running
                         * if it is no longer running with running set, then cleanup here so it can restart
                         */
                        if(cnt_list[i]->running && pthread_kill(cnt_list[i]->thread_id, 0) == ESRCH) {
/* HERE -------------------------------------------^^^^^^^^^^^^ */
                            MOTION_LOG(ERR, TYPE_ALL, NO_ERRNO, "%s: cleaning Thread %d", cnt_list[i]->threadnr);
                            pthread_mutex_lock(&global_lock);
                            threads_running--;

In this code, pthread_kill() is used to determine if the thread (which earlier received a pthread_cancel()) is still running or not.

The problem is, this usage is incorrect. If the thread has already terminated, then the thread_id variable effectively points to freed memory---or worse, may point to a different thread that was recently created (due to re-use of memory). Simply put, you can't use pthread_kill() to determine if a thread is still alive. Some references:

https://stackoverflow.com/questions/1693180/how-do-i-determine-if-a-pthread-is-alive

https://stackoverflow.com/questions/2156353/how-do-you-query-a-pthread-to-see-if-it-is-still-running

By the same token, you can't use pthread_kill() to send signals to a thread that may or may not be alive, so that nixes the other use of the function a few lines further down.

(It's arguable that the "Schrödinger's thread" usage should be supported, and I think that would be reasonable behavior. But Ulrich Drepper is dead-set against it, and unfortunately he calls the shots in the Linux/glibc world.)

A different strategy is going to be needed here. In researching the issue, I saw some mention of pthread cleanup handlers, which may be a possibility (e.g. have the handler clear the cnt_list[i]->running flag). As for sending SIGVTALRM, maybe use a timer and a signal handler within the thread?

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions