-
Notifications
You must be signed in to change notification settings - Fork 545
Description
In further testing of Motion (Git master), I've observed crashes that occur shortly after the watchdog mechanism kills a thread. Valgrind reports an invalid read in pthread_kill()
as called from main()
in motion.c
...
if (cnt_list[i]->watchdog == WATCHDOG_KILL) {
/* if 0 then it finally did clean up (and will restart without any further action here)
* kill(, 0) == ESRCH means the thread is no longer running
* if it is no longer running with running set, then cleanup here so it can restart
*/
if(cnt_list[i]->running && pthread_kill(cnt_list[i]->thread_id, 0) == ESRCH) {
/* HERE -------------------------------------------^^^^^^^^^^^^ */
MOTION_LOG(ERR, TYPE_ALL, NO_ERRNO, "%s: cleaning Thread %d", cnt_list[i]->threadnr);
pthread_mutex_lock(&global_lock);
threads_running--;
In this code, pthread_kill()
is used to determine if the thread (which earlier received a pthread_cancel()
) is still running or not.
The problem is, this usage is incorrect. If the thread has already terminated, then the thread_id
variable effectively points to freed memory---or worse, may point to a different thread that was recently created (due to re-use of memory). Simply put, you can't use pthread_kill()
to determine if a thread is still alive. Some references:
https://stackoverflow.com/questions/1693180/how-do-i-determine-if-a-pthread-is-alive
https://stackoverflow.com/questions/2156353/how-do-you-query-a-pthread-to-see-if-it-is-still-running
By the same token, you can't use pthread_kill()
to send signals to a thread that may or may not be alive, so that nixes the other use of the function a few lines further down.
(It's arguable that the "Schrödinger's thread" usage should be supported, and I think that would be reasonable behavior. But Ulrich Drepper is dead-set against it, and unfortunately he calls the shots in the Linux/glibc world.)
A different strategy is going to be needed here. In researching the issue, I saw some mention of pthread cleanup handlers, which may be a possibility (e.g. have the handler clear the cnt_list[i]->running
flag). As for sending SIGVTALRM
, maybe use a timer and a signal handler within the thread?