Closed
Description
Describe the bug
The unit tests sometimes block for 10 minutes, which is not normal.
To Reproduce
Run the unit tests several times; there is a chance you will hit the deadlock.
From the log and the traceback, I found that the process was blocked here:
func (e *executor) stop(standardJobsWg, singletonJobsWg, limitModeJobsWg *waitGroupWithMutex) {
	...
	if count < 3 {
		e.done <- ErrStopJobsTimedOut
		e.logger.Debug("gocron: executor stopped - timed out")
	} else {
		e.done <- nil // blocking
		e.logger.Debug("gocron: executor stopped")
	}
	waiterCancel()
	if e.limitMode != nil {
		e.limitMode.started = false
	}
}
One possible reason is that waiting for all jobs to complete takes so long that the scheduler's exit timeout expires (the default timeout is now 11 seconds, but it can still be exceeded). The scheduler then stops receiving from "e.done", so the sender blocks permanently.
Expected behavior
No deadlock, no timeout.
Additional context
There are two ways to fix this deadlock: increase the scheduler's wait timeout or make "e.done" a buffered channel. IMHO, increasing the timeout does not fundamentally solve the problem, because a slow enough shutdown can still exceed any timeout.