[BUG] - Executor deadlocks when it calls stop(). #835

Closed
@apocelipes

Describe the bug

Sometimes the unit tests block for 10 minutes, which is not normal.

To Reproduce

Run the unit tests repeatedly; the deadlock shows up intermittently.

After reading the log and the traceback, I found that the process was blocked here:

func (e *executor) stop(standardJobsWg, singletonJobsWg, limitModeJobsWg *waitGroupWithMutex) {
	...
	if count < 3 {
		e.done <- ErrStopJobsTimedOut
		e.logger.Debug("gocron: executor stopped - timed out")
	} else {
		e.done <- nil // blocks here if nothing is receiving from e.done
		e.logger.Debug("gocron: executor stopped")
	}
	waiterCancel()

	if e.limitMode != nil {
		e.limitMode.started = false
	}
}

One possible reason is that waiting for all jobs to complete takes so long that the scheduler's exit timeout expires (the default timeout is now 11 seconds, but it can still be exceeded). The scheduler then stops receiving from “e.done”, so the sender blocks permanently.

Expected behavior

No deadlock, no timeout.

Additional context

There are two ways to fix this deadlock: increase the scheduler's wait timeout, or make “e.done” a buffered channel. IMHO, changing the timeout does not fundamentally solve the problem, because a slow enough job can still outlast any fixed timeout.

Labels: bug (Something isn't working)
