
Extend startup notification timeout as jobs continue to be loaded #220


Status: Open · wants to merge 1 commit into master
Conversation

basil (Member) commented Mar 9, 2022

Reflection-free alternative to #207. Untested PR to consume the new API introduced in jenkinsci/jenkins#6237 to keep pushing back the startup timeout as long as Jenkins is continuing to make progress in loading jobs. This should make it far less likely that people will hit the systemd service startup timeout when using systemd with Type=notify.
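
Roughly, the mechanism looks like this (a minimal sketch: the counter names mirror the diff reviewed below, but the ticker class, the counter types, and the TICK_INTERVAL value are illustrative only and not necessarily what this PR does verbatim):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import jenkins.model.Jenkins;

class StartupProgressTicker {
    static final long TICK_INTERVAL = TimeUnit.SECONDS.toMillis(10); // assumed value
    private final AtomicLong jobEncountered = new AtomicLong();           // incremented as each job is loaded
    private final AtomicLong jobEncounteredAtLastTick = new AtomicLong();

    /** Called periodically while Jenkins is still loading items. */
    void onTick() {
        long current = jobEncountered.get();
        // Only push the timeout back while loading is actually progressing,
        // so a genuinely hung startup still trips the service manager's timeout.
        if (current > jobEncounteredAtLastTick.get()) {
            Jenkins.get().getLifecycle().onExtendTimeout(TICK_INTERVAL, TimeUnit.MILLISECONDS);
        }
        jobEncounteredAtLastTick.set(current);
    }
}
```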

@basil basil requested a review from a team as a code owner March 9, 2022 23:05
jglick (Member) previously approved these changes Mar 10, 2022 and left a comment:

Nice! Do we have any practical way of testing this even manually?

K8s pods may also have a readiness timeout. (CBCI currently uses /login for this purpose.) I do not think there is any specific way for the container to signal that it is making progress but still needs more time. In principle you could handle that logic in the endpoint, by having it return 200 if either Jenkins has started (HudsonIsLoading gone) or onExtendTimeout in a custom Lifecycle has been called, but returning 500 if some timeout has elapsed. That would be awkward, since the custom endpoint would then need to know what the configured readiness timeout was (in case it is overridable), which Jenkins would need to read from a file projected via downwardAPI. Perhaps the effort would be better spent profiling & optimizing Item loading (deferring work to lazy initializers or background threads).
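
A rough sketch of the shape such an endpoint could take (hypothetical: the class name, the /startup-readiness URL, and the grace period are invented, and this glosses over the fact that while jobs are still loading Jenkins serves HudsonIsLoading rather than root actions, so the real wiring would have to live elsewhere):

```java
import hudson.Extension;
import hudson.init.InitMilestone;
import hudson.model.UnprotectedRootAction;
import jenkins.model.Jenkins;
import org.kohsuke.stapler.StaplerRequest;
import org.kohsuke.stapler.StaplerResponse;

import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

@Extension
public class StartupReadinessEndpoint implements UnprotectedRootAction {

    /** Would be bumped by a custom Lifecycle every time onExtendTimeout is called. */
    static final AtomicLong lastExtension = new AtomicLong(System.currentTimeMillis());

    /** Assumed grace period; in reality this would have to match the probe's configured timeout. */
    private static final long STILL_LOADING_GRACE_MILLIS = TimeUnit.MINUTES.toMillis(5);

    public void doIndex(StaplerRequest req, StaplerResponse rsp) throws IOException {
        Jenkins j = Jenkins.getInstanceOrNull();
        boolean started = j != null && j.getInitLevel() == InitMilestone.COMPLETED;
        boolean stillMakingProgress =
                System.currentTimeMillis() - lastExtension.get() < STILL_LOADING_GRACE_MILLIS;
        if (started || stillMakingProgress) {
            rsp.setStatus(200); // ready, or still visibly making loading progress
        } else {
            rsp.sendError(500, "startup appears to be stuck");
        }
    }

    @Override public String getUrlName() { return "startup-readiness"; }
    @Override public String getIconFileName() { return null; }
    @Override public String getDisplayName() { return null; }
}
```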

if (jobEncountered.get() > jobEncounteredAtLastTick.get()) {
    Jenkins.get().getLifecycle().onExtendTimeout(TICK_INTERVAL, TimeUnit.MILLISECONDS);
}
jobEncounteredAtLastTick.set(jobEncountered.get());
jglick (Member) commented:

There is probably some idiomatic way to do the compare and set atomically but I suppose we do not care much here since it is only a heuristic.

(Are these counters actually used from concurrent threads? Predates 869ef0c and I no longer recall!)

basil (Member, Author) replied:

> There is probably some idiomatic way to do the compare and set atomically

If there is such an idiomatic way, I cannot think of it.

> Are these counters actually used from concurrent threads?

I am not familiar with this code, but reading it raised more questions than answers in my mind (i.e., it smelled wrong to me) so I decided not to think too hard about it.
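
For what it is worth, one way the window could be narrowed (a suggestion against the snippet above, not something this PR does) is to swap the new count in atomically and compare against the value the swap returns:

```java
// Atomically replace the "last tick" value and get the old one back, so the
// comparison and the update of jobEncounteredAtLastTick cannot interleave
// across two concurrent ticks.
long current = jobEncountered.get();
long previous = jobEncounteredAtLastTick.getAndSet(current);
if (current > previous) {
    Jenkins.get().getLifecycle().onExtendTimeout(TICK_INTERVAL, TimeUnit.MILLISECONDS);
}
```

The read of jobEncountered can still race with a concurrent increment, but as noted above this is only a heuristic, so an occasional missed or extra extension is harmless.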

basil (Member, Author) commented Mar 10, 2022

> Nice! Do we have any practical way of testing this even manually?

In a previous life, I operated some Jenkins controllers with thousands of jobs backed by slow NFS storage, and some of them took over half an hour to load jobs, but I no longer have access to that environment.

> K8s pods may also have a readiness timeout. (CBCI currently uses /login for this purpose.)

I did actually spend some time researching how Kubernetes deals with readiness before designing the core API. I came to the same conclusion as you: a dedicated readiness API might be nice to have, but without a backchannel to the service manager (akin to the notification socket provided by systemd) such an API would not be practical to implement.

@basil basil closed this Apr 25, 2022
jglick (Member) commented Apr 25, 2022

Is there a specific reason for this to be closed? From what little I recall, this was useful and should still be considered.

basil (Member, Author) commented Apr 25, 2022

I waited over a month for this PR to be merged; in that time I received neither negative feedback nor a merge. I have since moved on to newer priorities. I would encourage anyone interested in driving this forward to pick up where I left off.

jglick (Member) commented Apr 25, 2022

Then I will take the liberty of reopening. There are plenty of valid PRs which are much older than this, and a shortage of maintainers. Closing PRs means that the information that a particular change was desirable in principle gets lost, perhaps forever.

@jglick jglick reopened this Apr 25, 2022
basil (Member, Author) commented Apr 25, 2022

I have unsubscribed from notifications to this PR, as I have other priorities which demand my focus.

jtnord (Member) commented Nov 9, 2022

> Nice! Do we have any practical way of testing this even manually?

A new job property in https://github.com/jenkinsci/make-jenkins-slow-plugin/tree/master/src/main/java/io/jenkins/plugins/make_jenkins_slow, then a few jobs in various folders all carrying that property, I would guess. (A sketch of what such a property might look like is below.)
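
Something along these lines, perhaps (a hypothetical sketch, not actual code from make-jenkins-slow-plugin; the class name and the 30-second delay are invented):

```java
import hudson.Extension;
import hudson.model.Job;
import hudson.model.JobProperty;
import hudson.model.JobPropertyDescriptor;
import org.kohsuke.stapler.DataBoundConstructor;

public class SlowLoadingJobProperty extends JobProperty<Job<?, ?>> {

    @DataBoundConstructor
    public SlowLoadingJobProperty() {}

    /**
     * XStream invokes readResolve() while the job's config.xml is being
     * deserialized at startup, so sleeping here stretches out item loading
     * enough to exercise the timeout extension.
     */
    private Object readResolve() {
        try {
            Thread.sleep(30_000L); // assumed delay; tune as needed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return this;
    }

    @Extension
    public static class DescriptorImpl extends JobPropertyDescriptor {
        @Override
        public String getDisplayName() {
            return "Sleep while this job is loaded (testing only)";
        }
    }
}
```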
