-
Notifications
You must be signed in to change notification settings - Fork 130
Fix resume issues [JENKINS-49686] and [JENKINS-50199] and [JENKINS-50407] #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ome minor cleanups
…ur, such as removing a Run from active execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to come back and do a second pass on the logic around completed
, but feels good so far.
@@ -171,6 +171,9 @@ public String url() { | |||
|
|||
private transient boolean allowKill; | |||
|
|||
/** Controls whether or not our execution has been initialized via its {@link FlowExecution#onLoad(FlowExecutionOwner)} method yet if.*/ | |||
transient boolean executionLoaded = false; // NonPrivate for tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wonder if that might be something we want to expose outside of just tests - I don't have a specific scenario in mind, but I know I've stumbled across cases where this could be useful information. I think. I may be getting confused with something else, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can always expose it with a getter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…if and when there is a downstream PR demonstrating the need yada yada.
|
||
/** map from node IDs to log positions from which we should copy text */ | ||
private Map<String,Long> logsToCopy; | ||
Map<String,Long> logsToCopy; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth a // Non-private for testing
comment, but not strictly necessary since that's basically the only reason anything we do ever is package-private.
return logCopyGuard; | ||
} | ||
|
||
/** Avoids creating new instances, analogous to {@link TaskListener} but as full StreamBuildListener. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think you meant TaskListener.NULL
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Serial format of completed
should be changed.
As to the implementation, sounds OK, though the changes are too extensive for me to really review seriously (in part due to reindentation perhaps?), so I hope tests will catch any mistakes.
@@ -68,6 +68,7 @@ | |||
<workflow-support-plugin.version>2.17</workflow-support-plugin.version> | |||
<scm-api-plugin.version>2.1.1</scm-api-plugin.version> | |||
<git-plugin.version>3.2.0</git-plugin.version> | |||
<jenkins-test-harness.version>2.33</jenkins-test-harness.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just update parent POM to 3.4 or newer.
@@ -171,6 +171,9 @@ public String url() { | |||
|
|||
private transient boolean allowKill; | |||
|
|||
/** Controls whether or not our execution has been initialized via its {@link FlowExecution#onLoad(FlowExecutionOwner)} method yet if.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray if
?
@@ -171,6 +171,9 @@ public String url() { | |||
|
|||
private transient boolean allowKill; | |||
|
|||
/** Controls whether or not our execution has been initialized via its {@link FlowExecution#onLoad(FlowExecutionOwner)} method yet if.*/ | |||
transient boolean executionLoaded = false; // NonPrivate for tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…if and when there is a downstream PR demonstrating the need yada yada.
*/ | ||
private transient AtomicBoolean completed; | ||
AtomicBoolean completed; // Non-private for testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you really do not want this to be a serialized field of the build.xml
:
<completed>
<value>1</value>
</completed>
Ugh! Use boolean completed
(or Boolean
if null is a meaningful state, or an enum
) and if you need to synchronize access to it against something, I guess use logCopyGuard
.
private transient AtomicBoolean completed; | ||
AtomicBoolean completed; // Non-private for testing | ||
|
||
private transient Object logCopyGuard = new Object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given getLogCopyGuard
, could drop the field initializer.
@@ -506,9 +535,9 @@ private void copyLogs() { | |||
|
|||
String prefix = getLogPrefix(node); | |||
if (prefix != null) { | |||
logger = new LogLinePrefixOutputFilter(listener.getLogger(), "[" + prefix + "] "); | |||
logger = new LogLinePrefixOutputFilter(getListener().getLogger(), "[" + prefix + "] "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tons of merge conflicts with #27 coming up, sigh. Was the introduction of getListener()
necessary to solve one of these bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I didn't want to introduce any more conflicts than needed but unfortunately this was the cleanest way to guard against NPEs with the listener that were seen in tests. FWIW I'll be happy to deal with the conflict resolution as part of the log rewrite integration since I'm introducing most of the conflicts -- seems only fair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, person creating branch resolves merge conflicts. I will deal with it soon.
Timer.get().submit(() -> { | ||
LOGGER.log(Level.INFO, "{0} completed: {1}", new Object[]{toString(), getResult()}); | ||
if (listener == null) { | ||
// Can we actually rely on this, now that we use an initializing getter? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that, more of a "I wonder what @jglick will say in review"
FlowExecutionListener.fireResumed(fetchedExecution); | ||
|
||
getListener().getLogger().println("Resuming build at " + new Date() + " after Jenkins restart"); | ||
Timer.get().submit(new Runnable() { // JENKINS-31614 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you are shuffling code around, might as well:
Timer.get().submit(() -> Queue.getInstance().schedule(new AfterRestartTask(this), 0)); // JENKINS-31614
}); | ||
} | ||
} | ||
} finally { // Ensure the run is ALWAYS removed from loading even if something failed, so threads awaken. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/me runs away in fear
@Test public void lazyLoadExecution() { | ||
story.thenWithHardShutdown(r -> { | ||
WorkflowJob p = r.jenkins.createProject(WorkflowJob.class, "p"); | ||
p.setDefinition(new CpsFlowDefinition("echo 'dosomething'", true)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to set the durability mode too? Best to not rely on defaults if it is significant here.
@jglick Does this help with Whitespace? https://github.com/jenkinsci/workflow-job-plugin/pull/93/files?w=1 |
/** null until started, or after serious failures or hard kill */ | ||
private @CheckForNull FlowExecution execution; | ||
/** Null until started, or after serious failures or hard kill. */ | ||
@CheckForNull FlowExecution execution; // Not private for test use only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I guess should be @Restricted
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, it's package-private, so it's already more restricted than if I used @Restricted
on it
static final StreamBuildListener NULL_LISTENER = new StreamBuildListener(new NullStream()); | ||
|
||
/** Used internally to ensure listener has been initialized correctly. */ | ||
StreamBuildListener getListener() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
synchronized
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I believe there was a specific reason why it wasn't based on testing but... darned if I can remember the issue/exact chain of reasoning since it was rather complex.
Give me a few, I'll either synchronize or provide a detailed comment to explain the rationale why not to.
LOGGER.log(Level.WARNING, "failed to save " + this, x); | ||
} | ||
Timer.get().submit(() -> { | ||
LOGGER.log(Level.INFO, "{0} completed: {1}", new Object[]{toString(), getResult()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Level.INFO
really wise here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, that's what it was at before -- it should only run once per build (or at most twice in rare circumstances if something fails catastrophically right in the middle of the finalization).
} | ||
Timer.get().submit(() -> { | ||
LOGGER.log(Level.INFO, "{0} completed: {1}", new Object[]{toString(), getResult()}); | ||
if (listener == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about getListener()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the comment on synchronizing getListener()
-- give me a bit and I'll either have it changed or better documented in code comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a shallow inspection looks OK.
FlowExecution fetchedExecution = getExecution(); // Triggers execution.onLoad so we can resume running if not done | ||
|
||
if (fetchedExecution != null) { | ||
fetchedExecution.addListener(new GraphL()); | ||
executionPromise.set(fetchedExecution); | ||
|
||
if (completed == null) { | ||
completed = new AtomicBoolean(fetchedExecution.isComplete()); | ||
completed = Boolean.valueOf(fetchedExecution.isComplete()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: can just drop the valueOf
call as this is done by autoboxing anyway.
Merging now that I've done one more round of integration testing locally using the related workflow-cps PR and an updated snapshot. |
Aims to resolve (in conjunction with workflow-cps changes as well that help harden the logic):
Includes lazy-load for FlowExecutions (enhancement): JENKINS-45585 -- which if merged will replace 49086 because it will defer a ton of the load-time logic for completed Pipelines.
TODOs: