-
Notifications
You must be signed in to change notification settings - Fork 370
[JENKINS-43194] Use merge_commit_sha for master #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
FTR: goes back to an approach abandoned in #60. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you are missing is the basic promise of PullRequestSCMRevision
: that it includes the minimal state needed to represent a particular, deterministic snapshot of the code to be checked out. For a merge head, that would include two commit SHAs: one taken from the PR at the time of indexing or event (not necessarily the current head of the PR!); and one taken from the base branch—again not necessarily the current head, and possibly not even a recent one (see discussion in JENKINS-37491).
The plugin deliberately implements its own merging (in JGit) of the exact two selected revisions, which in the general case requires a clone and local files. The GitHub API only offers some semi-documented convenience APIs (IIRC dropped from v4 and at best tentative in v3) to obtain an otherwise unadvertised merge of some recent revisions of the PR branch with the base branch.
If they had a first-class API whereby you could call methods like getFileContent
and pass two SHAs, with the service performing a transient merge or the logical equivalent, then that would be perfect. But they do not. My suggestion in JIRA (not prototyped!) is to use something like the /repos/<owner>/<repo>/compare/<branch>...<hash>
API to find the common ancestor of the base and pull commits; then check whether there are three distinct file contents in those three commits. If so, abort; otherwise you can pick one as the effective merge contents. This would preserve the existing behavior (to a reasonable degree of confidence), minus the heavyweight checkout.
@@ -981,19 +982,22 @@ public GHPermissionType fetch(String username) throws IOException, InterruptedEx | |||
public SCMRevision create(@NonNull PullRequestSCMHead head, | |||
@Nullable Void ignored) | |||
throws IOException, InterruptedException { | |||
String baseHash = pr.getBase().getSha(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessarily a meaningful value, as noted here. It is in particular not necessarily the current head commit of the base branch (what is being calculated in the deleted code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a change from the existing behavior - if you read the code, all I did was move the assignments/return statements around. If you like, I can put them back so they don't show up in the diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all I did was move the assignments/return statements around
Unclear to me. The original code called GhRepository.getRef
on the base branch, which would retrieve its current head. The new code is extracting .base.sha
from the PR API response object, which can be a completely different value. (IIRC it is typically, but not always, the last common ancestor between the two branches.)
I can put them back so they don't show up in the diff.
If some part of a patch can be reverted so that the real changes can be seen to have more localized effects than of course this is preferable.
mergeRef.getObject().getSha(), | ||
pr.getHead().getSha()); | ||
baseHash = ghRepository.getRef("heads/" + head.getTarget().getName()).getObject().getSha(); | ||
mergeHash = pr.getMergeCommitSha(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is going to produce a synthetic commit which may or may not have as parents the baseHash
and the .head.sha
. Depends on exactly when you call it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I'm not sure what you're pointing out here or why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the new behavior will be nondeterministic, and the PullRequestSCMRevision
may wind up with three fields that are not in fact consistent.
assert !pr.isMerge(); // TODO see below | ||
this.ref = ((PullRequestSCMRevision) rev).getPullHash(); | ||
if (pr.isMerge()) { | ||
this.ref = ((PullRequestSCMRevision) rev).getMergeHash(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will in general produce different results from what GitHubSCMBuilder.build
does when you actually checkout scm
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it? Git merge is in one place isn't git merge in another?
Could you explain or point me to documentation that will help me understand your point?
The merge_commit_sha
is guaranteed to be be a valid automatic merge.
How significant can the differences be?
It will be a different commit sha, but that is already true for each agent.
I understand that what you are saying is factually true, but to my knowledge the difference is theoretical rather than practical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git merge is in one place isn't git merge in another?
The resulting trees would be identical if
- both merges started with the same parent commits
- both merges used the same algorithm and other options
The second point is perhaps only theoretical. The first point seems to be a real possibility.
How significant can the differences be?
Who knows? Its behavior is undocumented.
At a minimum, you would have to replace the existing behavior of GitHubSCMBuilder.build
to ensure that it was using the same merge commit rather than doing its own local merge.
* | ||
* @return The commit hash of the head of the pull request branch | ||
*/ | ||
@CheckForNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be null if you load an old serialized object, and yet the one caller is not checking for null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is once again, POC. The scenario for the caller is guaranteed to never have null.
In final version, this will need to be safely migrated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was just mentioning a potential problem while I was here, in case you had not noticed it yet.
I don't think it does. At minimum, this actually uses the
I'm not getting your point. Once the PullRequestSCMRevision is created, it is just as deterministic as it was before this change. When it is created, the Base and Pull hashes (and if applicable the Merge Hash) are set and aren't changed. I'm not getting where the problem is.
I don't know what APIs you're referring to, but I'm not sure what that have to do with this change.
That could work, too. We might need to use it for the bitbucket plugin. But in this plugin it would be a much larger change than querying and using I'm trying to make sense of your points here. I get that you're saying there are some cases where this change might not work in a provably correct and deterministic way, but I can't make sense when you think those cases will occur. You're not saying in which customer scenario those failures would occur. How would a customer produce/reproduce the problem(s) you're describing? What is the severity of the problem(s)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would a customer produce/reproduce the problem(s) you're describing?
I do not have the time to work out test cases for this I am afraid. You would have to consider various scenarios involving commits appearing on the base and/or PR branches, including merge commits of base into PR branch appearing on the PR branch, at different times interleaved with the scan or event hook vs. loading of Jenkinsfile
contents vs. checkout scm
, and introducing artificial delays between API calls to discover race conditions.
What is the severity of the problem(s)?
Again TBD. My broader point is that it is simpler to reason about the correctness of the current behavior, which relies only on straightforward calls to reasonably well documented APIs (that also would be likely to apply to other Git hosting as you mentioned): it picks up the two branch head references at the time of event/scan, saves them, and when requested merges them exactly as CLI git merge
would.
@@ -981,19 +982,22 @@ public GHPermissionType fetch(String username) throws IOException, InterruptedEx | |||
public SCMRevision create(@NonNull PullRequestSCMHead head, | |||
@Nullable Void ignored) | |||
throws IOException, InterruptedException { | |||
String baseHash = pr.getBase().getSha(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all I did was move the assignments/return statements around
Unclear to me. The original code called GhRepository.getRef
on the base branch, which would retrieve its current head. The new code is extracting .base.sha
from the PR API response object, which can be a completely different value. (IIRC it is typically, but not always, the last common ancestor between the two branches.)
I can put them back so they don't show up in the diff.
If some part of a patch can be reverted so that the real changes can be seen to have more localized effects than of course this is preferable.
mergeRef.getObject().getSha(), | ||
pr.getHead().getSha()); | ||
baseHash = ghRepository.getRef("heads/" + head.getTarget().getName()).getObject().getSha(); | ||
mergeHash = pr.getMergeCommitSha(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That the new behavior will be nondeterministic, and the PullRequestSCMRevision
may wind up with three fields that are not in fact consistent.
assert !pr.isMerge(); // TODO see below | ||
this.ref = ((PullRequestSCMRevision) rev).getPullHash(); | ||
if (pr.isMerge()) { | ||
this.ref = ((PullRequestSCMRevision) rev).getMergeHash(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Git merge is in one place isn't git merge in another?
The resulting trees would be identical if
- both merges started with the same parent commits
- both merges used the same algorithm and other options
The second point is perhaps only theoretical. The first point seems to be a real possibility.
How significant can the differences be?
Who knows? Its behavior is undocumented.
At a minimum, you would have to replace the existing behavior of GitHubSCMBuilder.build
to ensure that it was using the same merge commit rather than doing its own local merge.
* | ||
* @return The commit hash of the head of the pull request branch | ||
*/ | ||
@CheckForNull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I was just mentioning a potential problem while I was here, in case you had not noticed it yet.
Another risk of
It will not be an ancestor of any visible ref, so by typical Git rules it would subject to garbage collection. Does the highly specialized backend used by GitHub in fact collect it? Hard to say. The value of the “test merge commit” (that returned for a currently open PR) is only discussed in the context of checking mergeability and the These questions are relevant for Jenkins because it is expected that a ( As an interesting aside: Prow, used by JX “serverless”, bypasses all these issues because regular PR builds are only ever done on the PR head. Whereas Jenkins in We could try to emulate this logic, perhaps: trusted users could use a label or ChatOps to request that Jenkins perform a merge when it can. (Guiding users to use Prow/Tide in conjunction with Jenkins is also an option, though it requires a Kubernetes cluster and a bunch of configuration that would need to be synchronized with Jenkins’ own somehow.) People do frequently request such a feature. From a security perspective, full operation of |
30b0e98
to
5be2e88
Compare
Well, the scenario matters. If the non-determinism you're talking about is only encountered under rare conditions and has minor consequences, then I would say this change is still valid. My analysis indicates that is exactly the case here. Assumption: Most (if not all) customers run PRs with the merge strategy because they want to know if their latest change is mergeable. Scenario 1: When changes appear on the base or PR branch after the build is triggered but before the PR merge commit is queried. In that case, the build could pick up a later commit. However, all that results in two builds of the same newer commit, which in the PR scenario is a non-issue. Customers run PRs with the merge strategy because they want to know if their latest change is mergeable. (Scenario 2 below)
No, I tested this and the commit remains available via the api for at least several days - I haven't seen one become unavailable yet. But this behavior is undocumented, let's assume they are garbage collected immediately, what is the result? A build might fail due to asking for file from a merge commit that doesn't exist. But that isn't particularly bad - the build failed because the base or PR have changed so the user probably wants to run on the new changes not the old ones.
According to my testing, yes on all three counts.
Scenario 2: Customer goes back replay an old PR build. Funny thing - this was already broken before this change. When I click "replay" the commits that are run by replay for merge PRs are the latest from both base and PR branch. This change doesn't modify how base and PR heads are determined, so ... This doesn't mean that this is okay, but fixing it a separate issue and also supports my base assumption as no one has filed this as bug.
This also an interesting idea but out of scope in relation to fixing this plugin such that merge PRs do not need to be cloned on master. From what I can see, using diffs will take significantly greater investment of resources and will provide little to no value in core customer scenarios. |
7c76b1f
to
1ff3ff2
Compare
It is possible for the base and source branches to change between when a PR is seen and we look on the the commits and generate the merge commit. This change adds checks that the three commits in the PullRequestSCMRevision are consistent with each other.
1ff3ff2
to
fe37f64
Compare
After discussion with Jesse, I've updated this to check the consistency of the base, source, and merge commits. This should result in builds failing early when inconsistent commits are found. |
I will not have time to do automated tests before I go on vacation. I don't feel this is a blocker for merging (there is plenty of this plugin that is doesn't have unit tests) and I have manually tested this sufficiently for it to go out for in an alpha release. |
Work-in-progress.