[JENKINS-72325] Define an executor and scheduler for SandboxResolvingClassLoader
#543
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Environment
Apache Maven 3.9.5 (57804ffe001d7215b5e7bcb531cf83df38f93546)
Maven home: /usr/share/apache-maven-3.9.5
Java version: 21.0.1, vendor: Eclipse Adoptium, runtime: /opt/jdk-21
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.197-186.748.amzn2.x86_64", arch: "amd64", family: "unix"
Steps to reproduce
See JENKINS-72325. Run a
jenkinsci/bom
job with the following patch:This patch runs
org.jenkinsci.plugins.workflow.libs.LibraryMemoryTest
in a loop against the latest Jenkins weekly release on Java 21.Expected results
The test runs in a loop without failure. This is the actual result when running the test on Java 17.
Actual results
The test eventually fails after 1-25 iterations with
From standard error:
The failure cannot be easily reproduced locally; however, it failed in CI (with the above patch) after about 1-25 iterations against commit 2d062012 (Nov 13), commit 7ae60b1 (Nov 5), commit 430707e (October 29), commit c300386 (October 16), commit 155e2e7 (October 9), and commit 2798690 (October 2). So this is not a very recent flake.
Evaluation
Care must be taken to avoid leaking instances of
GroovyClassLoader
when computing cached values inSandboxResolvingClassLoader
. In its default configuration, Caffeine usesForkJoinPool#commonPool
as itsExecutor
. As of recent Java versions,ForkJoinPool
can capture a reference toGroovyClassLoader
by creating aForkJoinWorkerThread
whoseThread#inheritedAccessControlContext
refers to anAccessControlContext
whoseProtectionDomain
refers toGroovyClassLoader
.This is arguably a Java Platform regression, likely caused by JEP-425. However, as of JEP-411,
Thread#inheritedAccessControlContext
is deprecated for removal. Therefore, we explicitly decline to report this issue upstream, although we must contend with it in the short to medium term in Jenkins.Rejected Solution
We initially tried configuring Caffeine with the
Executor
returned byExecutors#newCachedThreadPool
. However, this merely traded one set of problems for another. ThisExecutor
was seen to also capture a reference toGroovyClassLoader
by creating aThread
whoseThread#inheritedAccessControlContext
refers to anAccessControlContext
whoseProtectionDomain
refers toGroovyClassLoader
:Additionally, when the thread pool's
ThreadFactory
is not wrapped byClassLoaderSanityThreadFactory
(see JENKINS-49206), theExecutor
can sometimes createThread
instances whoseThread#contextClassLoader
refers toGroovyClassLoader
:Given the problems with this approach, we rejected a solution of
Executors#newCachedThreadPool
.Solution
We create a dedicated
Executors#newSingleThreadExecutor
, which is safe for use with Caffeine from a memory perspective because:ForkJoinPool#commonPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#inheritedAccessControlContext
.Executors#newCachedThreadPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#inheritedAccessControlContext
.Executors#newCachedThreadPool
, the thread is eagerly created and avoids references toGroovyClassLoader
inThread#contextClassLoader
, thereby avoiding the need forClassLoaderSanityThreadFactory
.A single-threaded
Executor
is safe for use with Caffeine from a CPU perspective because the cache's work is implemented with cheap O(1) algorithms.Future work
In the medium term, once
Thread#inheritedAccessControlContext
is removed upstream, we could possibly switch to a combination ofExecutors#newCachedThreadPool
andClassLoaderSanityThreadFactory
.The comments state:
If this were done, then this two-level Caffeine cache could possibly be replaced by something based on ClassValue. See, for example, org.kohsuke.stapler.ClassLoaderValue. We explicitly decline to perform this cleanup, but we have updated the existing wish list comment to add this cleanup to the list.
Implementation notes
See the code comments.
Testing done
We tested by running the test in a loop in CI (the only environment where the problem can be reliably reproduced in under 50 iterations) in four configurations:
In configuration 3, the test always failed within 50 iterations with the error shown above. In configuration 1, the test passed reliably after two hours of iterations. After this PR, configurations 2 and 4 are passing reliably after two hours of iterations on both Java 17 and Java 21.
In addition to the above tests, I also ran a successful
jenkinsci/bom
test with theweekly-test
label to test PCT for all plugins against the weekly release line.