Skip to content

[JENKINS-63766] Work around JDK-8231454 #543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 24, 2022
Merged

Conversation

basil
Copy link
Member

@basil basil commented May 24, 2022

Background

See JENKINS-63766 and upstream bug JDK-8231454. Users have reported gradual exhaustion of the metaspace when running on Java 11 with Pipeline jobs that define custom classes.

Steps to reproduce

Create a new Pipeline job with

class asdf {
  void run () {}
}

new asdf().run()

and run it under Java 11 with -verbose:class. After the job finishes, run jcmd ${PID} GC.run

Expected results

Note: These are the actual results when running under Java 8 or Java 17.

The WorkflowScript and asdf classes get unloaded.

Actual results

The WorkflowScript and asdf classes never get unloaded on Java 11.

Evaluation

As described in the upstream bug, a memory memory leak can occur when java.beans.Introspector#getBeanInfo is invoked against a class, even if java.beans.Introspector#flushFromCaches is later called on that class. In our case, java.beans.Introspector#getBeanInfo is invoked from the following stack trace:

getBeanInfo:196, Introspector (java.beans)
run:3328, MetaClassImpl$15 (groovy.lang)
doPrivileged:-1, AccessController (java.security)
addProperties:3326, MetaClassImpl (groovy.lang)
initialize:3303, MetaClassImpl (groovy.lang)
getMetaClassUnderLock:289, ClassInfo (org.codehaus.groovy.reflection)
getMetaClass:331, ClassInfo (org.codehaus.groovy.reflection)
getMetaClass:277, MetaClassRegistryImpl (org.codehaus.groovy.runtime.metaclass)
getMetaClass:905, InvokerHelper (org.codehaus.groovy.runtime)
createCallConstructorSite:86, CallSiteArray (org.codehaus.groovy.runtime.callsite)
defaultCallConstructor:59, CallSiteArray (org.codehaus.groovy.runtime.callsite)
callConstructor:238, AbstractCallSite (org.codehaus.groovy.runtime.callsite)
call:208, Checker$3 (org.kohsuke.groovy.sandbox.impl)
onNewInstance:42, GroovyInterceptor (org.kohsuke.groovy.sandbox)
onNewInstance:173, SandboxInterceptor (org.jenkinsci.plugins.scriptsecurity.sandbox.groovy)
call:205, Checker$3 (org.kohsuke.groovy.sandbox.impl)
checkedConstructor:210, Checker (org.kohsuke.groovy.sandbox.impl)
constructorCall:21, SandboxInvoker (com.cloudbees.groovy.cps.sandbox)
dispatchOrArg:97, FunctionCallBlock$ContinuationImpl (com.cloudbees.groovy.cps.impl)
fixName:78, FunctionCallBlock$ContinuationImpl (com.cloudbees.groovy.cps.impl)
invoke0:-1, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:62, NativeMethodAccessorImpl (jdk.internal.reflect)
invoke:43, DelegatingMethodAccessorImpl (jdk.internal.reflect)
invoke:566, Method (java.lang.reflect)
receive:72, ContinuationPtr$ContinuationImpl (com.cloudbees.groovy.cps.impl)
eval:21, ConstantBlock (com.cloudbees.groovy.cps.impl)
step:83, Next (com.cloudbees.groovy.cps)
call:174, Continuable$1 (com.cloudbees.groovy.cps)
call:163, Continuable$1 (com.cloudbees.groovy.cps)
use:136, GroovyCategorySupport$ThreadCategoryInfo (org.codehaus.groovy.runtime)
use:275, GroovyCategorySupport (org.codehaus.groovy.runtime)
run0:163, Continuable (com.cloudbees.groovy.cps)
access$001:18, SandboxContinuable (org.jenkinsci.plugins.workflow.cps)
run0:51, SandboxContinuable (org.jenkinsci.plugins.workflow.cps)
runNextChunk:187, CpsThread (org.jenkinsci.plugins.workflow.cps)
run:420, CpsThreadGroup (org.jenkinsci.plugins.workflow.cps)
access$400:95, CpsThreadGroup (org.jenkinsci.plugins.workflow.cps)
call:330, CpsThreadGroup$2 (org.jenkinsci.plugins.workflow.cps)
call:294, CpsThreadGroup$2 (org.jenkinsci.plugins.workflow.cps)
call:67, CpsVmExecutorService$2 (org.jenkinsci.plugins.workflow.cps)
run:264, FutureTask (java.util.concurrent)
run:139, SingleLaneExecutorService$1 (hudson.remoting)
run:28, ContextResettingExecutorService$1 (jenkins.util)
run:68, ImpersonatingExecutorService$1 (jenkins.security)
call:515, Executors$RunnableAdapter (java.util.concurrent)
run:264, FutureTask (java.util.concurrent)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:829, Thread (java.lang)

java.beans.Introspector#flushFromCaches is later called on that class, but to no avail:

cleanUpLoader:1300, CpsFlowExecution (org.jenkinsci.plugins.workflow.cps)
cleanUpHeap:1269, CpsFlowExecution (org.jenkinsci.plugins.workflow.cps)
run:464, CpsThreadGroup (org.jenkinsci.plugins.workflow.cps)
access$400:95, CpsThreadGroup (org.jenkinsci.plugins.workflow.cps)
call:330, CpsThreadGroup$2 (org.jenkinsci.plugins.workflow.cps)
call:294, CpsThreadGroup$2 (org.jenkinsci.plugins.workflow.cps)
call:67, CpsVmExecutorService$2 (org.jenkinsci.plugins.workflow.cps)
run:264, FutureTask (java.util.concurrent)
run:139, SingleLaneExecutorService$1 (hudson.remoting)
run:28, ContextResettingExecutorService$1 (jenkins.util)
run:68, ImpersonatingExecutorService$1 (jenkins.security)
call:515, Executors$RunnableAdapter (java.util.concurrent)
run:264, FutureTask (java.util.concurrent)
runWorker:1128, ThreadPoolExecutor (java.util.concurrent)
run:628, ThreadPoolExecutor$Worker (java.util.concurrent)
run:829, Thread (java.lang)

Solution

JDK-8231454 solves the problem by removing the class from the ClassInfo cache in Introspector#flushFromCaches. I plan to backport JDK-8231454 to jdk11u-dev upstream. In the meantime, this PR works around the problem for Jenkins users by doing the same thing via reflection. This, in turn, requires --add-opens java.desktop/com.sun.beans.introspect=ALL-UNNAMED, which is not in core's --add-opens configuration right now. I plan to advise affected users to add this to their configuration manually, and I plan to follow up with a core PR (and backport) to add this to the --add-opens configuration in core for current weeklies and LTS releases.

Testing done

Java 8 and Java 17

Ran the scenario without any --add-opens directives. Verified in a debugger that the new code did not get executed. Verified via verbose class logging that the asdf class was unloaded.

Java 11

Ran the scenario with --add-opens java.desktop/com.sun.beans.introspect=ALL-UNNAMED. Verified in a debugger that the new code was executed and removed the asdf class from the cache. Verified via verbose class logging that the asdf class was unloaded.

Ran the scenario without any --add-opens directives and no extra logging. Verified that no log spam occurred and that the asdf class was not unloaded.

Ran the scenario without any --add-opens directives and a custom log recorder. Verified that a FINER level log was created and that the asdf class was not unloaded.

Java 11 with openjdk/jdk11u-dev#1103

Ran the scenario with and without --add-opens. Verified that no log spam occurred and that the asdf class was unloaded in both cases.

Comment on lines +1365 to +1366
if (current.isNewerThan(new JavaSpecificationVersion("1.8"))
&& current.isOlderThan(new JavaSpecificationVersion("16"))) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in JENKINS-63766, this bug was introduced in Java 9 and was fixed in Java 16. The only version of Java in this range that we expect users to be using realistically is Java 11, but cover the whole range just to be safe.

Object cache = cacheF.get(null);
Class<?> cacheC = Class.forName("com.sun.beans.util.Cache");
if (LOGGER.isLoggable(Level.FINER)) {
LOGGER.log(Level.FINER, "Cleaning up " + clazz.getName() + " from ClassInfo#CACHE.");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted to print whether the class was already in the cache or not, as is done elsewhere in this file, but Cache#get always adds an entry if none exists, and there seems to be no Cache#contains method, so logging the result of Cache#get would likely do more harm than good, so I avoided it.

LOGGER.log(Level.FINER, "Cleaning up " + clazz.getName() + " from ClassInfo#CACHE.");
}
Method removeM = cacheC.getMethod("remove", Object.class);
removeM.invoke(cache, clazz);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache#get and Cache#remove synchronize internally on a queue object, so there is no synchronization needed here.

@basil
Copy link
Member Author

basil commented May 24, 2022

Upstream fix in openjdk/jdk11u-dev#1103.

Comment on lines +1448 to +1457
Object cache = cacheF.get(null);
if (cache instanceof ConcurrentMap) {
// Prior to JDK-8277072
Iterator<? extends Entry<Reference<Class<?>>, ?>> iterator = ((ConcurrentMap) cache).entrySet().iterator();
while (iterator.hasNext()) {
if (iterator.next().getKey().get() == clazz) {
iterator.remove();
LOGGER.log(Level.FINER, "cleaning up {0} from ObjectStreamClass.Caches.{1}", new Object[]{clazz.getName(), cacheFName});
break;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate fix for a related bug I noticed while testing under latest OpenJDK 11.0.16 (in particular, while testing with openjdk/jdk11u-dev#1103). I got the following new error:

java.lang.ClassCastException: class java.io.ObjectStreamClass$Caches$1 cannot be cast to class java.util.concurrent.ConcurrentMap (java.io.ObjectStreamClass$Caches$1 and java.util.concurrent.ConcurrentMap are in module java.base of loader 'bootstrap')
        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpObjectStreamClassCaches(CpsFlowExecution.java:1448)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpLoader(CpsFlowExecution.java:1305)
        at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.cleanUpHeap(CpsFlowExecution.java:1270)
        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:464)
        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.access$400(CpsThreadGroup.java:95)
        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:330)
        at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:294)
        at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:67)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
        at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
        at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

Looking into this further, it appears the type of this field was changed in JDK-8277072, which was backported to OpenJDK 11.0.16, which I happened to be testing. Skipping over this logic got the error to go away, and I confirmed that the class was still unloaded when running with JDK-8277072 and without entering this if statement, so I conclude that JDK-8277072 made this if statement unnecessary.

Copy link
Member

@dwnusbaum dwnusbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -1298,6 +1299,7 @@ private static void cleanUpLoader(ClassLoader loader, Set<ClassLoader> encounter
if (encounteredClasses.add(clazz)) {
LOGGER.log(Level.FINER, "found {0}", clazz.getName());
Introspector.flushFromCaches(clazz);
cleanUpClassInfoCache(clazz);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try {
// TODO Work around JDK-8231454.
Class<?> classInfoC = Class.forName("com.sun.beans.introspect.ClassInfo");
Field cacheF = classInfoC.getDeclaredField("CACHE");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, does anyone happen to know why this is not just using ClassValue, which is supposed to handle exactly this use case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this yesterday, too. Ask the OpenJDK people?

@jglick jglick enabled auto-merge May 24, 2022 19:21
@jglick jglick merged commit 0449852 into jenkinsci:master May 24, 2022
@basil basil deleted the JENKINS-63766 branch May 25, 2022 15:09
jglick added a commit that referenced this pull request Aug 5, 2022
[JENKINS-63766] Work around JDK-8231454

(cherry picked from commit 0449852)
basil added a commit to basil/jenkins.io that referenced this pull request Aug 18, 2022
MarkEWaite pushed a commit to jenkins-infra/jenkins.io that referenced this pull request Aug 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants