
Finish the experimental method handle support #1894

Closed
lukesandberg wants to merge 13 commits

Conversation

lukesandberg
Contributor

This series addresses known issues in the experimental method handle implementation

  • MethodHandles produced by InternalFactory.getHandle are now cached. This should speed up linking and reduce the number of 'lambda forms' we generate.
    • Doing this required demoting the Dependency from a getHandle parameter to a runtime method parameter and changing InternalFactory to be an abstract class
  • We no longer generate custom Provider subclasses to hold invokedynamic instructions and instead use a simple hand-written Provider class that lazily allocates a MethodHandle and calls invokeExact (a sketch of this pattern follows below).
    • This should greatly speed up injector creation at the cost of a slightly more expensive invokeExact call.
  • requestStaticInjection and MembersInjectors are now fully supported with MethodHandles.
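
To make the lazy Provider idea above concrete, here is a minimal, self-contained sketch of the pattern (not Guice's actual class): the Supplier stands in for InternalFactory.getHandle, and the real provider invokes a handle that takes an InternalContext rather than a no-arg handle.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.function.Supplier;

// Sketch only: a provider-like object that lazily resolves a MethodHandle of exact
// type ()Object on first use and then calls invokeExact on every get().
final class LazyHandleProvider<T> implements Supplier<T> {
  private final Supplier<MethodHandle> handleFactory; // stand-in for InternalFactory.getHandle
  private volatile MethodHandle handle;

  LazyHandleProvider(Supplier<MethodHandle> handleFactory) {
    this.handleFactory = handleFactory;
  }

  @SuppressWarnings("unchecked")
  @Override
  public T get() {
    MethodHandle h = handle;
    if (h == null) {
      h = handleFactory.get();
      handle = h; // benign race: concurrent callers compute equivalent handles
    }
    try {
      // invokeExact requires the call-site signature to match the handle type exactly,
      // so the handle must have been built with type ()Object.
      return (T) (Object) h.invokeExact();
    } catch (Throwable t) {
      throw new RuntimeException(t);
    }
  }

  public static void main(String[] args) {
    MethodHandle hello = MethodHandles.constant(Object.class, "hello");
    LazyHandleProvider<String> provider = new LazyHandleProvider<>(() -> hello);
    System.out.println(provider.get()); // prints "hello"
  }
}
```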

I imagine there are some as yet undiscovered issues but this should unblock broader experimentation.

@sameb
Member

sameb commented Apr 2, 2025

Thanks Luke! Will try to pull and test, and will review here soon!

@lukesandberg
Contributor Author

Welp I couldn't get bazel working and it shows... lol

@lukesandberg
Contributor Author

Fixed the @ForOverride issue that the workflow flagged

Member

@sameb sameb left a comment

(approving so I can pull and validate internally, still need to review properly)

Member

@sameb sameb left a comment

General question: I'm curious about the caching strategy. AIUI, the cache for a given MH is unique per InternalFactory, but InternalFactory isn't widely shared (they are unique per binding IIRC). Is this improving cacheability (and repeated injector constructions in tests, etc.) because we're removing the custom bytecode-generated Provider class? It seems like each distinct Injector will still have its own unique MethodHandles... but I guess that's more lightweight than a whole class definition per Provider? What value is the MH cacheability actually giving us if it's unique per binding (or am I misunderstanding the scope of the caching)?

@@ -345,8 +345,12 @@ private static Stream<Class<? extends Annotation>> getPermits(Class<?> clazz) {
Stream.concat(
annotations, Arrays.stream(clazz.getAnnotatedSuperclass().getAnnotations()));
}
return annotations
Member

Why change?

Contributor Author

I kept getting a build error here from Maven:

Type mismatch: cannot convert from Stream<Class<capture#26-of ? extends Annotation>> to Stream<Class<? extends Annotation>>Java(16777235)

Introducing this explicit cast resolved it, but I can't figure out why this isn't a problem at head.

I'm guessing it is because I am using a different version of javac:

% javac --version
javac 21.0.6

Contributor Author

Ok, I was confused before. This appears to be an incompatibility with ecj, which is used by the vscode Java plugins... sigh. It looks like there are a number of issues where ecj reports an error and javac does not (and vice versa).

Added a comment to that effect. I can revert if you prefer.

@sameb
Member

sameb commented Apr 4, 2025

(Also, I ran through the depot last night, but forgot to flip the "enable this" flag to true... so running again now. Will see what it shakes out.)

depot is fairly happy. Some things that surfaced:

  • Behavior changed for Provider instances that throw Errors (and possibly other things that throw errors?). E.g., something like bind(Foo.class).toProvider(() -> { throw new AssertionError(); })... previously injector.getInstance(Foo.class) would surface that AssertionError, whereas now it's wrapped inside a ProvisionException. I'd argue that the ProvisionException is actually more correct, but we should avoid the behavior change for now and do it intentionally later if desired. (This is commonly surfaced from @Provides methods that threw Errors, but sometimes also just raw Provider instances. Presumably it'd be the same with a constructor that threw this way w/o any custom provider instance?)

  • A very specific allocation leakage test which should have shaken out "warmup" allocations, but now there appear to be new long-lived allocations per usage (despite a long-lived injector). I don't think this PR is the cause (because AFAICT the "warmup" phase should pre-create all the requisite MHs), but will need to dig in further. Not blocking this PR, but maybe you see something in this PR that's always creating new data per injection.

@lukesandberg
Contributor Author

General question: I'm curious about the caching strategy. AIUI, the cache for a given MH is unique per InternalFactory, but InternalFactory isn't widely shared (they are unique per binding IIRC). Is this improving cacheability (and repeated injector constructions in tests, etc.) because we're removing the custom bytecode-generated Provider class? It seems like each distinct Injector will still have its own unique MethodHandles... but I guess that's more lightweight than a whole class definition per Provider? What value is the MH cacheability actually giving us if it's unique per binding (or am I misunderstanding the scope of the caching)?

So you understand the caching correctly. And indeed this will not address overhead in the case where a user constructs 2 identical or nearly identical injectors. However within the scope of a single injector this caching is useful since we will call getHandle once for every injection point.

e.g.

@Provides Foo provideFoo(Bar bar) {}

@Provides OtherFoo provideOtherFoo(Bar bar) {}

We will produce 2 ProviderMethod instances that each have a SingleParameterInjector pointed at the InternalFactory for Bar. So by caching the Bar handle we ensure that each of those ProviderMethod instances can share the implementation handle.

So this is still valuable within the scope of an injector. Not every binding is consumed multiple times, but some are and we don't want to recompute those.

So this really isn't improving repeated injector construction in tests. It might help a little, since creating handles for each injector is a little cheaper, but it doesn't enable cross-injector sharing.

What does help the 'multiple injectors in a test' case is dropping the bytecode generation, since that was happening eagerly during injector creation (due to scopes), whereas handle creation now happens on demand.

Cross-injector sharing is a compelling idea, and it is something we do get with our fast-class code generation but not with MethodHandles. I'm really not sure how to actually achieve it. We could create a global interning map for InternalFactory instances, but yikes, the interning rules would be complex since InternalFactory instances are constructed in a complex way and can depend on arbitrary Injector state (and sometimes even the Injector itself). So I think doing it would require some non-trivial effort, and I am relatively inclined to think we shouldn't try (if a user has a problem they should switch to using child injectors to share factories).

thoughts?

@lukesandberg
Contributor Author

(Also, I ran through the depot last night, but forgot to flip the "enable this" flag to true... so running again now. Will see what it shakes out.)

depot is fairly happy. Some things that surfaced:

  • Behavior changed for Provider instances that throw Errors (and possibly other things that throw errors?). E.g., something like bind(Foo.class).toProvider(() -> { throw new AssertionError(); })... previously injector.getInstance(Foo.class) would surface that AssertionError, whereas now it's wrapped inside a ProvisionException. I'd argue that the ProvisionException is actually more correct, but we should avoid the behavior change for now and do it intentionally later if desired. (This is commonly surfaced from @Provides methods that threw Errors, but sometimes also just raw Provider instances. Presumably it'd be the same with a constructor that threw this way w/o any custom provider instance?)

Ah rats, forgot about this one. Will update with a test and a fix. I agree this seems odd

  • A very specific allocation leakage test which should have shaken out "warmup" allocations, but now there appear to be new long-lived allocations per usage (despite a long-lived injector). I don't think this PR is the cause (because AFAICT the "warmup" phase should pre-create all the requisite MHs), but will need to dig in further. Not blocking this PR, but maybe you see something in this PR that's always creating new data per injection.

Interesting. I wonder if it is the allocations related to the lazy MethodHandles construction? That only happens when they actually call get. There are also things like the Initializable and Singleton bindings that will re-bind themselves after successful provisioning. Beyond that (which honestly the warmup should take care of...), I'm not sure what it would be.

…ime value'

Initially I thought it would make sense to inspect `Dependency` at link time (during `getHandle` calls), since it would allow us to trim null checks and optimize checks for proxies.  However, both of those ideas turned out to not be very fruitful.

* It is true that if an injection point was annotated as `@Nullable` we could elide any null checks, which is nice, but `@Nullable` injection points and nullish values are rare so optimizing for them is not useful.
* I initially thought that if we called `tryStartConstruction` for a non-interface `Dependency` then we could not even bother checking the return value since a proxy was impossible.  However, `ConstructorBinding` implementations can use `finishConstructionAndSetReference` to satisfy re-entrant construction without constructing a proxy.  So this wasn't sufficient information.

So the benefits of having `Dependency` available at link-time turned out to be either unachievable or very minor.

On the other side the costs were high:

* This necessitated a complex solution for Scoped providers (deleted in this commit)
* This would interfere with future attempts to have `InternalFactory` types cache their factories.
The ideal way to invoke a method handle is by binding it to an `invokedynamic` instruction.  This provides the VM with a lot of information about the lifetime and target of the method handle.  On the other hand, calling `invokeExact` on a more 'dynamic' method handle is slower since the VM always needs to do things like verify that the signature of the callsite matches the handle, and then resolve to the actual target.  See https://wiki.openjdk.org/display/HotSpot/Deconstructing+MethodHandles for details.

This is why the method handle implementation initially pursued bytecode generation, simply so we could bind to an `invokedynamic` instruction.  However, this comes with a massive downside of needing to generate tons of classes.  This slows down injector creation substantially and puts pressure on the VM's 'compressed class space' in larger applications, which limits scalability of this solution.  To mitigate those risks, this commit undoes the bytecode generation.

A compromise solution would probably be some kind of `JIT` approach where we generate a custom provider once it gets hot enough, so we would only bother for those Provider instances that get invoked a lot.  However, I'm guessing the perf penalty of `invokeExact` on a non-constant handle will be small and fixed, and as such it probably won't be too impactful in the limit.
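
For reference, a small self-contained illustration of the constant vs. dynamic distinction described above (the class and fields are made up for this sketch): a handle the JIT can see as a constant, like a static final field or an invokedynamic target, can be inlined through, while a handle read from a mutable field goes down the slower generic invoker path.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustration only: the same handle, held in two different "constness" situations.
final class HandleConstness {
  static final MethodHandle CONSTANT_HANDLE; // eligible for constant folding by the JIT
  static MethodHandle mutableHandle;         // always treated as a dynamic target

  static {
    try {
      MethodHandle parseInt =
          MethodHandles.lookup()
              .findStatic(Integer.class, "parseInt", MethodType.methodType(int.class, String.class));
      CONSTANT_HANDLE = parseInt;
      mutableHandle = parseInt;
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  static int viaConstant(String s) throws Throwable {
    return (int) CONSTANT_HANDLE.invokeExact(s); // call-site descriptor (String)int matches exactly
  }

  static int viaMutable(String s) throws Throwable {
    return (int) mutableHandle.invokeExact(s); // same descriptor, but slower generic linkage
  }
}
```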
…rastructure

Now that we no longer generate Provider subclasses for MethodHandle support this isn't needed.
All factories have implemented this method now and it is no longer needed.
This will actually improve performance on its own since `invokevirtual` is faster than `invokeinterface` for megamorphic callsites (of which many of ours are).

This will also provide a natural place to add a method handle cache in the future
…ase class and simplify the protocol

This is a tiny simplification and provides a simple consistent strategy for handling re-entrancy into the factory.
Member

@sameb sameb left a comment

(marking the latest iteration approved again, to be able to pull/test it.)

Member

@sameb sameb left a comment

approving to pull internally again...

Each method handle is potentially compiled to a concrete java method and
so we want to avoid recreating identical handles.  This should help
preserve VM resources like 'compressed class space' and improve
linkage performance.  The caching strategy is a little complicated since
we need to respect the `linked` boolean parameter.  Most factories
ignore this parameter but some respect it so our caching strategy allows
for both.
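
A minimal sketch of that two-slot caching idea, assuming an abstract factory with a getHandle(boolean linked) entry point (names are illustrative, not the PR's actual code):

```java
import java.lang.invoke.MethodHandle;

// Sketch only: cache one handle per (factory, linked) pair so repeated getHandle calls
// for the same binding reuse the same MethodHandle instance.
abstract class CachingFactory {
  private volatile MethodHandle linkedHandle;
  private volatile MethodHandle unlinkedHandle;

  final MethodHandle getHandle(boolean linked) {
    MethodHandle h = linked ? linkedHandle : unlinkedHandle;
    if (h == null) {
      h = makeHandle(linked); // computed at most once per slot (modulo benign races)
      if (linked) {
        linkedHandle = h;
      } else {
        unlinkedHandle = h;
      }
    }
    return h;
  }

  /** Subclasses build the actual handle; many ignore {@code linked}. */
  abstract MethodHandle makeHandle(boolean linked);
}
```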
How valuable this is is somewhat debatable, but it is necessary if we want to make the fast-class support completely optional in the future.

This revealed a number of latent issues I am addressing at the same time.  Notably, some of the `asCollector` calls used for the fast-class fallbacks were not correct, so this just eliminates all of them.
Most of the work was already done for ConstructorBindings but this required a different organization of the code and some caching.

With this I believe everything is re-implemented in terms of MethodHandles.
…hen method handles are in use.

Guice at head handles all `Errors` thrown by user code and wraps them in `ProvisionException`, unless the Error is thrown by a ProviderInstanceBinding, in which case it propagates unwrapped.
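
A hedged illustration of that behavior, based on the bind(Foo.class).toProvider(...) example from the review discussion; Foo and the module are stand-ins, and the cast only disambiguates the toProvider overload:

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Provider;

class ErrorPropagationExample {
  static class Foo {}

  public static void main(String[] args) {
    Injector injector =
        Guice.createInjector(
            new AbstractModule() {
              @Override
              protected void configure() {
                // A raw Provider instance (a ProviderInstanceBinding) that throws an Error.
                bind(Foo.class)
                    .toProvider((Provider<Foo>) () -> { throw new AssertionError("boom"); });
              }
            });
    // At head the AssertionError propagates unwrapped for a ProviderInstanceBinding;
    // the method handle path initially wrapped it in a ProvisionException, which this
    // commit brings back in line with the head behavior.
    injector.getInstance(Foo.class); // throws AssertionError
  }
}
```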
…d `ImmutableSet` objects

Some investigation in https://stackoverflow.com/questions/79551257/efficiently-build-a-large-array-with-methodhandles revealed that the simple recursion approach was often the best so all approaches were simplified to do that.  For the ImmutableMap/ImmutableSet cases, special cases were added for the `of` overloads that are supported by guava. From perusing the source these trigger fast paths and allow us to avoid the `Builder` allocation.
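
As a rough, self-contained illustration of combining per-element handles into a single array-producing handle without asCollector, here is one way to do it with arrayConstructor, arrayElementSetter, and foldArguments; the commit's actual recursion-based construction (and its context parameter) will differ:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Arrays;
import java.util.List;

final class ArrayBuildingHandles {
  /** Combines element handles, each of exact type ()Object, into one handle of type ()Object[]. */
  static MethodHandle buildArrayHandle(List<MethodHandle> elements) {
    int n = elements.size();
    // ()Object[] : allocates the result array of the right size.
    MethodHandle makeArray =
        MethodHandles.insertArguments(MethodHandles.arrayConstructor(Object[].class), 0, n);
    // (Object[])Object[] : returns the (eventually filled) array.
    MethodHandle fill = MethodHandles.identity(Object[].class);
    // (Object[], int, Object)void : stores one element.
    MethodHandle setter = MethodHandles.arrayElementSetter(Object[].class);
    for (int i = n - 1; i >= 0; i--) {
      // (Object[])void : computes element i and stores it at index i.
      MethodHandle setElement =
          MethodHandles.collectArguments(
              MethodHandles.insertArguments(setter, 1, i), 1, elements.get(i));
      // void combiner: run setElement, then continue with the current fill chain.
      fill = MethodHandles.foldArguments(fill, setElement);
    }
    // Allocate the array, then thread it through the fill chain.
    return MethodHandles.foldArguments(fill, makeArray);
  }

  public static void main(String[] args) throws Throwable {
    List<MethodHandle> elements =
        List.of(MethodHandles.constant(Object.class, "a"), MethodHandles.constant(Object.class, "b"));
    Object[] result = (Object[]) buildArrayHandle(elements).invokeExact();
    System.out.println(Arrays.toString(result)); // [a, b]
  }
}
```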
…our `catch` clauses for constructor bindings

Fix a similar issue with @Inject methods

Working on this revealed a behavior difference in exception reporting between `@Inject` constructors and `@Provides` methods, so I added a test to demonstrate that as well.  This is easy to fix but not a big deal.
@lukesandberg
Contributor Author

Just repushed to fix the javadoc issue flagged by bazel CI... sorry

I guess I'll try to get the bazel build working locally again.

Member

@sameb sameb left a comment

(approving to pull to internal repo)

@sameb
Member

sameb commented Apr 8, 2025

  • A very specific allocation leakage test which should have shaken out "warmup" allocations, but now there appear to be new long-lived allocations per usage (despite a long-lived injector). I don't think this PR is the cause (because AFAICT the "warmup" phase should pre-create all the requisite MHs), but will need to dig in further. Not blocking this PR, but maybe you see something in this PR that's always creating new data per injection.

Interesting. I wonder if it is the allocations related to the lazy MethodHandles construction? That only happens when they actually call get. There are also things like the Initializable and Singleton bindings that will re-bind themselves after successful provisioning. Beyond that (which honestly the warmup should take care of...), I'm not sure what it would be.

Re: this case, the path causing issues seems to follow the following chain of logic:

  1. Per request, something injects the Injector.
  2. Code then calls injector.getBinding(Key(){})
  3. Code then calls thatBinding.getProvider()
  4. Code then calls thatProvider.get()

That seems to trigger InternalMethodHandles.MethodHandleProvider.get() to call getHandle().invokeExact(currentContext), which leads to the following reverse call chain:

java.base/java.lang.invoke.LambdaForm.create(LambdaForm.java:342),
java.base/java.lang.invoke.LambdaForm.customize(LambdaForm.java:465),
java.base/java.lang.invoke.MethodHandle$1.apply(MethodHandle.java:1858),
java.base/java.lang.invoke.MethodHandle$1.apply(MethodHandle.java:1856),
java.base/java.lang.invoke.MethodHandle.updateForm(MethodHandle.java:1878),
java.base/java.lang.invoke.MethodHandle.customize(MethodHandle.java:1856),
java.base/java.lang.invoke.MethodHandle.maybeCustomize(MethodHandle.java:1846),
java.base/java.lang.invoke.Invokers.maybeCustomize(Invokers.java:633),
java.base/java.lang.invoke.Invokers.checkCustomized(Invokers.java:627), 
com.google.inject.internal.InternalMethodHandles$MethodHandleProvider.get(InternalMethodHandles.java:902)

(note the line # is from a prior snapshot, not the latest)

So either something in per-request Injector.getBinding().getProvider().get() is defeating MH caching, or the use of invokeExact is causing unique long-lived "LambdaForm"s to be created per usage.

(FYI I am probably going to submit this PR as-is since it's flag-guarded and is big enough for now, and we can follow up w/ fixes on another PR. I'm also going on vacation starting tomorrow until the 18th, so will resume looking at this when I get back.)

@lukesandberg
Contributor Author

Interesting, I'll see if I can write a test (in another PR!) to replicate this.

invokeExact is a somewhat magical @PolymorphicSignature method that performs custom linking. As expected, calling invokeExact on a non-constant (essentially volatile) MethodHandle is slower, and I think part of that is this 'handle customization' work. From reading some old threads, I see that this 'customization' is triggered when a method is JIT'd that contains a call to a MethodHandle that is not visible to the JIT. This applies to MethodHandleProvider since that get method will get compiled, but getHandle is always returning (from the JIT's perspective) a new handle.

So the JVM avoids special handling at first, but tries a bit harder after 127 invocations (controlled by java.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD, which defaults to 127). I'm guessing that is how we defeat the test's warmup.

So I think the solution is one of:

  1. Go back to bytecode generating the provider subtypes. This gives the JIT a much easier job and can trivially 'customize' on first compile, but of course comes with all kinds of other performance and bootstrapping tradeoffs.
  2. Hide the handles behind a 're-binding' callsite?
    • This would be some kind of tableSwitch to pick the delegate handle, but also using some kind of WeakReference to avoid pinning injectors... sigh. Alternatively we could bytecode gen one MethodHandleProvider class per injector...? (A rough MutableCallSite sketch of this idea follows after this list.)
    • This solution is cautiously recommended here after a very long discussion.
    • However, for us, we would be re-binding this callsite like 1000's of times... so it would induce a TON of work in the JIT.
    • And afaict this is just re-implementing what MethodHandle 'customization' is meant to do anyway.
  3. Somehow force 'customization' to happen sooner? This is probably a bad tradeoff since a fair number of these Provider instances would only ever be invoked a small number of times.
  4. Decide that the test in question is simply not important? I'm guessing the test is using some kind of AllocationAgent that is sensitive to the allocations in Java code but not the JIT compiler... but that is more or less what they are measuring here. So we could fix the test or just disable customization for them by setting -Djava.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1, which will give them more deterministic behavior but worse performance.
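
For reference, a very rough sketch of the 're-binding callsite' shape from option 2, built on MutableCallSite; this is illustrative only and ignores the WeakReference and tableSwitch concerns mentioned above:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MutableCallSite;

// Sketch only: callers cache the (nearly constant) dynamicInvoker handle, and the
// call site is re-bound whenever the set of delegate handles changes.
final class RebindingCallSite {
  private final MutableCallSite callSite;

  RebindingCallSite(MethodHandle initialTarget) {
    this.callSite = new MutableCallSite(initialTarget);
  }

  /** The handle callers should cache and invoke through. */
  MethodHandle invoker() {
    return callSite.dynamicInvoker();
  }

  /** Re-binds the call site, e.g. to a guardWithTest/tableSwitch over known delegates. */
  void rebind(MethodHandle newTarget) {
    callSite.setTarget(newTarget); // newTarget must have the same MethodType as the call site
    MutableCallSite.syncAll(new MutableCallSite[] {callSite});
  }
}
```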

@sameb
Member

sameb commented Apr 8, 2025

  1. Go back to bytecode generating the provider subtypes. This gives the JIT a much easier job and can trivially 'customize' on first compile, but of course comes with all kinds of other performance and bootstrapping tradeoffs.

This would be sad. :-(

  2. Hide the handles behind a 're-binding' callsite?

    • This would be some kind of tableSwitch to pick the delegate handle, but also using some kind of WeakReference to avoid pinning injectors... sigh. Alternatively we could bytecode gen one MethodHandleProvider class per injector....?
    • This solution is cautiously recommended here after a very long discussion.
    • However, for us, we would be re-binding this callsite like 1000's of times... so it would induce a TON of work in the JIT.
    • And afaict this is just re-implementing what MethodHandle 'customization' is meant to do anyway.

Seems unfortunate but maybe necessary?

  3. Somehow force 'customization' to happen sooner? This is probably a bad tradeoff since a fair number of these Provider instances would only ever be invoked a small number of times.

+1 bad tradeoff.

  4. Decide that the test in question is simply not important? I'm guessing the test is using some kind of AllocationAgent that is sensitive to the allocations in Java code but not the JIT compiler... but that is more or less what they are measuring here. So we could fix the test or just disable customization for them by setting -Djava.lang.invoke.MethodHandle.CUSTOMIZE_THRESHOLD=-1, which will give them more deterministic behavior but worse performance.

I think effectively suppressing the test (whether ignoring LambdaForms or reducing the threshold & expecting an exact # of things) depends on the consequences in real-life behaviors. We need to ensure that the long-lived LambdaForms created per request (from the invokeExacts) aren't created for every request in a unique way. Maybe the problem is just that the warmup threshold is lower than the customize threshold, so it's being inlined after the warmup, but we're not actually creating a new LambdaForm per request, we're just doing it "per unique signature used in practice" (which shouldn't really grow, IIUC). WDYT?

@copybara-service copybara-service bot closed this in 33232b4 Apr 8, 2025