Do not start redis dev services in augmentation (plus supporting framework) #47610
Conversation
We need to discuss this. We cannot pre-allocate ports like this. We had the same issue with the configuration—this is not reliable.
We need to find a way to allocate the ports randomly when the containers are started. We may need to use a lazy config placeholder or something. What if we start dev services at startup time but introduce a config source before starting the application with the values defined by the dev service, including the mapper port?
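A rough sketch of what such a lazily resolving config source could look like, assuming the MicroProfile Config `ConfigSource` SPI; the class name, key handling, and wiring are illustrative, not the PR's actual implementation:

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

import org.eclipse.microprofile.config.spi.ConfigSource;

// Hypothetical sketch: a config source whose values are only computed when read, so a
// dev service can advertise a property before its container (and mapped port) exists.
public class LazyDevServicesConfigSource implements ConfigSource {

    private final Map<String, Supplier<String>> lazyValues;

    public LazyDevServicesConfigSource(Map<String, Supplier<String>> lazyValues) {
        this.lazyValues = lazyValues;
    }

    @Override
    public Set<String> getPropertyNames() {
        return lazyValues.keySet();
    }

    @Override
    public String getValue(String propertyName) {
        Supplier<String> supplier = lazyValues.get(propertyName);
        // Resolving the supplier is what (lazily) starts the container, so the port is
        // whatever the container runtime assigned, not a pre-allocated guess.
        return supplier == null ? null : supplier.get();
    }

    @Override
    public String getName() {
        return "dev-services-lazy-config";
    }
}
```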
Isn't the proper fix for #45786 not to share dev services between different test augmentation runs? In that case, both application instances would use their own instance of the dev service, without the port clashing issue.
For the port reserving, I was tempted by this for other stuff; it seemed interesting to have something like it. Still, I'd prefer if we kept the container runtime port assignment.
I think a mechanism like the dev services tracker build item is the way to go for reducing the boilerplate on that part. I was waiting for the unified dev service initiative to discuss this. But I'm not sure if we can do this on the resulting dev service output config, as dev service input configs are (usually) flat objects that hold the relevant config and are assigned to a static volatile in the dev service processor class to keep them between augmentation runs. Maybe we can create and hold that state higher up instead of in a build item class, which needs to be loaded from a common classloader.
I'm going to kick off the dev service working group today, and associate this with it. Also, @dmlloyd, @radcortez and I discussed the random port allocation in some config-related work.
Yes, and no, and no. It depends what you mean by "share dev services". "Dev services" is overloaded to mean the external container, the [...]. There's a user workaround to 'not share dev services' by ensuring every profile includes configuration which hardcodes a distinct port, but that's not something we want people to have to do.
I've had my first message on Zulip saying "I think I found a problem, dev services are behaving weirdly in 3.22", so I think a fix for this is rather urgent (hence me not waiting for the WG).
A static volatile isn't sufficient, because it won't keep them between augmentation classloaders.
That's more or less the approach I took with the tracker. It holds state on the app classloader.
I was referring to the external container, but I didn't think of the hardcoded port case. So this wasn't an issue before because different augmentations were on the same classloader?
That is perfectly fine, I wasn't questioning the timing of this PR.
The change is the timing of the augmentations. Before, we'd do the augmentation, run the test, then if we switched profile, we'd run the augmentation, run the next test ... Now we do all the augmentations up front. That has a few disadvantages (like the dev service issue), but it means we can get access to the right classloader for the test at the time we need it. In the old model, as long as a dev service closed itself on application shutdown, the port would be free for the next application, but now it needs to not occupy the port until it needs it.
I can report that @cescoffier's suggestion of using a config source works perfectly, and was actually pretty easy. I've removed all the dodgy pre-creation of sockets, and the random port test is still passing nicely. I've pushed changes in case people want to look at them, but the build will be red, because I added a devservices runtime module, and that breaks everything. (See discussion in Zulip.)
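For illustration only, continuing the hypothetical `LazyDevServicesConfigSource` sketch above: registering a lazily resolved Redis URL might look roughly like this. The Testcontainers accessors (`isRunning`, `start`, `getHost`, `getMappedPort`) are real, but `redis` and the overall wiring are assumptions, not the actual PR code.

```java
// Illustrative fragment: 'redis' is assumed to be a Testcontainers GenericContainer that has
// not been started yet. The supplier starts it on first use and reports the port the
// container runtime actually assigned.
Supplier<String> redisHosts = () -> {
    if (!redis.isRunning()) {
        redis.start();
    }
    return "redis://" + redis.getHost() + ":" + redis.getMappedPort(6379);
};

Map<String, Supplier<String>> lazyValues = Map.of("quarkus.redis.hosts", redisHosts);
ConfigSource devServicesSource = new LazyDevServicesConfigSource(lazyValues);
```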
I've abandoned the idea of a [...]. I've also added a diagram to the main description.
@@ -276,10 +286,19 @@ public void overrideConfig(Map<String, String> config) {
        RuntimeOverrideConfigSource.setConfig(runtimeClassLoader, config);
    }

    private static void startDevServices(List<DevServicesResultBuildItem> devServicesResultBuildItems) {
        for (DevServicesResultBuildItem devServicesResultBuildItem : devServicesResultBuildItems) {
            devServicesResultBuildItem.start();
This will start services one after the other and take too much time when multiple dev services are involved.
We need to start them in parallel and wait until they are ready.
When services were started at augmentation time, we didn't have this problem because the build steps are called using the build executor service.
Good point! As an opposite problem, @cescoffier pointed out that some dev services might have a desired ordering, which would be honoured when they're started as part of a build chain, and not honoured when I just iterate through a collection. But we agreed that (a) people probably shouldn't expect that level of orchestration and (b) if we needed to, we could add a prioritisation mechanism later. Just replicating a priority from the Processor order might be enough.
So I'll switch to doing some kind of parallelisation, and if we need an ordered parallelisation later we can cross that bridge when we come to it.
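A minimal sketch of the parallel alternative being discussed, assuming the `DevServicesResultBuildItem.start()` method from the diff above; the pool sizing and error handling are illustrative, not the PR's actual implementation:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: submit every start() to an executor and block until all services report ready,
// instead of iterating serially over the collection.
private static void startDevServices(List<DevServicesResultBuildItem> items) {
    if (items.isEmpty()) {
        return;
    }
    ExecutorService executor = Executors.newFixedThreadPool(Math.min(items.size(), 4));
    try {
        CompletableFuture<?>[] futures = items.stream()
                .map(item -> CompletableFuture.runAsync(item::start, executor))
                .toArray(CompletableFuture[]::new);
        // join() rethrows the first failure, so a dev service that cannot start still fails the build.
        CompletableFuture.allOf(futures).join();
    } finally {
        executor.shutdown();
    }
}
```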
Actually, if we wanted to, we could be incremental here and do the parallelisation as an (immediately after) follow-on work item. For now only one dev service (Redis) would actually start in the start step, and for all the others it's a fast no-op. So we'll only see this (definitely important) problem once we start scaling this out to other dev services.
But we agreed that (a) people probably shouldn't expect that level of orchestration and (b) if we needed to, we could add a prioritisation mechanism later.
I would say that if they need that, they need to use the Compose Dev Services feature.
Yeah, that was sort of where Clement and I got to.
if (systemInstance == null) {
    try {
        Class s = ClassLoader.getSystemClassLoader().loadClass(DevServicesConfigTracker.class.getName());
I had this idea to add the state holder class to the bootstrap module and configure the QuarkusClassloaders to load it parent first.
Oh, that would certainly make the tracker code itself cleaner. It has the disadvantage that to fully understand the behaviour you'd need to look in two places (the class and also the quarkus classloader). But that slight spreading of information is probably outweighed by improved robustness, less time spent on reflection, and less ugly code. I'll switch to that approach, thanks.
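A minimal sketch of the parent-first state-holder idea, with hypothetical class and method names rather than the PR's actual tracker code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: if this class lives in a module that every QuarkusClassLoader loads
// parent first, all augmentation classloaders see the same Class object, so a single static
// map is enough to share dev service state between profiles without any reflection.
public final class DevServicesSharedState {

    // Keyed by something that identifies the dev service (e.g. the feature name).
    private static final Map<String, Object> RUNNING_SERVICES = new ConcurrentHashMap<>();

    private DevServicesSharedState() {
    }

    public static Object get(String featureName) {
        return RUNNING_SERVICES.get(featureName);
    }

    public static void put(String featureName, Object runningService) {
        RUNNING_SERVICES.put(featureName, runningService);
    }

    public static void remove(String featureName) {
        RUNNING_SERVICES.remove(featureName);
    }
}
```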
// This config map is what we use for deciding if a container from another profile can be re-used
// TODO ideally the container properties would get put into it in a centralised way, but the RunnableDevService object doesn't get passed detailed information about the container
Map config = new HashMap();
Instead of a Map, can we use the input dev service config object (DevServiceConfiguration) to register the service with the tracker?
Good question. I/we need to think about this one. There are two parts.
Classloading
I don't love the map, but I was worried about classes coming across from other classloaders, so I stuck to system constructs. I may have been over-cautious, so I'll spell out the scenario and see if it makes sense.
If profile A runs first, it will create RunnableDevService A, and when start() is called, it will start container A on port 4200. Then profile B runs, Processor B will create RunnableDevService B, and RunnableDevService B will look into the tracker and find RunnableDevService A. So RDS B will just leave RDS A in the tracker. So then, when we ask for config, we'll be asking RDS A, and RunnableDevService A's config object would be from augmentation classloader A. At the moment I use a Supplier and return a Map, so it's cross-classloader safe.
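To illustrate why that matters, here is a tiny hypothetical fragment; the tracker accessor is an assumption, not the PR's API:

```java
// Hypothetical illustration: the tracker only hands back JDK types (Supplier, Map, String),
// which are loaded by the JDK itself and are therefore the same classes in every augmentation
// classloader. A RedisClientBuildTimeConfig from profile A, by contrast, would be a different
// Class object than profile B's and a cast would fail.
Supplier<Map<String, String>> configSupplier = trackedService.getConfigSupplier();
Map<String, String> config = configSupplier.get(); // safe across classloaders
```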
What gets included and config chaos
We have at least six kinds of config that we use in the processor. That's too many, clearly, so reducing it would be good, but we'd need to be careful of side effects if we overload one kind to be another kind. Something definitely needs to be done, if only naming all the kinds to make it less confusing what's going on.
Walking through it for Redis, we have:
- DevServicesConfig - passed in to the processor; generalised config saying if dev services are enabled at all, and whether to use shared networks
- RedisBuildTimeConfig - aggregator, allows multiple dev services; not sure how widespread this pattern is
- RedisClientBuildTimeConfig - multiples are aggregated in RedisBuildTimeConfig
- DevServicesConfig - group within RedisClientBuildTimeConfig, inspected and used to call the RDS constructor
- currentDevServicesConfiguration - local map used for RDS creation; it's a manual hybrid of elements from RedisBuildTimeConfig and RedisClientBuildTimeConfig
- Uniqueness-identifying config - config (line 173), new in this PR. This is a subset of properties, like image name and hardcoded port, that we use to decide if dev services are compatible. The old model did something similar but on a different line. This is also what's used in the RDS constructor (sketched at the end of this comment).
- DynamicConfig - this is new. It's config that we want to declare that we can provide, but we want to provide it lazily: the left hand side is a string, and the right hand side is a function.
So I think the upshot of all that, for your question, is: is there anything extra we'd want to put into the DevServicesConfig that we wouldn't want to use as a uniqueness identifier? And the opposite: is there anything we'd want in the uniqueness identifier that we wouldn't want in config? For example, we could force unique instances every time by putting a UUID into the map, although that's a bit clumsy so we might not want to design for it.
Equals workarounds
In an earlier version I did put the typed interfaces on the tracker, but I needed to have a reflective equality checker to work around the classloader issue. I decided the map was nicer.
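For illustration only, the uniqueness-identifying config described above might be assembled along these lines; the key names and accessors are assumptions, not the PR's code:

```java
// Hypothetical sketch: only the properties that determine whether an existing container is
// compatible (and therefore re-usable) go into the uniqueness map. 'devServicesConfig' and
// its accessors are illustrative stand-ins.
Map<String, String> config = new HashMap<>();
config.put("image-name", devServicesConfig.imageName().orElse("default-redis-image"));
config.put("fixed-port", devServicesConfig.port().map(String::valueOf).orElse(""));
config.put("shared", String.valueOf(devServicesConfig.shared()));
```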
@ozangunalp I've implemented your suggestion of using the QuarkusClassLoader to force stuff onto the system classloader, and it makes the code much nicer, thank you! I haven't looked at parallel execution yet because it's a no-op in the current implementation, where only the Redis dev service could be in the queue. So I'd prefer to merge this and then do parallelisation immediately after, or maybe once I've converted one more service, so there are things to test with.
@holly-cummins So I wrote a long comment yesterday, but I now realise it wasn't posted. Sorry for that... I verified that the parent-first classloader change works as expected. I thought we could configure different classloaders via [...]. However, there are still some things not working as expected: [...]
I have a branch where I experimented with some stuff; I'll share it as soon as I clean it up.
Oddly, I think GitHub emailed me, but didn't put anything on the issue itself.
Yeah, it's kind of ugly :), but it does what we want more reliably than the existing mechanisms based on classpath elements.
Oh dear. Is this continuous testing? And the badness is that instead of going "I will shut down and restart because I've changed," it goes "oh no, nothing matches, I can't start one, I'll try and find a shared one?"
Doh! Good spot.
That seems plausible. The rule for 'can I close down this service' has to change to not be anything to do with 'did I create this service?', because we can't rely on chronology. I'm tempted to deprecate the [...]
I think your other comment mentioned continuous testing, too. I added tests, but not enough, because I didn't add any for continuous testing. :( Your other comment also mentioned that things can work by luck because the container locator finds the service (I think that's what it meant). That seems like a nasty gotcha that could trick good tests into giving bad answers. Although that's kind of a complicated one, since the Redis processor invokes the container locator service first, I think.
Cool! My first instinct is to run off and debug, but I won't touch anything until then, so we don't duplicate work.
Resolves #47602
See also #45785. This is the first of an incremental rollout of a new model, to resolve a problem created by #34681. Doing the change incrementally keeps PRs small and allows feedback on the approach. This also has the advantage of validating that existing ecosystem extensions will continue to work. "Work" needs qualification, since they only work if ports are distinct, which brings us back to #45785. In the existing dev services, the assumption that the order of the processors running correlates to the order of the dev services being used (like first static variables) is problematic.
The design
There are two elements to the design:
- a tracker to follow dev services across augmentation classloaders
- a ConfigSource to lazily share information about ephemeral ports, etc.

Most dev service processors check to see if a service is already running, but using static variables, so the check only finds services within the same profile. This isn't enough post-#34681, because another profile might have started dev services on the same port. Those dev services would ideally be re-used, and if they're not re-used, they need to be closed. This means we need to track dev services between augmentation classloaders. To do this, I've created a tracker build item which tracks across classloaders. I would have liked to inject unique instances into processors, but doing that would need a KeyedBuildItem or FeatureBuildItem or some other basic type which got created with state about what it's getting injected into.
Re-use semantics
The logic for starting dev services is a bit more involved than it used to be, but it's centralised. It's also more capable. There's better ability to shut down dev services from other profiles without relying on tracking owners, and better ability (but ultimately controlled by the dev service processor) to reuse dev services.
When start gets called on a dev services build item, it first checks to see if there's a running dev service with the right properties. (This could be from a previous continuous testing restart, or another profile.) If there's a match, no further action is needed. If there's not a match, we need to stop other dev services of the same kind. The tracker helps us find them, because we can't assume that dev services are started in the same order that augmentation was run. Once all the other containers are stopped, the new container is started and submitted for tracking.
(https://excalidraw.com/#json=VvSzzPO1nVD_XdPElaHsF,EHrqqwDzb5H5g5lXFYUk6A)
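A hedged sketch of that start() flow; the tracker API and type names below are illustrative stand-ins, not the PR's actual classes (the real RunningDevService and tracker live in the PR itself):

```java
import java.util.List;
import java.util.Map;

// Sketch of the re-use flow described above, with hypothetical tracker and service types.
class RunnableDevServiceSketch {

    private final String featureName;
    private final Map<String, String> config; // uniqueness-identifying properties
    private final DevServicesTracker tracker; // hypothetical cross-classloader tracker

    RunnableDevServiceSketch(String featureName, Map<String, String> config, DevServicesTracker tracker) {
        this.featureName = featureName;
        this.config = config;
        this.tracker = tracker;
    }

    void start() {
        if (tracker.findMatching(featureName, config) != null) {
            // A compatible service is already running (started by another profile, or left over
            // from a previous continuous-testing restart): re-use it, nothing more to do.
            return;
        }
        // No match: stop every other dev service of the same kind first, since we cannot
        // assume dev services start in the same order that augmentation ran.
        for (RunningDevService other : List.copyOf(tracker.findAll(featureName))) {
            other.close();
            tracker.remove(featureName, other);
        }
        // Only now does the container start, so the runtime picks the port at this point.
        RunningDevService fresh = startContainer();
        tracker.register(featureName, config, fresh);
    }

    private RunningDevService startContainer() {
        // Container startup elided in this sketch.
        throw new UnsupportedOperationException();
    }
}
```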
In the old model, the Redis processor would store state in a static variable and attempt to re-use compatible RunningDevServices, with a boolean restartRequired = !currentDevServicesConfiguration.equals(capturedDevServicesConfiguration); check. In the new model it does a similar check, but also checks across profile boundaries. This check is now non-optional because any extant containers/RDSes that don't get re-used need to get stopped. The ContainerLocator logic isn't changed.
A nice benefit of this approach is that the code in the processors is shorter, which gets us a bit closer to the dream of low-boilerplate dev services. The one icky bit of boilerplate that's new is the key-value pairs for the uniqueness/re-use criteria for the dev services. This is maybe something to think about for a follow-up PR. I think this approach might also avoid a few restarts we used to see between profiles, which could speed up tests.
Could be better (areas for discussion)
RunnableDevService extends RunningDevService. That seems most logical in terms of the methods (runnable dev services have more classes and methods), but it's least logical in terms of the names. A third common superclass would solve the naming problem, but make for a lot of classes.
Other approaches considered
I was originally hoping to take a less invasive approach where I just filtered out dev service build items during augmentation, and then ran a custom build with a new build chain including those items. This sort of worked, but getting the right config to the right place wouldn't have been straightforward, and the second augmentation would have caused classloader problems. The current approach means every dev service needs updating, but it does improve the processor code for the adjusted dev services, I think.