Fix deduplication of virtual packages installed under aliases #6735
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What's the problem this PR addresses?
Fixes #6573
In order to explain why the bug happens, let's do a refresher on virtual packages and deduplication, and how we end up here
Virtuals
Let's say we have the following packages:
pkg-z
has a peer dependency onpkg-p@*
pkg-a
has a dependency onpkg-z@^1.0.0
andpkg-p@^1.0.0
pkg-b
has a dependency onpkg-z@^1.0.0
andpkg-p@^2.0.0
When we resolve the peer dependencies, we resolve which instance of
pkg-p
eachpkg-z
instance should get, then we add that instance ofpkg-p
as a regular dependency to thepkg-z
instance in our internal package dataThis is a problem because we end up with the same package having two different sets of dependencies. To avoid that, we create virtual packages for each instance of peer requesters (i.e. packages with peer dependencies) in the tree. We also replace descriptors that resolve to those packages with virtual descriptors.
The two instance of
pkg-z
that have different dependency sets are now literally two different packages in our internals. The virtual id is based on the parent package and the pre-virtualization package.Deduplication
However, just virtualizing every peer requester naively can lead to some problems. Take this example:
We have the reverse "problem" -- we end up with two different virtual
pkg-z
with the same dependency sets. That's not incorrect per se, but it'd be nice if we can use the same package instance for both.That's why we also have a deduplication process where we find virtual descriptors that resolve to virtualizations of the same package with the same dependency set. Then we replace all those virtual descriptors and packages with a single "master" descriptor/package among them
Now we go back to having one
pkg-z
instance, and we can remove the discarded virtual descriptor/package (pkg-z@virtual:<id-b>
) from our internals.Bug with aliases
This deduplication, however, in turn leads to a bug (#1352) in some edge cases. (In this and subsequent examples, I'll omit
pkg-p
. Assume that all instances ofpkg-z
has their peer requests satisfied the same way)We have used aliases to make two different virtual descriptors resolve to the same virtual package. When deduping, we will dedupe one of the virtual descriptors to the other descriptor, and discard the package it resolves to. Now we end up with zero instances of
pkg-z
...#1726 fixes that, by having the deduplication key include the descriptor ident (
pkg-x
vspkg-y
), so those two are not deduped against each other.This is where we are currently at.
More edge cases !!!
This fix, however, once again causes more edge cases.
In one edge case, it will cause virtual packages to not dedupe when they should
#1726 fixes specifically for two aliased dependencies of the same package. What if we have two packages each using an alias? We don't dedupe those because the virtual descriptors have different idents, and end up with two virtual instances because the parent package is part of the virtual id calculation.
That case in itself is also not a problem per se, just inefficient. However, using that we can construct a situation where the same package can either dedupe or not dedupe depending on which descriptor is being checked.
Here, the second descriptor
pkg-x@virtual:<id-b>
will be deduped into the first: they have the same ident and resolve to the same "physical" package. Once again we will discardpkg-z@virtual:<id-b>
in the process.However, the third descriptor
pkg-x@virtual:<id-b>
is not deduped, leaving a dangling reference to the discarded package.This is the case reported in #6573
How did you fix it?
At its root, #1352 is caused by the same virtual package being checked for dedupe twice against two different virtual descriptors that resolve to it. What if we just... not do that? Instead of deduping virtual descriptors, just dedupe the virtual packages.
We would have virtual descriptors resolving to virtual packages with a different virtual id, but that would not cause problems. In fact, the entire point of resolution is to link descriptors and locators.
This ends up simplifying the implementation a lot because we don't need to keep track of which packages have which virtual descriptors as dependencies. We just need a reverse resolution map from virtual locators to virtual descriptors. Deduping is as simple as updating the project resolution map.
The only downside is we have a few more descriptors in the internals, but that's just a very tiny cost compared to the benefits.
Checklist