test pinning metapackage generated by conda-lock #7015
base: branch-25.10
Conversation
This looks like a cool idea! With, I think, a lot less infrastructure needed per repo! Restating what you explained in my own words to see if I understand it: in cuml we'd depend on a specific version (corresponding to a specific commit?) of rapids-pin. How would the workflow look for local development (no devcontainer)? I don't know enough about what happens when I use […]. Does this mean we need to modify the recipes to add/remove rapids-pin? I guess it doesn't solve the problem of frozen dependencies for things that aren't RAPIDS packages, right? But I think most problems come from inside RAPIDS, as the coupling with external projects is much looser. So maybe we don't need to solve that for now.
There's been a Slack thread in the rapids-build-eng channel that discusses a lot of this stuff, but I'll try to explain here also so that it's more publicly visible.
There's an example you can see at conda/conda-lock#823. The rapids-pin package will not do anything by itself. If your recipe has a dependency on cuvs and rapids-pin has a cuvs pin, then your recipe will be constrained to the rapids-pin version. In other words, the pins in rapids-pin only act on the dependencies (direct and indirect) that your recipe already has. It is possible for these to conflict, and the fix is likely to update your pinned rapids-pin version, or to generate a new rapids-pin package that captures altered dependencies from the latest nightly packages. We'll need an escape hatch for this stuff in case the rapids-pin package is blocking update workflows.
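To make the mechanism concrete, here is a rough sketch of both sides (a hypothetical illustration: the pinned versions are invented, and the real rapids-pin recipe would be generated, not hand-written):

```yaml
# Sketch of the generated rapids-pin recipe's requirements (versions invented):
requirements:
  run_constraints:
    - cuvs ==25.10.00a42
    - numpy ==2.1.1
---
# Sketch of a consuming recipe: because rapids-pin is installed into the host
# environment, its run_constraints force cuvs to the pinned build above.
requirements:
  host:
    - rapids-pin
    - cuvs          # resolves to 25.10.00a42 rather than the latest nightly
```

If the pinned build has since been removed or conflicts with a newer requirement, the solve fails, which is the situation the escape hatch mentioned above would need to cover.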
Yes, that's exactly the workflow I have in mind. There could be a bot that puts up PRs to bump rapids-pin. The version of rapids-pin is a function of at least two things: the state of the package ecosystem, and perhaps the codebase of conda-lock (or whatever is creating the rapids-pin package). There's really no code to rapids-pin itself beyond that.
I don't think this is a problem. The rapids-pin package should only be used in conda recipes, and should only ever be in build/host sections, never in run. The main change between CI runs and releases is that we'll want to remove pins from the testing environment when cutting a release, which gives a more realistic representation of user environments. The build pins don't need to change for releases, but they will, of course, affect the output package via mechanisms such as run_exports, so we should be aware of our pins and choose them consciously.
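As a sketch of the placement being described (a hypothetical recipe excerpt; section names follow rattler-build's format):

```yaml
requirements:
  build:
    - rapids-pin    # pins the build environment
  host:
    - rapids-pin    # pins the host environment
    - libcuvs
  run:
    # deliberately no rapids-pin here: user environments stay unconstrained
    - libcuvs
```

Since the pin never appears in run, dropping the test-environment pins for a release doesn't change the published package's metadata.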
I think frozen dependencies are exactly the problem that this solves. The metapackage cuts out all RAPIDS packages, such that they are not constrained at all. Instead, the constraints are the superset of all dependencies for all projects (minus any internal RAPIDS dependencies). As such, external dependencies are pinned exactly, but RAPIDS dependencies follow any pinning ranges in the recipes.
For example, the current rapids-pin package is failing. This is because I generated rapids-pin from a runtime-only environment spec, so things like libcurand-dev were not included. The solution here may be to augment the input environment with the build-time dependencies, or maybe to create a more build-time-focused environment input and a test-time-focused environment input, with corresponding packages.
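As a rough illustration of the first option, the input spec could be augmented with build-time packages before locking (file contents, names, and versions here are hypothetical):

```yaml
# hypothetical conda-lock input: the runtime env plus build-time additions
name: rapids-pin-input
channels:
  - rapidsai-nightly
  - conda-forge
dependencies:
  # original runtime-only content
  - cuml =25.10
  # build-time additions so host environments are covered too
  - libcurand-dev
  - cuda-version =12.9
```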
What I meant is the following case: cuml and cudf both depend on the same non-RAPIDS package foobar, and they both use the same version (say v42) of rapids-pin. From reading your reply and the linked conda-lock PR, I think the answer is that foobar's version would be constrained for both cuml and cudf. We could list it as an excluded package, in which case it wouldn't be constrained in cuml or cudf. One thing I am now less clear on, though: "The metapackage cuts out all RAPIDS packages, such that they are not constrained at all." This sounds like rerunning a build at two different times will get two different versions of a RAPIDS package that cuml depends on (if there were changes to that RAPIDS package), even if the version of rapids-pin remains the same. Most random breakage is due to RAPIDS package changes :-/
Yes, but only at build (and perhaps test) time. If an arbitrary user installs the nightly package, the result will be unconstrained aside from any run dependencies as normal.
This is accurate. The reason I added this was for recipes with multiple outputs. When I was toying with the pinning scheme for PRs, I found that the frozen value for earlier packages (say libcuml) would eventually become unavailable due to normal cleanup of old builds. At that point, you'd have to clear the lock completely and start over. I think that's true: subpackages should not be pinned. I also get your point: we need to be able to include locked references to external RAPIDS projects. If we extend the metapackage as it is, then we need a way for it to ignore the subpackages of the current project. Another approach, which is probably easier, is to keep the metapackage as it is but add another metapackage that is specific to external RAPIDS projects. This would need to be managed on a per-repo basis, or perhaps via metapackages that are more componentized (rapids-pin-rmm, for example, would pin rmm for packages that consume it). Why keep the existing metapackage? Because I think there is value in having uniform constraints across all of RAPIDS as much as possible. I don't think we should go to per-repo metapackages; the componentized metapackages might get us the best of both. For reference, in case this isn't clear, we'd end up with a cuml recipe with content like the sketch below:
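(a hypothetical sketch; the pin versions are invented)

```yaml
requirements:
  host:
    - rapids-pin-rmm =25.10.00a12
    - rapids-pin-raft =25.10.00a12
    - rapids-pin =25.10.00a12   # the RAPIDS-wide external pins
```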
cuml will only see updates when changing the version pins on any of these. It is possible that rapids-pin-rmm and rapids-pin-raft may conflict, in which case we'd probably trim implicit dependencies from the project-specific rapids-pin packages.
@msarahan Can we target branch-25.10 for this PR, please?
I'm not sure I have much more to say at the moment :-/ Somehow the feeling of "this has many moving parts and complexity" is sneaking up on me when viewed from my POV ("I just want a fixed env to build and run tests in for the CI. It only changes when there is an explicit action."). I understand that we are quite constrained given the existing processes and infrastructure, but I can't shake the feeling that we are adding layers to deal with layers we added previously. We are making things more complex, not simpler :-/ The reason I'm so focussed on "simple" is that creating and maintaining infrastructure like this is a necessary precondition for building tools that solve problems real humans have, but ideally we'd spend as little time as reasonable on it (you want to spend time riding your bike, not maintaining it - but you do have to maintain it).
I rather like maintaining bicycles! But seriously, we're adding a new capability. It is ultimately going to be very similar to scikit-learn's approach, except that it treats all of RAPIDS as one larger project to unify. As such, it moves the mechanics of freezing dependencies somewhere outside of any single project, and then imposes the freezing using constraints. This is a little indirect, but you can think of it as a particular revision of an environment file in scikit-learn's approach. I agree that there isn't much more to discuss right now. I appreciate and share your concern about simplicity. I'll ping you when I have the next implementation (allowing pins of intra-RAPIDS projects somehow). I think those might be better off using pinning in conda_build_config.yaml, but we'll see.
This is a PoC of a scheme for pinning build-time dependencies. A RAPIDS-wide environment is created using conda-lock, which is then reformatted into a rattler-build recipe where all of the environment packages are expressed as run_constraints. When we add a dependency on this metapackage in our conda recipes, it has the effect of constraining the versions that rattler-build will allow in the host environment.
If this works, the next steps are to build a better matrix-backed pipeline to generate these pinning packages, and to then roll out these pinning package dependencies across our recipes.
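For concreteness, here is a minimal sketch of what the generated pinning recipe might look like (the package set and versions are invented; the real recipe would be emitted by the conda-lock-based tooling):

```yaml
# recipe.yaml (rattler-build format) -- hypothetical generated output
package:
  name: rapids-pin
  version: 25.10.00a20250901
build:
  noarch: generic
requirements:
  # every package from the conda-lock solve becomes an exact constraint;
  # RAPIDS' own packages are filtered out so they stay unconstrained
  run_constraints:
    - numpy ==2.1.1
    - spdlog ==1.14.1
    - libcurand ==10.3.7.77
```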