test pinning metapackage generated by conda-lock #7015

Open

msarahan wants to merge 2 commits into branch-25.10

Conversation

msarahan (Contributor)

This is a PoC of a scheme for pinning build-time dependencies. A RAPIDS-wide environment is created using conda-lock, which is then reformatted into a rattler-build recipe where all of the environment packages are expressed as run_constraints. When we add a dependency to this metapackage in our conda recipes, it has the effect of constraining the versions that rattler-build will allow in the host environment.
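
As a rough illustration, the generated rattler-build recipe would look something like this (a hedged sketch, not actual generated output; the libcurand-dev pin is taken from the solver error later in this thread, the numpy pin is invented):

package:
  name: rapids-pin
  version: 25.08

build:
  number: 0
  noarch: generic

requirements:
  run_constraints:
    # one exact pin per package that conda-lock resolved in the RAPIDS-wide environment
    - libcurand-dev ==10.3.10.19 h9ab20c4_0
    - numpy ==2.2.6 py313h17eae1a_0   # illustrative only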

If this works, the next steps are to build a better matrix-backed pipeline to generate these pinning packages, and to then roll out these pinning package dependencies across our recipes.

msarahan requested review from a team as code owners and from gforsyth on July 17, 2025
msarahan added labels: DO NOT MERGE (Hold off on merging; see PR for details), improvement, non-breaking, Needs build-infra on July 17, 2025
github-actions bot added labels: conda, ci on July 17, 2025

@betatim (Member) commented Jul 18, 2025

This looks like a cool idea! And, I think, one that needs a lot less infrastructure per repo!

Restating what you explained in my own words to see if I understand it: The rapids-pin package will depend on, for example, some specific commit of cuvs and so by depending on rapids-pin we will get exactly that commit of cuvs.

In cuml we'd depend on a specific version (corresponding to a specific commit?) of rapids-pin. This means that as long as we don't change that version, we always get the same versions of all the RAPIDS packages. When we want to, we can make a PR that updates the version of rapids-pin, see what works/fails, apply fixes, and then merge it?

What would the workflow look like for local development (no devcontainer)? I don't know enough about what happens when I use build.sh to build locally: does that use the recipe(s), or would you need to install rapids-pin into the conda env by hand?

Does this mean we need to modify the recipes to add/remove rapids-pin for CI runs vs releases? I assume we don't want to release something that ends up depending on rapids-pin?

I guess it doesn't solve the problem of frozen dependencies for things that aren't RAPIDS packages, right? But I think most problems come from inside RAPIDS as the coupling with external projects is much looser. So maybe we don't need to solve that for now.

@msarahan (Contributor, Author)

There's been a Slack thread in the swrapids-build-eng-tm channel that discusses a lot of this, but I'll try to explain here as well so that it's more publicly visible.

Restating what you explained in my own words to see if I understand it: The rapids-pin package will depend on, for example, some specific commit of cuvs and so by depending on rapids-pin we will get exactly that commit of cuvs.

There's an example you can see at conda/conda-lock#823. The rapids-pin package will not do anything by itself. If your recipe has a dependency on cuvs and rapids-pin has a cuvs pin, then your recipe will be constrained to the rapids-pin version. In other words, the pins in rapids-pin only act on the dependencies (direct and indirect) in your recipe.
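
To make that concrete, a consumer recipe fragment might look like this (a hedged sketch; nothing here is from an actual recipe):

requirements:
  host:
    - cuvs          # direct dependency; resolves normally on its own...
    - rapids-pin    # ...but its run_constraints limit cuvs to the pinned version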

It is possible for these to conflict, and the fix is likely to update your pinned rapids-pin version, or to generate a new rapids-pin package that captures altered dependencies from the latest nightly packages. We'll need an escape hatch for cases where the rapids-pin package is blocking update workflows.

In cuml we'd depend on a specific version (corresponding to a specific commit?) of rapids-pin. This means that as long as we don't change that version, we always get the same versions of all the RAPIDS packages. When we want to, we can make a PR that updates the version of rapids-pin, see what works/fails, apply fixes, and then merge it?

Yes, that's exactly the workflow I have in mind. There could be a bot that puts up PRs to bump rapids-pin. The version of rapids-pin is a function of at least two things:

  • The dependencies expressed by all RAPIDS packages used to create the frozen environment. These might be nightly packages to be more responsive to changes.
  • The state of the package ecosystem when the rapids-pin package gets created.

There's really no code to rapids-pin, just the state of the package ecosystem and perhaps the codebase of conda-lock or whatever is creating the rapids-pin package.

Does this mean we need to modify the recipes to add/remove rapids-pin for CI runs vs releases? I assume we don't want to release something that ends up depending on rapids-pin?

I don't think this is a problem. The rapids-pin package should only be used in conda recipes, and should only ever be in build/host sections, never in run.

The main change between CI runs and releases is that we'll want to remove pins from the testing environment when cutting a release. That will be a more realistic representation of user environments. The build pins don't need to change for releases, but they will, of course, affect the output package via mechanisms such as run_exports. So we should be aware of our pins and choose them consciously.

I guess it doesn't solve the problem of frozen dependencies for things that aren't RAPIDS packages, right? But I think most problems come from inside RAPIDS as the coupling with external projects is much looser. So maybe we don't need to solve that for now.

I think frozen dependencies are exactly the problem that this solves. The metapackage cuts out all RAPIDS packages, such that they are not constrained at all. Instead, the constraints are the superset of all dependencies for all projects (minus any internal RAPIDS dependencies). As such, external dependencies are pinned exactly, but RAPIDS dependencies follow any pinning ranges in the recipes.

@msarahan (Contributor, Author)

For example, the current rapids-pin package is failing:

libcurand-dev * cannot be installed because there are no viable options:
└─ libcurand-dev 10.3.1.50 | 10.3.1.50 | 10.3.2.106 | 10.3.3.141 | 10.3.4.107 | 10.3.4.107 | 10.3.5.119 | 10.3.5.147 | 10.3.5.147 | 10.3.5.147 | 10.3.6.39 | 10.3.6.82 | 10.3.7.37 | 10.3.7.68 | 10.3.7.77 | 10.3.9.55 | 10.3.9.90 | 10.3.9.90 would require
   └─ cuda-version >=12.0,<12.1.0a0, for which no candidates were found.
The following packages are incompatible
├─ libcurand-dev * can be installed with any of the following options:
│  └─ libcurand-dev 10.3.10.19
└─ rapids-pin * cannot be installed because there are no viable options:
   └─ rapids-pin 25.8 would constrain
      └─ libcurand-dev ==10.3.10.19 h9ab20c4_0, which conflicts with any installable versions previously reported

This is because I generated rapids-pin with the environment spec:

name: rapids-25.08
channels:
  - rapidsai-nightly
  - conda-forge
  - nvidia
dependencies:
  - rapids=25.08
  - python=3.13
  - 'cuda-version>=12.0,<=12.9'
# non-standard - dependencies that should be excluded from the lockfile, so as to let them float as needed

output_recipe:
  name: rapids-pin
  version: 25.08
  build:
    string: py313_0
  exclude_patterns:
  - cudf
  - cuxfilter
  - cuvs
  - cuml
  - cugraph
  - cucim
  - ucxx
  - rapids
  - rmm
  - raft
  - ucx-py

which is a runtime-only environment, so things like libcurand-dev were not included. The solution here may be to augment the input environment with build-time dependencies, or to create separate build-time and test-time focused environment inputs, with corresponding pinning packages.
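
For example, a build-time focused input might look something like this (a hedged sketch: the added dev dependency and the rapids-pin-build name are illustrative, not something implemented here):

name: rapids-25.08-build
channels:
  - rapidsai-nightly
  - conda-forge
  - nvidia
dependencies:
  - rapids=25.08
  - python=3.13
  - 'cuda-version>=12.0,<=12.9'
  - libcurand-dev            # build-time dependency missing from the runtime-only env

output_recipe:
  name: rapids-pin-build     # hypothetical build-focused pin package
  version: 25.08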

@betatim (Member) commented Jul 21, 2025

I guess it doesn't solve the problem of frozen dependencies for things that aren't RAPIDS packages, right? But I think most problems come from inside RAPIDS as the coupling with external projects is much looser. So maybe we don't need to solve that for now.

I think frozen dependencies are exactly the problem that this solves. The metapackage cuts out all RAPIDS packages, such that they are not constrained at all. Instead, the constraints are the superset of all dependencies for all projects (minus any internal RAPIDS dependencies). As such, external dependencies are pinned exactly, but RAPIDS dependencies follow any pinning ranges in the recipes.

What I meant is the following case: cuml and cudf both depend on the same non-RAPIDS package foobar. They both use the same version (say v42) of rapids-pin. Does this mean they both get a specific version of foobar, a version determined when rapids-pin v42 was made? Or do they get the version of foobar that is determined when the CI job runs, which has nothing to do with their use of rapids-pin?

From reading your reply and the linked conda-lock PR, I think the answer is that foobar's version would be constrained for both cuml and cudf. We could list it as an excluded package, in which case it wouldn't be constrained in cuml or cudf.
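
In terms of the input spec shown above, that would mean adding foobar (a placeholder name) to exclude_patterns, e.g.:

output_recipe:
  exclude_patterns:
  - foobar   # placeholder: excluded packages float freely for all consumers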

One thing I am now less clear on though: "The metapackage cuts out all RAPIDS packages, such that they are not constrained at all". This sounds like rerunning a build at two different times will get two different versions of a RAPIDS package that cuml depends on (if there were changes to the RAPIDS package cuml depends on). Even if the version of rapids-pin remains the same. Most random breakage is due to RAPIDS package changes :-/

@msarahan (Contributor, Author) commented Jul 21, 2025

From reading your reply and the linked conda-lock PR, I think the answer is that foobar's version would be constrained for both cuml and cudf. We could list it as an excluded package, in which case it wouldn't be constrained in cuml or cudf.

Yes, but only at build (and perhaps test) time. If an arbitrary user installs the nightly package, the result will be unconstrained aside from any run dependencies as normal.

One thing I am now less clear on though: "The metapackage cuts out all RAPIDS packages, such that they are not constrained at all". This sounds like rerunning a build at two different times will get two different versions of a RAPIDS package that cuml depends on (if there were changes to the RAPIDS package cuml depends on). Even if the version of rapids-pin remains the same. Most random breakage is due to RAPIDS package changes :-/

This is accurate. The reason I added this was for recipes with multiple outputs. When I was toying with the pinning scheme for PRs, I found that the frozen value for earlier packages (say libcuml) would eventually become unavailable due to normal cleanup of old builds. At that point, you'd have to clear the lock completely and start over. So I think that part holds: subpackages should not be pinned.

I also take your point: we need to be able to include locked references to external RAPIDS projects. If we extend the metapackage as it is, we need a way for it to ignore the subpackages of the current project. Another approach, which is probably easier, is to keep the metapackage as it is but add another metapackage specific to external RAPIDS projects. This would need to be managed on a per-repo basis, or perhaps with metapackages that are more componentized (rapids-pin-rmm, for example, would pin rmm for packages that consume it).

Why keep the existing metapackage? Because I think there is value in having uniform constraints across all of RAPIDS as much as possible. I don't think we should go to per-repo metapackages. The componentized metapackages might help us get the best of both.

For reference, in case this isn't clear, cuml would end up with content like:

requirements:
  build:
    - rapids-pin-build X.Y.Z
  host:
    - rapids-pin-host X.Y.Z
    - rapids-pin-rmm X.Y.Z
    - rapids-pin-raft X.Y.Z

test:
  requires:
    - rapids-pin-test X.Y.Z
    - rapids-pin-raft X.Y.Z

  • rapids-pin-build, rapids-pin-host, and rapids-pin-test are each supersets of dependencies across projects
  • rapids-pin-rmm locks ONLY rmm
  • rapids-pin-raft locks ONLY raft (and, implicitly, raft's dependencies)

cuml will only see updates when the version pins on any of these change. It is possible that rapids-pin-rmm and rapids-pin-raft may conflict; in that case, we'd probably trim implicit dependencies from the project-specific rapids-pin packages.
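
For illustration, a componentized metapackage such as rapids-pin-rmm might look like this (a hedged sketch; the rmm version and build string are invented):

package:
  name: rapids-pin-rmm
  version: 25.08

build:
  noarch: generic

requirements:
  run_constraints:
    # locks ONLY rmm; consumers that don't depend on rmm are unaffected
    - rmm ==25.08.00a42 cuda12_py313_0   # invented nightly version/build string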

@csadorf (Contributor) commented Jul 22, 2025

@msarahan Can we target branch-25.10 for this PR, please?

msarahan changed the base branch from branch-25.08 to branch-25.10 on July 23, 2025

@betatim (Member) commented Jul 23, 2025

I'm not sure I have much more to say at the moment :-/ Somehow the feeling of "this has many moving parts and complexity" is sneaking up on me when viewed from my POV ("I just want a fixed env to build and run tests in for the CI. It only changes when there is an explicit action."). I understand that we are quite constrained given the existing processes and infrastructure, but I can't shake the feeling that we are adding layers to deal with layers we added previously. We are making things more complex, not simpler :-/ The reason I'm so focussed on "simple" is because creating and maintaining infrastructure like this is a necessary pre-condition for building tools that solve problems real humans have, but ideally we'd spend as little time as reasonable on it (you want to spend time riding your bike, not spend time maintaining it - but you do have to maintain it).

@msarahan (Contributor, Author)

I rather like maintaining bicycles! But seriously, we're adding a new capability. It is ultimately going to be very similar to scikit-learn's approach, except that it treats RAPIDS, the greater whole, as one project to unify. As such, it moves the mechanics of freezing dependencies outside of any single project, and then imposes the freezing using constraints. This is a little indirect, but you can think of it as a particular revision of an environment file in scikit-learn's approach.

I agree that there isn't much more to discuss right now. I appreciate and share your concern about simplicity. I'll ping you when I have the next implementation (allowing pins of intra-RAPIDS projects somehow). I think those might be better off using pinning in conda_build_config.yaml, but we'll see.
