
Pytorch downgrades to a lower version #6880


Open
thepowerfuldeez opened this issue Aug 30, 2024 · 21 comments
Labels
question Asking for clarification or support

Comments

@thepowerfuldeez

Hi! Is there functionality to ignore dependencies if they already exist in the system Python?
I am using the nvcr.io/nvidia/pytorch:24.08-py3 Docker image, which has pytorch==2.5.0a0+872d972e41.nv24.08 installed.

I followed the guide here to enable the system Python via UV_SYSTEM_PYTHON=1. My pyproject.toml includes the packages I copied over from requirements.txt, and I ran uv lock.

But now I see that it resolved to torch==2.4.0, and when I run uv sync my PyTorch gets downgraded to 2.4.0, which is unexpected.
I tried manually removing all PyTorch dependencies from the lockfile and running uv sync from that state, but for some reason it now succeeds while installing only 4 of 126 packages (and they are just file packages from file://... defined in pyproject.toml).
I've tried uv sync --frozen and uv sync --locked, and tried running uv sync with different variations, but haven't succeeded yet. To be frank, pip install -r requirements.txt would also downgrade my torch unless I run pip freeze > requirements.txt and install with -r --no-deps, but this is not ideal.
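
The pip freeze plus --no-deps workaround amounts to installing only what the base image lacks. A minimal Python sketch of that filtering step, assuming a plain list of requirement lines. The helper names (`canonical`, `requirement_name`, `installed_names`, `missing_only`) are hypothetical, and this is not how pip or uv decide what to install:

```python
import re
from importlib import metadata
from typing import Iterable, List, Optional, Set

# Leading project name of a requirement line such as "torch==2.4.0".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name: str) -> str:
    """Normalize a project name in the style of PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> Optional[str]:
    """Return the normalized project name, or None for comments/options."""
    line = line.strip()
    if not line or line.startswith(("#", "-")):
        return None
    match = _NAME.match(line)
    return canonical(match.group(1)) if match else None


def installed_names() -> Set[str]:
    """Names of every distribution visible to the running interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            names.add(canonical(name))
    return names


def missing_only(lines: Iterable[str], installed: Set[str]) -> List[str]:
    """Keep only requirement lines whose project is not installed yet."""
    kept = []
    for line in lines:
        name = requirement_name(line)
        if name is not None and name not in installed:
            kept.append(line.strip())
    return kept
```

Called as `missing_only(open("requirements.txt"), installed_names())`, this would drop the torch line inside the NVIDIA image. Because --no-deps skips dependency resolution, the input list would need to be complete already (for example, produced by pip freeze).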

I generally lack understanding of such intricacies, and I've noticed that other people stumble over flash-attn installation or CUDA extensions. I wonder what's the best way to handle such conflicts?
Thank you very much for a modern replacement for pip!

@thepowerfuldeez
Author

Example of downgrading torch (and installing cuda extensions which are not needed)

> uv sync
Resolved 127 packages in 567ms
⠧ Preparing packages... (5/13)
nvidia-cufft-cu12 ------------------------------ 58.53 MB/121.64 MB
nvidia-cusolver-cu12 ------------------------------ 58.41 MB/124.16 MB
nvidia-nccl-cu12 ------------------------------ 58.49 MB/176.25 MB
nvidia-cusparse-cu12 ------------------------------ 58.55 MB/195.96 MB
nvidia-cublas-cu12 ------------------------------ 58.29 MB/410.59 MB
nvidia-cudnn-cu12 ------------------------------ 58.35 MB/664.75 MB
torch      ------------------------------ 33.03 MB/797.23 MB
^C

@charliermarsh
Member

This doesn't work today with uv sync or uv lock -- it won't respect existing environments. The lower-level uv pip APIs do, though. You can run uv pip install with an active virtual environment, and it will retain existing versions of PyTorch, if they're already installed.

@charliermarsh charliermarsh added the question Asking for clarification or support label Sep 1, 2024
@thepowerfuldeez
Author

@charliermarsh how do I respect pyproject.toml and use the project setup features? I was thinking of switching to uv as a project management tool (similar to poetry or rye, if that matters). Will uv pip install -r pyproject.toml work?
What's the proposed approach to project management for ML workflows? This is highly requested imo, and I would be glad to have a clear guide :)

@charliermarsh
Member

Yeah uv pip install -r pyproject.toml should work just fine. In general, our goal is for folks to use the "higher-level" project APIs (uv lock, uv sync, etc.). But in this case, if you're using an nvidia image that has packages pre-installed, it won't play correctly with uv lock and uv sync. So if you want to build atop that base environment, you'll need to use the "lower-level" uv pip APIs.

E.g., activate that virtual environment, then run uv pip install -r pyproject.toml.

@thepowerfuldeez
Author

thepowerfuldeez commented Sep 3, 2024

@charliermarsh This doesn't work
I have tried

> uv venv --python-preference only-system --system-site-packages
source .venv/bin/activate
uv pip install -r pyproject.toml

It still installs torch==2.4.0 even though it's not in the dependencies of pyproject.toml

I verified that inside this venv torch is installed

@thepowerfuldeez
Author

@charliermarsh I have tried to set

[tool.uv]
# Always install torch 2.5, regardless of whether transitive dependencies request
# a different version.
override-dependencies = ["torch==2.5.0a0+872d972e41.nv24.08"]

but it can't be resolved

× No solution found when resolving dependencies:
  ╰─▶ Because there is no version of torch==2.5.0a0+872d972e41.nv24.8 and accelerate==0.33.0 depends on torch==2.5.0a0+872d972e41.nv24.8, we can conclude that accelerate==0.33.0 cannot be used.
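
Note that the override spells the local version as nv24.08 while the resolver message prints nv24.8. That looks like ordinary version normalization rather than a typo: PEP 440 treats all-digit segments of a local version as integers, so a leading zero is dropped when the version is printed back. A minimal Python sketch, assuming only that behavior (`normalize_local` is a hypothetical helper, not part of uv):

```python
import re


def normalize_local(version: str) -> str:
    """Normalize the local segment (the part after '+') of a version string.

    Separators become '.', alphabetic parts are lowercased, and all-digit
    parts are read as integers, which removes leading zeros.
    """
    public, plus, local = version.partition("+")
    if not plus:
        return version  # no local segment, nothing to do
    parts = re.split(r"[._-]", local)
    normalized = [str(int(p)) if p.isdigit() else p.lower() for p in parts]
    return public + "+" + ".".join(normalized)


print(normalize_local("2.5.0a0+872d972e41.nv24.08"))
# prints 2.5.0a0+872d972e41.nv24.8
```

The two spellings therefore refer to the same version, and the "no version of torch" part of the message is a separate matter.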

@charliermarsh
Member

@thepowerfuldeez -- If you're trying to use packages from the system Python, you'll need to use uv pip install --system -r pyproject.toml, and avoid creating a virtual environment at all. Can you try that?
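
To confirm that an install like this leaves the preinstalled PyTorch untouched, one option is to record package versions before and after and compare them. A minimal Python sketch using only the standard library. `snapshot` and `changed` are hypothetical helper names, and the snapshots are assumed to be taken with the same system interpreter that uv installs into:

```python
from importlib import metadata
from typing import Dict, Tuple


def snapshot() -> Dict[str, str]:
    """Map each installed distribution name (lowercased) to its version."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            versions[name.lower()] = dist.version
    return versions


def changed(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old, new)} for packages present in both snapshots
    whose version differs."""
    return {
        name: (old, after[name])
        for name, old in before.items()
        if name in after and after[name] != old
    }


# Illustrative data only: what a downgrade like the one reported would look like.
before = {"torch": "2.5.0a0+872d972e41.nv24.08", "numpy": "1.26.4"}
after = {"torch": "2.4.0", "numpy": "1.26.4"}
print(changed(before, after))
# prints {'torch': ('2.5.0a0+872d972e41.nv24.08', '2.4.0')}
```

In practice the first `snapshot()` result would be saved (for example as JSON) before running the install, then compared against a fresh `snapshot()` afterwards. An empty result means no existing package was replaced.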

@thepowerfuldeez
Author

@charliermarsh Thank you so much! It works now! So the solution is not using lockfile for now, right?

@charliermarsh
Member

Unfortunately yes. We'll need to think on how to solve this properly.

@charliermarsh
Member

I'm gonna leave the issue open but might tweak the title and add some more details on the underlying problem, if that's ok.

@awoimbee

awoimbee commented Sep 4, 2024

I had the same issue with poetry some time ago: python-poetry/poetry#6035 (resolved by python-poetry/poetry#8359).

Note that uv venv --system-site-packages has existed since #2101, but "we won't take the system site packages into account in subsequent commands".

@zanieb
Member

zanieb commented Oct 21, 2024

@charliermarsh it seems like there might be a real issue to track here?

@charliermarsh
Member

Yeah. The system packages aren't taken into account.

@zanieb
Member

zanieb commented Jan 7, 2025

I think this is related to

Do we need to re-triage this or can we close in favor of the other issue?

@charliermarsh
Member

I think it's a bit different... We don't even provide a way for users to tell us to look at already-installed packages, much less system packages. Whereas #4466 is about respecting system-site-packages.

@zanieb
Member

zanieb commented Jan 7, 2025

@charliermarsh Could you open an issue that describes what change would be needed for this issue? (even if it's brief)

@thepowerfuldeez
Author

@zanieb it would be great to have a solution for using existing Docker images from NVIDIA (they contain CUDA, cuDNN, flash_attn, triton, bitsandbytes, NCCL, and the latest PyTorch, and some of these libraries need to be compiled from source) while still using a lockfile with the high-level uv sync interface, which would install dependencies on top of the existing system packages.
Something like UV_SYSTEM_PYTHON=1 uv sync --respect-system-packages would work

@mlgill

mlgill commented Jan 9, 2025

@zanieb +1 to the suggestion by @thepowerfuldeez. Multiple software teams that I work with have this problem with NVIDIA containers, in particular the PyTorch one. As @thepowerfuldeez notes, these containers contain libraries that are compiled from source to ensure they are optimized for GPUs. Thus, installing a pip version of the same library should be avoided.

@zanieb
Member

zanieb commented Jan 9, 2025

Yeah we want to solve that problem, but it's not trivial.

@pstjohn

pstjohn commented Jan 11, 2025

Just to chime in with a specific use case, we use uv pip install in our nvcr.io/nvidia/pytorch derived image (link), but it would be great to be able to do something similar to https://docs.astral.sh/uv/guides/integration/docker/#intermediate-layers, where we could create a uv.lock file (maybe excluding the system packages), and then run uv sync --frozen --no-install-workspace in our docker image to improve caching.

@awoimbee

Just to help ppl working with containers, here's the solution I found some time ago:
I use FROM docker.io/pytorch/pytorch and just give the conda env to uv via UV_PROJECT_ENVIRONMENT=/opt/conda.
uv has its venv with torch so it's happy and doesn't reinstall it.
