
[bug]: The function conflicts with existing pytorch_cuda_alloc_conf environment variable settings, leading to failure. #7731


Closed
markusph-dev opened this issue Mar 4, 2025 · 1 comment · Fixed by #7733
Labels
bug Something isn't working

Comments

@markusph-dev

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

RTX2000

GPU VRAM

12GB

Version number

5.7.2rc2

Browser

MS Edge

Python dependencies

What happened

When pytorch_cuda_alloc_conf: "backend:cudaMallocAsync" is set in invokeai.yaml and the environment variable PYTORCH_CUDA_ALLOC_CONF is already set, the following error is generated and execution halts.

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "Y:\AI\InvokeAI\.venv\Scripts\invokeai-web.exe\__main__.py", line 8, in <module>
  File "Y:\AI\InvokeAI\.venv\Lib\site-packages\invokeai\app\run_app.py", line 32, in run_app
    configure_torch_cuda_allocator(app_config.pytorch_cuda_alloc_conf, logger)
  File "Y:\AI\InvokeAI\.venv\Lib\site-packages\invokeai\app\util\torch_cuda_allocator.py", line 14, in configure_torch_cuda_allocator
    raise RuntimeError(
RuntimeError: Attempted to configure the PyTorch CUDA memory allocator, but PYTORCH_CUDA_ALLOC_CONF is already set to 'expandable_segments:True,max_split_size_mb:512,garbage_collection_threshold:0.8'.
```

Process exited with code 1

What you expected to happen

.

How to reproduce the problem

Set the variable in the user profile in Windows:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512,garbage_collection_threshold:0.8

Enable the setting in invokeai.yaml:
pytorch_cuda_alloc_conf: "backend:cudaMallocAsync"

Additional context

Attached is a revised function incorporating updated priority logic: torch_cuda_allocator.py.txt

The revised configure_torch_cuda_allocator function implements the following priority logic (a sketch follows the list):

  • Check for an existing value: the function first checks whether the PYTORCH_CUDA_ALLOC_CONF environment variable is already set. If it is, that value has initial priority.

  • Comparison with the desired configuration: if the environment variable exists, the function compares its value with the pytorch_cuda_alloc_conf configuration provided as an argument. If the values match, the function assumes the desired configuration is already in effect and continues.

  • Overwriting the environment variable: if the environment variable exists but its value differs from the desired configuration, the function overwrites the variable with the new value. The configuration provided to the function therefore takes priority over the pre-existing environment variable.

  • Setting if non-existent: if the environment variable does not exist, the function sets it to the desired configuration.
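
For reference, here is a minimal sketch of that logic, assuming the function signature visible in the traceback (configure_torch_cuda_allocator(pytorch_cuda_alloc_conf, logger)); the attached torch_cuda_allocator.py.txt is the authoritative version and may differ in detail:

```python
import logging
import os


def configure_torch_cuda_allocator(pytorch_cuda_alloc_conf: str, logger: logging.Logger) -> None:
    """Apply the allocator config from invokeai.yaml, overriding any pre-set env var."""
    existing = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")

    if existing is None:
        # Variable not set: apply the desired configuration.
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
        logger.info(f"PYTORCH_CUDA_ALLOC_CONF set to '{pytorch_cuda_alloc_conf}'.")
    elif existing == pytorch_cuda_alloc_conf:
        # Values match: the desired configuration is already in effect.
        logger.info(f"PYTORCH_CUDA_ALLOC_CONF already set to '{existing}'; no change needed.")
    else:
        # Conflict: the value from invokeai.yaml takes priority over the
        # pre-existing environment variable.
        logger.warning(
            f"Overriding PYTORCH_CUDA_ALLOC_CONF='{existing}' with "
            f"'{pytorch_cuda_alloc_conf}' from invokeai.yaml."
        )
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
```

Note that overwriting the variable only takes effect if it happens before torch initializes CUDA, since PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the allocator is set up.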

Discord username

MarkusPh

markusph-dev added the bug (Something isn't working) label on Mar 4, 2025
@psychedelicious
Collaborator

I think we should just log a warning when we detect that there is a conflict in the settings, and not raise an error.
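
For illustration, a sketch of that approach (hypothetical wording, not necessarily what the linked fix shipped; this version warns and keeps the pre-existing environment variable rather than overriding it):

```python
import logging
import os


def configure_torch_cuda_allocator(pytorch_cuda_alloc_conf: str, logger: logging.Logger) -> None:
    existing = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")
    if existing is not None and existing != pytorch_cuda_alloc_conf:
        # Previously a RuntimeError was raised here; log a warning and continue,
        # leaving the user's environment variable in place.
        logger.warning(
            f"PYTORCH_CUDA_ALLOC_CONF is already set to '{existing}'; ignoring "
            f"pytorch_cuda_alloc_conf='{pytorch_cuda_alloc_conf}' from invokeai.yaml."
        )
        return
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
```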
