
[bug]: The function conflicts with existing pytorch_cuda_alloc_conf environment variable settings, leading to failure. #7731


Closed
markusph-dev opened this issue Mar 4, 2025 · 1 comment · Fixed by #7733
Labels
bug Something isn't working

Comments

@markusph-dev

Is there an existing issue for this problem?

  • I have searched the existing issues

Operating system

Windows

GPU vendor

Nvidia (CUDA)

GPU model

RTX2000

GPU VRAM

12GB

Version number

5.7.2rc2

Browser

MS Edge

Python dependencies

What happened

When pytorch_cuda_alloc_conf: "backend:cudaMallocAsync" is set in invokeai.yaml and the environment variable PYTORCH_CUDA_ALLOC_CONF is already set, the following error is generated and execution halts.

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "Y:\AI\InvokeAI\.venv\Scripts\invokeai-web.exe\__main__.py", line 8, in <module>
  File "Y:\AI\InvokeAI\.venv\Lib\site-packages\invokeai\app\run_app.py", line 32, in run_app
    configure_torch_cuda_allocator(app_config.pytorch_cuda_alloc_conf, logger)
  File "Y:\AI\InvokeAI\.venv\Lib\site-packages\invokeai\app\util\torch_cuda_allocator.py", line 14, in configure_torch_cuda_allocator
    raise RuntimeError(
RuntimeError: Attempted to configure the PyTorch CUDA memory allocator, but PYTORCH_CUDA_ALLOC_CONF is already set to 'expandable_segments:True,max_split_size_mb:512,garbage_collection_threshold:0.8'.
```

Process exited with code 1

What you expected to happen

.

How to reproduce the problem

Set the variable in the user profile in Windows:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:512,garbage_collection_threshold:0.8

Enable the setting in invokeai.yaml:
pytorch_cuda_alloc_conf: "backend:cudaMallocAsync"

Additional context

Attached is a revised function incorporating updated priority logic: torch_cuda_allocator.py.txt

The revised configure_torch_cuda_allocator function implements the following priority logic (a sketch follows the list):

  • Check for an existing value: the function first checks whether the PYTORCH_CUDA_ALLOC_CONF environment variable is already set. If it is, that value has initial priority.

  • Comparison with the desired configuration: if the environment variable exists, the function compares its value with the pytorch_cuda_alloc_conf configuration provided as an argument. If the values match, the function assumes the desired configuration is already in effect and continues.

  • Overwriting the environment variable: if the environment variable exists but its value differs from the desired configuration, the function overwrites the variable with the new value. The configuration provided to the function therefore takes priority over the pre-existing environment variable.

  • Setting if non-existent: if the environment variable does not exist, the function sets it to the desired configuration.
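
For reference, here is a minimal sketch of that logic, assuming the function signature visible in the traceback (configure_torch_cuda_allocator(pytorch_cuda_alloc_conf, logger)); the attached torch_cuda_allocator.py.txt is the authoritative version and may differ in detail:

```python
import logging
import os


def configure_torch_cuda_allocator(pytorch_cuda_alloc_conf: str, logger: logging.Logger) -> None:
    """Apply the allocator config from invokeai.yaml, overriding any pre-set env var."""
    existing = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")

    if existing is None:
        # Variable not set: apply the desired configuration.
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
        logger.info(f"PYTORCH_CUDA_ALLOC_CONF set to '{pytorch_cuda_alloc_conf}'.")
    elif existing == pytorch_cuda_alloc_conf:
        # Values match: the desired configuration is already in effect.
        logger.info(f"PYTORCH_CUDA_ALLOC_CONF already set to '{existing}'; no change needed.")
    else:
        # Conflict: the value from invokeai.yaml takes priority over the
        # pre-existing environment variable.
        logger.warning(
            f"Overriding PYTORCH_CUDA_ALLOC_CONF='{existing}' with "
            f"'{pytorch_cuda_alloc_conf}' from invokeai.yaml."
        )
        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
```

Note that overwriting the variable only takes effect if it happens before torch initializes CUDA, since PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the allocator is set up.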

Discord username

MarkusPh

markusph-dev added the bug (Something isn't working) label on Mar 4, 2025
@psychedelicious
Collaborator

I think we should just log a warning when we detect that there is a conflict in the settings, and not raise an error.
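
For illustration, a sketch of that approach (hypothetical wording, not necessarily what the linked fix shipped; this version warns and keeps the pre-existing environment variable rather than overriding it):

```python
import logging
import os


def configure_torch_cuda_allocator(pytorch_cuda_alloc_conf: str, logger: logging.Logger) -> None:
    existing = os.environ.get("PYTORCH_CUDA_ALLOC_CONF")
    if existing is not None and existing != pytorch_cuda_alloc_conf:
        # Previously a RuntimeError was raised here; log a warning and continue,
        # leaving the user's environment variable in place.
        logger.warning(
            f"PYTORCH_CUDA_ALLOC_CONF is already set to '{existing}'; ignoring "
            f"pytorch_cuda_alloc_conf='{pytorch_cuda_alloc_conf}' from invokeai.yaml."
        )
        return
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = pytorch_cuda_alloc_conf
```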
