[Bug]: --use-zluda uses the CPU (extremely slow performance), while --use-directml works but I don't think it uses ZLUDA (slightly better performance, still not more than 2 it/s for the lightest model) #588

Geekyboi6117 opened this issue Mar 8, 2025 · 0 comments

Checklist

  • The issue exists after disabling all extensions
  • The issue exists on a clean installation of webui
  • The issue is caused by an extension, but I believe it is caused by a bug in the webui
  • The issue exists in the current version of the webui
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

It uses the CPU when told to use ZLUDA, with very slow speed. DirectML works, but is not fast either. I saw videos in which ZLUDA gives more than 5 it/s on an RX 6800; my GPU is an RX 6600, so I think it should give at least 3 it/s on the lightest model at 512x512 resolution. I also tried the comfyui-zluda fork and got the same performance, so maybe something is wrong with the ROCm and ZLUDA versions. There is a catch: when using the comfyui-zluda fork, it does detect ZLUDA. Here is the log from it:

----------------------ZLUDA-----------------------------
:: ZLUDA detected, disabling non-supported functions.
:: CuDNN, flash_sdp, mem_efficient_sdp disabled).
--------------------------------------------------------
:: Device : AMD Radeon RX 6600 [ZLUDA]

Total VRAM 8176 MB, total RAM 16306 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6600 [ZLUDA] : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention


But the generation speed is not fast here either: not more than 2 it/s.
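
To check whether PyTorch in the webui venv actually sees the GPU or silently falls back to the CPU, here is a minimal diagnostic sketch (my own, not part of the webui; it assumes ZLUDA exposes the card through the CUDA API, as the comfyui-zluda log above shows):

# check_device.py - hypothetical helper, run inside the activated venv
import torch

print("torch:", torch.__version__)                  # a +cu11x build is needed for ZLUDA to hook into
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # With ZLUDA the AMD card is reported via the CUDA API, e.g. "AMD Radeon RX 6600 [ZLUDA]"
    print("device 0:", torch.cuda.get_device_name(0))
else:
    # Matches the "Torch not compiled with CUDA enabled" warning in the console log below:
    # generation will run on the CPU.
    print("no CUDA device visible - falling back to CPU")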

Steps to reproduce the problem

  1. Run webui-user.bat
  2. The console log is generated
  3. The UI opens
  4. Generation speed is very slow
  5. It uses the CPU even though it was told to use ZLUDA

What should have happened?

It should use ZLUDA and find the ROCm runtime instead of falling back to ROCM_HOME.
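
For triage, a small sketch to print the relevant environment (assumptions: on Windows the AMD HIP SDK sets HIP_PATH, and the ROCm/HIP bin directory should be on PATH; the variable names here are my guess, not something the webui documents):

# rocm_env_check.py - hypothetical helper for inspecting the ROCm/HIP environment on Windows
import os

for var in ("HIP_PATH", "ROCM_HOME", "ROCM_PATH"):
    print(f"{var} = {os.environ.get(var)}")

# The log line "No ROCm runtime is found, using ROCM_HOME=..." suggests the runtime was
# not located on PATH; list the PATH entries that mention ROCm/HIP to confirm.
for entry in os.environ.get("PATH", "").split(os.pathsep):
    if "rocm" in entry.lower() or "hip" in entry.lower():
        print("PATH entry:", entry)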

What browsers do you use to access the UI?

No response

Sysinfo

sysinfo-2025-03-08-15-14.json

Console logs

(venv) E:\AII\sd_AMD\stable-diffusion-webui-amdgpu>webui-user.bat
venv "E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-24-g63895a83
Commit hash: 63895a83f70651865cc9653583c69765009489f3
ROCm: agents=['gfx1032']
ROCm: version=5.7, using agent gfx1032
ZLUDA support: experimental
Using ZLUDA in E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\.zluda
No ROCm runtime is found, using ROCM_HOME='C:\Program Files\AMD\ROCm\5.7'
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
  rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --disable-nan-check --opt-sdp-attention --medvram --no-half-vae --opt-split-attention --ckpt-dir 'E:\AII\Models' --precision full --no-half
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
ONNX failed to initialize: Failed to import diffusers.pipelines.pipeline_utils because of the following error (look up to see its traceback):
Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
cannot import name 'Cache' from 'transformers' (E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\__init__.py)
Loading weights [6ce0161689] from E:\AII\Models\v1-5-pruned-emaonly.safetensors
Creating model from config: E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 11.3s (prepare environment: 14.5s, initialize shared: 0.7s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.7s).
Applying attention optimization: Doggettx... done.
Model loaded in 2.3s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 1.1s, hijack: 0.1s, calculate empty prompt: 0.1s).

txt2img: CAT
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
 25%|███████████████████████████████████████████▌                                                                                                                                  | 5/20 [00:36<01:50,  7.35s/it]Interrupted with signal 2 in <frame at 0x000001AAC8819F70, file 'C:\\Users\\ABDULLAH\\AppData\\Local\\Programs\\Python\\Python310\\lib\\threading.py', line 324, code wait>         | 5/20 [00:29<01:37,  6.48s/it]
Terminate batch job (Y/N)? Y

Additional information

As the console log says, the speed is slow because it is using the CPU. When I use --use-directml the speed gets to about 2 it/s or less, which is still generally better than the CPU.
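
To separate "slow GPU" from "running on the CPU", a rough timing sketch (hypothetical; the matmul size and iteration count are arbitrary, and the first ZLUDA run may be slow while kernels are compiled):

# bench_matmul.py - crude throughput comparison, not a proper benchmark
import time
import torch

def bench(device, n=2048, iters=10):
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        x @ y
    if device.startswith("cuda"):
        torch.cuda.synchronize()
    return time.time() - t0

print("cpu   :", bench("cpu"))
if torch.cuda.is_available():
    print("cuda:0:", bench("cuda:0"))   # with ZLUDA this should be clearly faster than the CPU

If cuda:0 is no faster than the CPU, the work is not really landing on the GPU.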
