[Bug]: --use-zluda falls back to CPU (extremely slow performance), while --use-directml works but apparently without ZLUDA (slightly better, yet no more than 2 it/s even on the lightest model)
#588 · Open
Geekyboi6117 opened this issue on Mar 8, 2025 · 0 comments
Checklist
The issue is caused by an extension, but I believe it is caused by a bug in the webui
The issue exists in the current version of the webui
The issue has not been reported before recently
The issue has been reported before but has not been fixed yet
What happened?
The webui uses the CPU when told to use ZLUDA, so generation is very slow. DirectML works but is not fast either. I have seen videos where ZLUDA gives more than 5 it/s on an RX 6800; my GPU is an RX 6600, so I think it should manage at least 3 it/s on the lightest model at 512x512 resolution. I also tried the ComfyUI-ZLUDA fork and got the same performance, so maybe something is wrong with the ROCm and ZLUDA versions. There is one catch: when using the ComfyUI-ZLUDA fork, it does detect ZLUDA. Here is the log from it:
----------------------ZLUDA-----------------------------
:: ZLUDA detected, disabling non-supported functions.
:: CuDNN, flash_sdp, mem_efficient_sdp disabled).
--------------------------------------------------------
:: Device : AMD Radeon RX 6600 [ZLUDA]
Total VRAM 8176 MB, total RAM 16306 MB
pytorch version: 2.3.0+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 6600 [ZLUDA] : native
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
but the generation speed is not fast there either, no more than 2 it/s.
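To double-check whether it is really a CPU fallback, I think a quick test like this inside the webui venv should be enough (just a rough sketch, not output from my setup):

```python
# Quick check from inside the webui venv: does the ZLUDA-patched torch
# expose the GPU as a CUDA device, or does it silently fall back to CPU?
import torch

print(torch.__version__)                  # e.g. 2.3.0+cu118
print(torch.cuda.is_available())          # False would explain the CPU-only speed
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should say "AMD Radeon RX 6600 [ZLUDA]"
```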
Steps to reproduce the problem
Run webui-user.bat
The console log is generated
The UI opens
Generation speed is very slow
It uses the CPU even though ZLUDA was requested
What should have happened?
It should use ZLUDA and find the ROCm runtime instead of falling back to ROCM_HOME.
What browsers do you use to access the UI?
No response
Sysinfo
sysinfo-2025-03-08-15-14.json
Console logs
(venv) E:\AII\sd_AMD\stable-diffusion-webui-amdgpu>webui-user.bat
venv "E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
WARNING: ZLUDA works best with SD.Next. Please consider migrating to SD.Next.
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
Version: v1.10.1-amd-24-g63895a83
Commit hash: 63895a83f70651865cc9653583c69765009489f3
ROCm: agents=['gfx1032']
ROCm: version=5.7, using agent gfx1032
ZLUDA support: experimental
Using ZLUDA in E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\.zluda
No ROCm runtime is found, using ROCM_HOME='C:\Program Files\AMD\ROCm\5.7'
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\timm\models\layers\__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", FutureWarning)
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\pytorch_lightning\utilities\distributed.py:258: LightningDeprecationWarning: `pytorch_lightning.utilities.distributed.rank_zero_only` has been deprecated in v1.8.1 and will be removed in v2.0.0. You can import it from `pytorch_lightning.utilities` instead.
rank_zero_deprecation(
Launching Web UI with arguments: --use-zluda --disable-nan-check --opt-sdp-attention --medvram --no-half-vae --opt-split-attention --ckpt-dir 'E:\AII\Models' --precision full --no-half
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
ONNX failed to initialize: Failed to import diffusers.pipelines.pipeline_utils because of the following error (look up to see its traceback):
Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
cannot import name 'Cache' from 'transformers' (E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\transformers\__init__.py)
Loading weights [6ce0161689] from E:\AII\Models\v1-5-pruned-emaonly.safetensors
Creating model from config: E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\configs\v1-inference.yaml
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\venv\lib\site-packages\huggingface_hub\file_download.py:795: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Startup time: 11.3s (prepare environment: 14.5s, initialize shared: 0.7s, load scripts: 0.4s, create ui: 0.4s, gradio launch: 0.7s).
Applying attention optimization: Doggettx... done.
Model loaded in 2.3s (load weights from disk: 0.3s, create model: 0.6s, apply weights to model: 1.1s, hijack: 0.1s, calculate empty prompt: 0.1s).
txt2img: CAT
E:\AII\sd_AMD\stable-diffusion-webui-amdgpu\modules\safe.py:156: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  return unsafe_torch_load(filename, *args, **kwargs)
 25%|███████████████████████████████████████████▌ | 5/20 [00:36<01:50, 7.35s/it]
Interrupted with signal 2 in <frame at 0x000001AAC8819F70, file 'C:\\Users\\ABDULLAH\\AppData\\Local\\Programs\\Python\\Python310\\lib\\threading.py', line 324, code wait> | 5/20 [00:29<01:37, 6.48s/it]
Terminate batch job (Y/N)? Y
Additional information
As the console log shows, the speed is very slow because it runs on the CPU. With --use-directml the speed gets to about 2 it/s or less, which is generally better than the CPU but still slow.
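For reference, a rough way to compare raw ZLUDA vs CPU throughput outside the webui would be something like this (a sketch only; the matrix size and iteration count are arbitrary):

```python
# Rough matmul benchmark: compares raw throughput on the CPU against the
# ZLUDA device (cuda:0), independent of the webui.
import time
import torch

def bench(device, n=2048, iters=10):
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    x @ y                                   # warm-up (first use may compile kernels)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iters):
        x @ y
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.time() - t0) / iters

print("cpu          :", bench("cpu"), "s per matmul")
if torch.cuda.is_available():
    print("zluda cuda:0 :", bench("cuda"), "s per matmul")
```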