Open
Description
I have a question regarding the performance of the Qwen3 model (specifically the 8B q8k_xl variant) when running on an A770 GPU.
Current Observations:
Memory Bandwidth (IMC):
IMC Read: 25,000 MiB/s
IMC Write: 50 MiB/s
Compute Utilization: Approximately 30%
CPU Core Usage: 10 out of 12 cores are at 100% utilization.
The inference speed is really slow, about 8 tokens/second. Is this an expected result?
By comparison, the DeepSeek 0528 model uses nearly 100% compute and only one CPU core.
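For reference, here is a rough back-of-envelope check of the Qwen3 numbers above. It is only a sketch: it assumes the IMC read traffic is caused mostly by inference, which I have not verified, and the function name `host_read_per_token_mib` is just an illustrative helper.

```python
# Observed values from the measurements above.
IMC_READ_MIB_PER_S = 25_000   # IMC read bandwidth, MiB/s
TOKENS_PER_S = 8              # approximate inference speed


def host_read_per_token_mib(read_mib_per_s: float, tokens_per_s: float) -> float:
    """Estimate how much system RAM is read per generated token.

    Assumption: all IMC read traffic is attributable to inference.
    """
    return read_mib_per_s / tokens_per_s


per_token = host_read_per_token_mib(IMC_READ_MIB_PER_S, TOKENS_PER_S)
print(f"{per_token:.0f} MiB of system memory read per token")  # prints 3125
```

If that assumption holds, roughly 3 GiB of system RAM is read for every generated token, which together with the high CPU usage might mean that part of the work is running on the CPU rather than on the GPU.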