CANN: Add support for async operator submission #12864

Open · wants to merge 3 commits into master
Conversation

@hipudding (Collaborator) commented on Apr 10, 2025:

Submit operators using asynchronous threads to improve performance.

Use the environment variable GGML_CANN_ASYNC_MODE to control whether
asynchronous submission is enabled. It is disabled by default.

Testing shows a 10%–20% performance improvement in scenarios with
small parameter sizes, especially in quantized models.
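The switch described above is an environment-variable gate. A minimal sketch of how such a flag is commonly read (illustrative only; the helper name and parsing rules here are assumptions, not the PR's actual code):

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: treats GGML_CANN_ASYNC_MODE as enabled when it is
// set to any non-empty value other than "0". Unset means disabled, which
// matches the PR's "disabled by default" behavior.
static bool cann_async_mode_enabled() {
    const char * v = std::getenv("GGML_CANN_ASYNC_MODE");
    return v != nullptr && *v != '\0' && std::strcmp(v, "0") != 0;
}
```

With a gate like this, async submission would be enabled per run, e.g. `GGML_CANN_ASYNC_MODE=1 ./llama-cli ...`.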

SYNC_MODE

llama_perf_sampler_print:    sampling time =      76.81 ms /   316 runs   (    0.24 ms per token,  4113.94 tokens per second)
llama_perf_context_print:        load time =    2880.65 ms
llama_perf_context_print: prompt eval time =      23.05 ms /    27 tokens (    0.85 ms per token,  1171.11 tokens per second)
llama_perf_context_print:        eval time =    6727.99 ms /   288 runs   (   23.36 ms per token,    42.81 tokens per second)
llama_perf_context_print:       total time =    7838.36 ms /   315 tokens

ASYNC_MODE

llama_perf_sampler_print:    sampling time =      51.17 ms /   220 runs   (    0.23 ms per token,  4299.73 tokens per second)
llama_perf_context_print:        load time =    2751.20 ms
llama_perf_context_print: prompt eval time =      17.26 ms /    27 tokens (    0.64 ms per token,  1563.95 tokens per second)
llama_perf_context_print:        eval time =    3037.53 ms /   192 runs   (   15.82 ms per token,    63.21 tokens per second)
llama_perf_context_print:       total time =    3343.86 ms /   219 tokens

@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Apr 10, 2025
@hipudding self-assigned this on Apr 11, 2025
@hipudding added the "Ascend NPU" label (issues specific to Ascend NPUs) on Apr 11, 2025
@hipudding changed the title from "CANN: add async task submit" to "CANN: Add support for async operator submission" on Apr 15, 2025
@hipudding marked this pull request as ready for review on Apr 15, 2025 at 03:22
#include <thread>
#include <unistd.h>
#include <functional>
#include <deque>
@hipudding (author) commented:
Unused header file.

@noemotiovon (Contributor) left a comment:
This PR improves NPU utilization through asynchronous dispatching — impressive work!


if (!running_) {
thread_ = std::thread(&cann_task_queue::execute, this);
running_ = true;
Contributor commented:
Is there a potential multithreading concurrency issue here?

@hipudding (author) replied:

Yes, fixed.
