whisper : fix "bench-all outputs an invalid result on larger models" #3002


Merged (1 commit, Apr 4, 2025)

Conversation

fujimotos (Contributor)

When I run scripts/bench-all.sh on AWS c8g.xlarge, it outputs an invalid value ("ms") in the result table.

Look at the 'Encode' column in the benchmark result below:

Running bench-all.sh (commit 6e7629b)

$ ./scripts/bench-all.sh 4 
...
|    CPU |     OS |           Config |         Model |  Th |  FA |    Enc. |    Dec. |    Bch5 |      PP |  Commit |
|    --- |    --- |              --- |           --- | --- | --- |     --- |     --- |     --- |     --- |     --- |
| <todo> | <todo> |             NEON |          tiny |   4 |   0 |  389.04 |    1.17 |    0.68 |    0.55 | 6e7629b |
| <todo> | <todo> |             NEON |          base |   4 |   0 |  879.42 |    2.05 |    1.22 |    0.98 | 6e7629b |
| <todo> | <todo> |             NEON |         small |   4 |   0 | 3290.80 |    5.45 |    3.36 |    2.82 | 6e7629b |
| <todo> | <todo> |             NEON |        medium |   4 |   0 |      ms |   14.85 |    9.51 |    8.05 | 6e7629b |
| <todo> | <todo> |             NEON |      large-v2 |   4 |   0 |      ms |   28.67 |   17.79 |   15.08 | 6e7629b |
| <todo> | <todo> |             NEON | large-v3-turbo |   4 |   0 |      ms |    4.95 |    3.15 |    2.70 | 6e7629b |

The reason is that the benchmark script assumes that the 11th whitespace-delimited field of the timing line is the per-run time, but this assumption breaks when the target model takes longer to process and the printed number grows wide enough to touch the opening parenthesis.
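The field shift can be demonstrated directly with awk on two timing lines taken from the raw output below (a standalone illustration, not part of the PR):

```shell
# With a short per-run time, '(' and the number are separated by spaces,
# so they are distinct awk fields and $11 is the number. With a wide
# per-run time, '(' fuses with the number into a single field, every
# later field shifts left by one, and $11 becomes the literal "ms".
short='whisper_print_timings:   decode time =  7299.80 ms /   256 runs (   28.51 ms per run)'
long='whisper_print_timings:   encode time = 20320.76 ms /     1 runs (20320.76 ms per run)'
echo "$short" | awk '{print $11}'   # prints 28.51
echo "$long"  | awk '{print $11}'   # prints ms
```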

This is a trivial fix for the issue: add an explicit whitespace before the per-run timing field, so the opening parenthesis always remains a separate field.
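The effect of the extra space can be sketched with shell printf (the widths here are assumptions for illustration, not the verbatim whisper.cpp format string):

```shell
# Before: with a field width of 8 and an 8-character value, the number
# fills the field completely and fuses with '(' into one awk field.
printf '(%8.2f ms per run)\n'  20320.76   # -> (20320.76 ms per run)
# After: an explicit space after '(' guarantees the parenthesis stays a
# separate whitespace-delimited field no matter how wide the value gets.
printf '( %8.2f ms per run)\n' 20320.76   # -> ( 20320.76 ms per run)
```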

Running bench-all.sh (commit a7cf427)

$ ./scripts/bench-all.sh 4 
...
|    CPU |     OS |           Config |         Model |  Th |  FA |    Enc. |    Dec. |    Bch5 |      PP |  Commit |
|    --- |    --- |              --- |           --- | --- | --- |     --- |     --- |     --- |     --- |     --- |
| <todo> | <todo> |             NEON |          tiny |   4 |   0 |  389.81 |    1.22 |    0.70 |    0.56 | a7cf427 |
| <todo> | <todo> |             NEON |          base |   4 |   0 |  883.28 |    2.12 |    1.25 |    0.99 | a7cf427 |
| <todo> | <todo> |             NEON |         small |   4 |   0 | 3302.36 |    5.61 |    3.43 |    2.86 | a7cf427 |
| <todo> | <todo> |             NEON |        medium |   4 |   0 | 10561.90 |   15.42 |    9.71 |    8.14 | a7cf427 |
| <todo> | <todo> |             NEON |      large-v2 |   4 |   0 | 20608.38 |   29.33 |   18.36 |   15.26 | a7cf427 |
| <todo> | <todo> |             NEON | large-v3-turbo |   4 |   0 | 18801.69 |    5.10 |    3.27 |    2.73 | a7cf427 |

The benchmark script 'scripts/bench-all.sh' assumes that the 11th
field of the output line is a timestamp. This assumption does not
hold when the target model takes a bit longer to process.

Fix this issue by introducing an explicit whitespace to the output
lines of `whisper_print_timings()`.

Signed-off-by: Fujimoto Seiji <[email protected]>
@fujimotos (Contributor, Author)

Note: I needed this fix to compute the benchmark result in #89 (comment).

To illustrate the point, this PR changes the following raw output:

$ ./build/bin/whisper-bench -m ./models/ggml-large-v2.bin -t 4
...
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time = 20320.76 ms /     1 runs (20320.76 ms per run)
whisper_print_timings:   decode time =  7299.80 ms /   256 runs (   28.51 ms per run)
whisper_print_timings:   batchd time =  5664.58 ms /   320 runs (   17.70 ms per run)
whisper_print_timings:   prompt time = 61682.86 ms /  4096 runs (   15.06 ms per run)

... to this:

$ ./build/bin/whisper-bench -m ./models/ggml-large-v2.bin -t 4
...
whisper_print_timings:   sample time =     0.00 ms /     1 runs (     0.00 ms per run)
whisper_print_timings:   encode time = 20625.59 ms /     1 runs ( 20625.59 ms per run)
whisper_print_timings:   decode time =  7580.05 ms /   256 runs (    29.61 ms per run)
whisper_print_timings:   batchd time =  5967.29 ms /   320 runs (    18.65 ms per run)
whisper_print_timings:   prompt time = 62751.71 ms /  4096 runs (    15.32 ms per run)

... which ensures that awk '{print $11}' always works.
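This can be verified against a post-fix line from the output above:

```shell
# With the extra space, '(' is field 10 and the per-run time is always
# field 11, regardless of the magnitude of the timing value.
fixed='whisper_print_timings:   encode time = 20625.59 ms /     1 runs ( 20625.59 ms per run)'
echo "$fixed" | awk '{print $11}'   # prints 20625.59
```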

@ggerganov ggerganov merged commit e6234cd into ggml-org:master Apr 4, 2025
@fujimotos fujimotos deleted the sf/bench-all branch April 5, 2025 10:18
fujimotos added a commit to fujimotos/whisper.cpp that referenced this pull request Apr 20, 2025
whisper : fix "bench-all outputs an invalid result on larger models" (ggml-org#3002)