
Run-time performance of main built with CMake differs greatly from main built with the Makefile #440


Closed
chenqianhe opened this issue Jan 24, 2023 · 3 comments

Comments

@chenqianhe
Contributor

Output of main built with make:

whisper_init_from_file: loading model from 'ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 5
whisper_model_load: mem required  = 4641.00 MB (+   71.00 MB per decoder)
whisper_model_load: kv self size  =   70.00 MB
whisper_model_load: kv cross size =  234.38 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: model size    = 2950.66 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing 'samples_jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]  And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:     load time =  2146.14 ms
whisper_print_timings:      mel time =    19.26 ms
whisper_print_timings:   sample time =    11.52 ms /    27 runs (    0.43 ms per run)
whisper_print_timings:   encode time =  6265.27 ms /     1 runs ( 6265.27 ms per run)
whisper_print_timings:   decode time =  1646.73 ms /    27 runs (   60.99 ms per run)
whisper_print_timings:    total time = 10142.16 ms

Output of main built with CMake:

whisper_init_from_file: loading model from '../../ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 5
whisper_model_load: mem required  = 4641.00 MB (+   71.00 MB per decoder)
whisper_model_load: kv self size  =   70.00 MB
whisper_model_load: kv cross size =  234.38 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     = 2950.97 MB
whisper_model_load: model size    = 2950.66 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing '../../samples_jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:11.000]  And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.


whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:     load time = 27118.13 ms
whisper_print_timings:      mel time =   107.82 ms
whisper_print_timings:   sample time =    46.74 ms /    27 runs (    1.73 ms per run)
whisper_print_timings:   encode time = 63709.25 ms /     1 runs (63709.25 ms per run)
whisper_print_timings:   decode time =  8359.73 ms /    27 runs (  309.62 ms per run)
whisper_print_timings:    total time = 99418.73 ms

Both results were obtained on the same device.

Why is the CMake-built main so much slower (about 99 s total versus about 10 s)? Am I doing something wrong?

@chenqianhe
Contributor Author

My goal is to improve on #260: I am implementing a Node.js addon that calls the Whisper inference implemented in C++, but the addon build depends on CMake.

This is another addon that I have implemented. Getting CMakeLists.txt right is crucial for it.

@ggerganov
Member

Most likely your CMake build is configured as Debug.
Try the following:

rm CMakeCache.txt
cmake -DCMAKE_BUILD_TYPE=Release ../../
make

@chenqianhe
Contributor Author


Thank you very much!
That was indeed the cause.
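
To confirm which configuration an existing build directory was generated with, the value can be read from CMakeCache.txt. The following is a minimal sketch, assuming a single-configuration generator such as Unix Makefiles; build_type is a hypothetical helper name, not part of whisper.cpp or CMake.

```shell
#!/bin/sh
# Hypothetical helper: print the CMAKE_BUILD_TYPE recorded in a CMake
# build directory's cache. Usage: build_type [build_dir]
# (build_dir defaults to the current directory)
build_type() {
    cache="${1:-.}/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in ${1:-.}" >&2
        return 1
    fi
    # Cache entries look like: CMAKE_BUILD_TYPE:STRING=Release
    value=$(sed -n 's/^CMAKE_BUILD_TYPE:[A-Z]*=//p' "$cache")
    # Report an empty or missing entry explicitly instead of a blank line.
    echo "${value:-<empty>}"
}
```

Running build_type from inside the build directory before and after the reconfigure shows whether the switch to Release took effect. If the result is not Release, rerunning cmake with -DCMAKE_BUILD_TYPE=Release as suggested above is the fix.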
