whisper : implement batched decoding #1048

Closed
@ggerganov

Description

When using beam search, we currently run the decoders sequentially:

https://github.com/ggerganov/whisper.cpp/blob/f1c9df58064e234b8bd5bd41a59530b675dd2ffe/whisper.cpp#L4416-L4444

This is several times slower than a batched evaluation. This inefficiency is the main obstacle to using beam search efficiently in whisper.cpp, and it often results in poor transcription quality.
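
For intuition, here is a minimal standalone sketch (plain C++, not whisper.cpp or ggml code; all names and sizes are illustrative) of where the speedup comes from: token-by-token decoding is dominated by streaming the weight matrices from memory, and a single batched matrix product reads each weight row once for all beams, whereas the sequential loop re-reads the full matrix once per beam.

```cpp
#include <cstdio>
#include <vector>

static const int n_embd  = 4; // toy sizes for illustration
static const int n_beams = 3;

// y = W*x : one matvec per decoder -> W is traversed n_beams times in total
static void matvec(const std::vector<float> & W, const float * x, float * y) {
    for (int i = 0; i < n_embd; ++i) {
        float sum = 0.0f;
        for (int k = 0; k < n_embd; ++k) sum += W[i*n_embd + k]*x[k];
        y[i] = sum;
    }
}

// Y = W*X : all beams at once -> each row of W is loaded once and reused
// across the n_beams columns of X (same arithmetic, far fewer weight reads)
static void matmul_batched(const std::vector<float> & W, const float * X, float * Y) {
    for (int i = 0; i < n_embd; ++i) {
        for (int j = 0; j < n_beams; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n_embd; ++k) sum += W[i*n_embd + k]*X[k*n_beams + j];
            Y[i*n_beams + j] = sum;
        }
    }
}

int main() {
    std::vector<float> W(n_embd*n_embd,  0.5f);
    std::vector<float> X(n_embd*n_beams, 1.0f); // current state of each beam, one column per beam
    std::vector<float> Y(n_embd*n_beams);

    // sequential evaluation: one full pass over W per beam (shown for beam 0)
    std::vector<float> x0(n_embd), y0(n_embd);
    for (int k = 0; k < n_embd; ++k) x0[k] = X[k*n_beams + 0];
    matvec(W, x0.data(), y0.data());

    // batched evaluation: one pass over W for all beams
    matmul_batched(W, X.data(), Y.data());

    printf("beam 0: sequential = %f, batched = %f\n", y0[0], Y[0]); // identical results
    return 0;
}
```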

Batched inference has been demonstrated in llama.cpp:

https://github.com/ggerganov/llama.cpp/blob/bd34cdde38f8fd661890ddd5f57ca30bf279877b/examples/baby-llama/baby-llama.cpp#L768-L777

This can be a starting point for doing the same in whisper.cpp and achieving an efficient beam search implementation.
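
One detail a batched implementation has to solve is the per-decoder KV cache: after each top-k selection step a beam may continue another beam's hypothesis, so its cached keys/values must be brought in sync with the parent's prefix before the next batched evaluation. Below is one possible layout, sketched as an assumption for illustration; the struct and its names are not whisper.cpp's actual data structures.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One shared cache with a contiguous slice per beam: [n_beams][n_ctx][n_embd]
struct batched_kv_cache {
    int n_beams, n_ctx, n_embd;
    std::vector<float> k, v;

    batched_kv_cache(int n_beams, int n_ctx, int n_embd)
        : n_beams(n_beams), n_ctx(n_ctx), n_embd(n_embd),
          k((std::size_t) n_beams*n_ctx*n_embd),
          v((std::size_t) n_beams*n_ctx*n_embd) {}

    float * k_slice(int beam) { return k.data() + (std::size_t) beam*n_ctx*n_embd; }
    float * v_slice(int beam) { return v.data() + (std::size_t) beam*n_ctx*n_embd; }

    // after re-ranking, beam `dst` continues the hypothesis of beam `src`:
    // copy the parent's cached prefix so the next batched pass sees it
    void copy_beam(int src, int dst, int n_past) {
        const std::size_t n = (std::size_t) n_past*n_embd;
        std::copy_n(k_slice(src), n, k_slice(dst));
        std::copy_n(v_slice(src), n, v_slice(dst));
    }
};

int main() {
    batched_kv_cache cache(/*n_beams*/ 5, /*n_ctx*/ 448, /*n_embd*/ 384);
    // example: beam 3 adopts beam 0's hypothesis after the selection step
    cache.copy_beam(/*src*/ 0, /*dst*/ 3, /*n_past*/ 10);
    return 0;
}
```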

Metadata

Labels

decoding (Decoding related issues), performance (CPU and memory usage - results and comparisons)
