When using beam search, we currently run the decoders sequentially: each beam's decoder is evaluated with its own forward pass.
This is several times slower than a batched evaluation. The inefficiency is the main factor preventing efficient use of beam search in whisper.cpp, which often results in poor transcription quality.
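For illustration, here is a minimal C++ sketch of the sequential pattern, assuming one decoder state per beam; `decoder_state` and `decode_one` are hypothetical stand-ins, not whisper.cpp API:

```cpp
// Hypothetical sketch of the current sequential pattern: each beam owns a
// decoder state and is evaluated with its own forward pass.
#include <cstdio>
#include <vector>

struct decoder_state {
    int last_token; // most recent token sampled for this beam
    int n_past;     // tokens already in this beam's KV cache
};

// Stand-in for a full transformer forward pass over a single token.
// The model weights are re-read from memory on every call, which is
// what makes the per-beam loop slow.
void decode_one(decoder_state & dec) {
    dec.n_past++;
}

int main() {
    std::vector<decoder_state> beams = {{0, 0}, {1, 0}, {2, 0}, {3, 0}};

    // One full forward pass per beam: roughly n_beams times the memory
    // traffic of a single batched pass.
    for (auto & dec : beams) {
        decode_one(dec);
    }

    std::printf("evaluated %zu beams sequentially\n", beams.size());
}
```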
Batched inference has already been demonstrated in llama.cpp.
This can serve as a starting point for doing the same in whisper.cpp and achieving an efficient beam search implementation.
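As a rough idea of what the batched alternative could look like, here is a minimal C++ sketch loosely following the llama.cpp approach of packing one pending token per sequence into a single batch; the `batch` struct and `decode_batch` are illustrative assumptions, not an existing API in either project:

```cpp
// Hypothetical sketch of batched decoding: all beams are evaluated in a
// single forward pass instead of one pass per beam.
#include <cstdio>
#include <vector>

struct batch {
    std::vector<int> tokens;  // one pending token per beam
    std::vector<int> seq_ids; // which beam / KV-cache sequence each token belongs to
};

// Stand-in for a single forward pass over all tokens in the batch.
// The model weights are read once and shared across all beams,
// amortizing the memory bandwidth cost.
void decode_batch(const batch & b) {
    std::printf("decoded %zu tokens in one pass\n", b.tokens.size());
}

int main() {
    const int n_beams = 4;

    batch b;
    for (int i = 0; i < n_beams; ++i) {
        b.tokens.push_back(i); // placeholder for beam i's next token
        b.seq_ids.push_back(i);
    }

    decode_batch(b); // single evaluation instead of n_beams evaluations
}
```

With this layout the weights are read once per decoding step regardless of the beam count, so the cost of beam search should approach that of greedy decoding.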