Guided mode for the "command" example #271

ggerganov · 2022-12-13T17:58:26Z

"Guided mode" lets you specify a list of commands (i.e. strings); the transcription is then guided to classify what was spoken as one of the commands in the list. This is useful when a device only listens for a small set of commands.

Initial tests show that this approach might be extremely efficient in terms of performance, since it integrates very well with the "partial Encoder" idea from #137.
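To make the classification step concrete, here is a minimal sketch of one way to pick a command from the list. It is not necessarily what this PR does. It assumes each command has already been tokenized and that the model has produced a probability for every token id. The `Command` struct, `pick_command`, and the `probs` layout are hypothetical names used for illustration and are not part of the whisper.cpp API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration; not part of the whisper.cpp API.
struct Command {
    std::string      text;   // the command string as written in the list
    std::vector<int> tokens; // token ids the command was tokenized into
};

// Score each command by the mean probability of its tokens and return the
// index of the best-scoring command, or -1 if no command has any tokens.
// Assumption: probs[i] is the model's probability for token id i.
int pick_command(const std::vector<Command> & commands, const std::vector<float> & probs) {
    int   best       = -1;
    float best_score = -1.0f;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        const auto & toks = commands[i].tokens;
        if (toks.empty()) {
            continue; // nothing to score
        }
        float sum = 0.0f;
        for (int id : toks) {
            // Ignore ids that fall outside the probability table.
            if (id >= 0 && static_cast<std::size_t>(id) < probs.size()) {
                sum += probs[id];
            }
        }
        const float score = sum / static_cast<float>(toks.size());
        if (score > best_score) {
            best_score = score;
            best       = static_cast<int>(i);
        }
    }
    return best;
}
```

Averaging over the tokens keeps longer commands from being favored just because they have more tokens. Other normalizations are possible, and this choice is an assumption of the sketch.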

command-guided-0.mp4

Tokenizes a string into a list of vocabulary tokens

jmhmd · 2023-01-23T19:55:02Z

With regard to #235, would this work with a large list of allowed words, i.e. a constrained vocabulary of potentially thousands of words? I realize that is not the intended use case here.

RndyP · 2023-01-23T20:55:06Z

This is exactly what I need. My application uses Whisper for basically two things: 1) Command mode: listen for voice input and, when it is detected, read in up to 3 tokens, error-correct, parse, and send command messages to the app. 2) Streaming mode: when the app puts focus on a text edit control, stream Whisper text into the edit control.

The issue I have is that Whisper is a sledgehammer for simple commands: I just need it to infer single words, but accurately. There are certain words that Whisper has extreme trouble with, like "fur" and "crib". Whisper has a huge vocabulary (too big) and it comes up with everything under the sun (scrabble words too?), except "crib". So I created a "correction dictionary" that fixes up the erroneous inferences. It is a vector of pairs, each holding a "correct" string and a vector of "error" strings. For example:

m_Dictionary = {
    {{"crib"},
     {"cribb", "grid", "cred", "could", "quib", "cribber", "cribba", "criba", "quid", "queb", "quebb", "quab", "craig",
      "club", "query", "clear", "kreb", "quim", "qubit"}},
    // ...
};

Yuk.
This feature would be great!
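For readers who want to try the correction-dictionary workaround described above, here is a minimal sketch of how such a structure could be queried. The `Dictionary` alias, `build_index`, and `correct` are hypothetical names chosen for illustration. The original application's lookup code is not shown in this thread, so this is only one possible approach.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Same shape as the dictionary above: (correct word, list of misheard variants).
using Dictionary = std::vector<std::pair<std::string, std::vector<std::string>>>;

// Build a reverse index from each misheard variant to its intended word,
// so that a lookup is a single hash probe instead of a scan over all entries.
std::unordered_map<std::string, std::string> build_index(const Dictionary & dict) {
    std::unordered_map<std::string, std::string> index;
    for (const auto & entry : dict) {
        index.emplace(entry.first, entry.first); // the correct word maps to itself
        for (const auto & wrong : entry.second) {
            index.emplace(wrong, entry.first);   // emplace keeps the first mapping on duplicates
        }
    }
    return index;
}

// Return the corrected word, or the input unchanged if it is not in the index.
std::string correct(const std::unordered_map<std::string, std::string> & index,
                    const std::string & heard) {
    const auto it = index.find(heard);
    return it == index.end() ? heard : it->second;
}
```

Note that a variant such as "could" or "clear" is also a real word, so a table like this can only be applied safely when the set of expected commands is known in advance.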

ggerganov · 2023-01-24T20:26:31Z

@jmhmd @RndyP
Why don't you simply try the existing guided mode of the command example? You just need to put the words you are interested in into the commands.txt file. The accuracy of the proposed algorithm can be further improved, but I think even the existing version should work pretty well.

If you have a modern CPU, use the following command:

make command
./command -m ./models/ggml-base.en.bin -cmd ./examples/command/commands.txt

If not, you can try this one for better performance, but worse quality:

make command
./command -m ./models/ggml-tiny.en.bin -cmd ./examples/command/commands.txt -ac 128 -t 3

RndyP · 2023-01-24T20:34:05Z

I just did. See #190. Sorry to switch threads.

whisper : add whisper_tokenize()

dfa0ecd

Tokenizes a string into a list of vocabulary tokens

ggerganov force-pushed the command branch from 2446a00 to c6e7e61 December 13, 2022 18:30

command : adding guided mode

a2f8071

ggerganov force-pushed the command branch from c6e7e61 to a2f8071 December 13, 2022 19:22

ggerganov added 2 commits December 13, 2022 21:36

command : update README, show how to use guided mode

0d4038f

command : better indentation

03d9bb1

ggerganov marked this pull request as ready for review December 13, 2022 20:19

ggerganov merged commit 4312995 into master Dec 16, 2022

ggerganov deleted the command branch December 16, 2022 17:38

pprobst mentioned this pull request Mar 21, 2024

Using hotwords to "bias" transcription (or limit the vocabulary in some way) #1979

Open
