Guided mode for the "command" example #271

Merged
merged 4 commits into from
Dec 16, 2022
ggerganov (Member) commented Dec 13, 2022

ref #190 #235

"Guided mode" lets you specify a list of commands (i.e. strings); the transcription is then guided to classify what you said as one of the commands in that list. This can be useful when a device is listening only for a small set of commands.

Initial tests show that this approach might be extremely efficient in terms of performance, since it integrates very well with the "partial Encoder" idea from #137.
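
The description above does not spell out how a transcription is matched to a command, so the following is only a minimal sketch of one plausible scoring rule, not the code from this PR. It assumes each allowed command has already been tokenized, that `probs[id]` holds the decoder's probability for vocabulary token `id`, and that a command's score is the mean probability of its tokens. The names `Command` and `pick_command` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One allowed command together with its token ids (hypothetical layout).
struct Command {
    std::string text;
    std::vector<int> tokens;
};

// Score every command as the mean probability of its tokens and return the
// index of the highest-scoring one. `probs[id]` is assumed to be the
// decoder's probability for vocabulary token `id`.
std::size_t pick_command(const std::vector<Command>& commands,
                         const std::vector<float>& probs) {
    assert(!commands.empty());
    std::size_t best = 0;
    float best_score = -1.0f;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        const std::vector<int>& toks = commands[i].tokens;
        assert(!toks.empty());
        float sum = 0.0f;
        for (int id : toks) {
            // at() throws if a token id falls outside the vocabulary.
            sum += probs.at(static_cast<std::size_t>(id));
        }
        const float score = sum / static_cast<float>(toks.size());
        if (score > best_score) {
            best_score = score;
            best = i;
        }
    }
    return best;
}
```

Because only the tokens that appear in the command list are ever scored, the rest of the vocabulary cannot win, which is the point of restricting the device to a small command set.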

command-guided-0.mp4

Tokenizes a string into a list of vocabulary tokens
ggerganov marked this pull request as ready for review December 13, 2022 20:19
ggerganov merged commit 4312995 into master Dec 16, 2022
ggerganov deleted the command branch December 16, 2022 17:38
jmhmd commented Jan 23, 2023

With regard to #235, would this work with a large list of allowed words, i.e. a constrained vocabulary of potentially thousands of words? I realize that is not the intended use case here.

RndyP commented Jan 23, 2023

This is exactly what I need. My application uses Whisper for basically two things: 1) Command mode: listen for voice input and, when it is detected, read in up to 3 tokens, error-correct and parse them, and send command messages to the app. 2) Streaming mode: when the app puts focus on a text edit control, stream Whisper text into the edit control.

The issue I have is that Whisper is a sledgehammer for simple commands, in that I just need it to infer single words, but accurately. There are certain words that Whisper has extreme trouble with, like "fur" and "crib". Whisper has a huge vocabulary (too big) and it comes up with everything under the sun (Scrabble words too?), except "crib". So I created a "correction dictionary" that fixes up the erroneous inferences: a vector of pairs, each holding a "correct" string and a vector of "error" strings. For example:

m_Dictionary = {
    {{"crib"},
     {"cribb", "grid", "cred", "could", "quib", "cribber", "cribba", "criba", "quid", "queb", "quebb", "quab", "craig",
      "club", "query", "clear", "kreb", "quim", "qubit"}},
    .
    .
    .

Yuk.
This feature would be great!
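
For readers who want to see how such a correction dictionary might be consulted, here is a minimal sketch. It assumes the layout described above (a vector of pairs of a correct string and a vector of error strings) and a plain linear search. The alias `Dictionary` and the function `correct_word` are hypothetical names, not taken from the application in question.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Each entry maps one correct word to the misrecognitions seen for it.
using Dictionary =
    std::vector<std::pair<std::string, std::vector<std::string>>>;

// Return the correct word if `heard` is a known misrecognition;
// otherwise return `heard` unchanged.
std::string correct_word(const Dictionary& dict, const std::string& heard) {
    for (const auto& entry : dict) {
        const std::vector<std::string>& errors = entry.second;
        if (std::find(errors.begin(), errors.end(), heard) != errors.end()) {
            return entry.first;
        }
    }
    return heard;
}
```

With an entry such as `{"crib", {"cribb", "grid", "cred"}}`, calling `correct_word(dict, "grid")` returns `"crib"`, while a word that appears in no error list passes through untouched. The list has to be extended by hand for every new misrecognition, which is the maintenance burden that guided mode is meant to avoid.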

ggerganov (Member, Author) commented

@jmhmd @RndyP
Why don't you simply try the existing guided mode of the command example? You just need to put the words you are interested in into the commands.txt file. The accuracy of the proposed algorithm can be improved further, but I think even the existing version should work pretty well.

If you have a modern CPU, use the following command:

make command
./command -m ./models/ggml-base.en.bin -cmd ./examples/command/commands.txt

If not, you can try this one for better performance, but worse quality:

make command
./command -m ./models/ggml-tiny.en.bin -cmd ./examples/command/commands.txt -ac 128 -t 3

RndyP commented Jan 24, 2023

I just did. See #190. Sorry to switch threads.
