Guided mode for the "command" example #271
Tokenizes a string into a list of vocabulary tokens
With regard to #235, would this work with a large list of allowed words, i.e. a constrained vocabulary of potentially thousands of words? I realize that is not the intended use case here.
This is exactly what I need. My application uses Whisper for two things: 1) Command mode - listen for voice input and, when it is detected, read in up to 3 tokens, error-correct and parse them, and send command messages to the app. 2) Streaming mode - when the app puts focus on a text edit control, stream Whisper text into the edit control. The issue I have is that Whisper is a sledgehammer for simple commands: I just need it to infer single words, but accurately. There are certain words that Whisper has extreme trouble with, like "fur" and "crib". Whisper has a huge vocabulary (too big) and it comes up with everything under the sun (Scrabble words too?) except "crib". So I created a "correction dictionary" that fixes up the erroneous inferences: a vector of pairs, each holding a "correct" string and a vector of "error" strings. For example:
Yuk.
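A minimal sketch of the kind of correction dictionary described in this comment, assuming exact string matching on a single transcribed word. The names `CorrectionDict`, `correct_word` and `example_dict`, and every entry in the table, are hypothetical illustrations and are not taken from the commenter's code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Each entry pairs a "correct" word with the mis-transcriptions mapped to it.
using CorrectionDict =
    std::vector<std::pair<std::string, std::vector<std::string>>>;

// Returns the corrected word, or the input unchanged if no entry matches.
std::string correct_word(const CorrectionDict & dict, const std::string & heard) {
    for (const auto & entry : dict) {
        // An empty "correct" word would be a configuration error.
        assert(!entry.first.empty());
        if (heard == entry.first) {
            return entry.first;
        }
        for (const auto & err : entry.second) {
            if (heard == err) {
                return entry.first;
            }
        }
    }
    return heard;
}

// Hypothetical entries, made up for illustration only.
CorrectionDict example_dict() {
    return {
        {"crib", {"crip", "krib", "grib"}},
        {"fur",  {"fir", "for", "four"}},
    };
}
```

With this table, `correct_word(example_dict(), "krib")` yields `"crib"`, while a word that appears nowhere in the table is passed through unchanged. The linear scan is adequate for a handful of commands; a hash map keyed on the error strings would be the natural choice for larger tables.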
@jmhmd @RndyP If you have a modern CPU, use the following command:
If not, you can try this one for better performance, but worse quality:
I just did. See #190. Sorry to switch threads.
ref #190 #235
"Guided mode" allows you to specify a list of commands (i.e. strings) and the transcription will be guided to classify your command into one from the list. This can be useful in situations where a device is listening only for a small subset of commands.
Initial tests show that this approach might be extremely efficient in terms of performance, since it integrates very well with the "partial Encoder" idea from #137.
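To make the idea of classifying an utterance into one command from a fixed list more concrete, here is a rough sketch. It is not the code from this PR. It assumes the decoder has already produced a single next-token probability distribution (`probs`) and that each command has been tokenized into vocabulary ids. The types `Command` and `Match`, the function `classify`, and the scoring rule (mean probability of a command's tokens, normalized across commands) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A command and its (hypothetical) token ids in the model vocabulary.
struct Command {
    std::string      text;
    std::vector<int> tokens;
};

// Index of the best-matching command and its share of the total score.
struct Match {
    std::size_t index;
    double      confidence;
};

// Scores each command by the mean probability of its tokens under `probs`,
// then returns the highest-scoring command. Ties go to the earliest command.
Match classify(const std::vector<Command> & commands,
               const std::vector<double>  & probs) {
    assert(!commands.empty());

    std::vector<double> scores;
    double total = 0.0;
    for (const auto & cmd : commands) {
        assert(!cmd.tokens.empty());
        double sum = 0.0;
        for (int id : cmd.tokens) {
            assert(id >= 0 && static_cast<std::size_t>(id) < probs.size());
            sum += probs[static_cast<std::size_t>(id)];
        }
        const double score = sum / static_cast<double>(cmd.tokens.size());
        scores.push_back(score);
        total += score;
    }

    std::size_t best = 0;
    for (std::size_t i = 1; i < scores.size(); ++i) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }

    // Normalize so the confidences over all commands sum to 1.
    const double confidence = total > 0.0 ? scores[best] / total : 0.0;
    return {best, confidence};
}
```

Because the output is restricted to the listed commands, a caller could compare `confidence` against a threshold and ignore low-confidence matches rather than act on them.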
command-guided-0.mp4