Description
The AzureSTTService pushes the transcribed text (TranscriptionFrame) after the user has finished speaking. It does not push any InterimTranscriptionFrame.
In the LLMUserContextAggregator code, when a TranscriptionFrame is received, we reset the aggregation timer (_aggregation_event, 1 s), and push_aggregation() is only called once that timer expires.
```python
# llm_response.py (LLMUserContextAggregator)
async def _handle_transcription(self, frame: TranscriptionFrame):
    self._aggregation += f" {frame.text}" if self._aggregation else frame.text
    # We just got a final result, so let's reset interim results.
    self._seen_interim_results = False
    # Reset aggregation timer.
    self._aggregation_event.set()
```
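For illustration, the timer pattern can be modelled in a few lines of asyncio. This is a hypothetical, simplified sketch and not pipecat's actual implementation: ToyUserAggregator, its close() helper and the asyncio.wait_for loop are assumptions that only mimic the behaviour described above (each final transcription sets _aggregation_event, which restarts the wait, and push_aggregation() runs once the timeout elapses with no new text). It also shows the presumable upside of the timer: two finals arriving within the window are merged into a single push.

```python
import asyncio
import time


class ToyUserAggregator:
    """Hypothetical, simplified timer-based aggregator (not pipecat's code)."""

    def __init__(self, aggregation_timeout: float = 1.0):
        self._aggregation_timeout = aggregation_timeout
        self._aggregation = ""
        self._aggregation_event = asyncio.Event()
        self.pushed: list[str] = []
        # Must be constructed inside a running event loop.
        self._task = asyncio.create_task(self._aggregation_task_handler())

    async def handle_transcription(self, text: str) -> None:
        self._aggregation += f" {text}" if self._aggregation else text
        # Reset the timer instead of pushing right away.
        self._aggregation_event.set()

    async def push_aggregation(self) -> None:
        if self._aggregation:
            self.pushed.append(self._aggregation)
            self._aggregation = ""

    async def _aggregation_task_handler(self) -> None:
        while True:
            try:
                # A new transcription sets the event, ending this wait early,
                # so the loop starts a fresh full-length wait.
                await asyncio.wait_for(
                    self._aggregation_event.wait(), self._aggregation_timeout
                )
            except asyncio.TimeoutError:
                # No new text for a whole timeout: push what we have.
                await self.push_aggregation()
            finally:
                self._aggregation_event.clear()

    async def close(self) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def main() -> tuple[list[str], float]:
    agg = ToyUserAggregator(aggregation_timeout=0.2)
    await agg.handle_transcription("Book a table")
    await asyncio.sleep(0.1)  # second final arrives before the timer expires
    last = time.monotonic()
    await agg.handle_transcription("for two, please.")
    while not agg.pushed:
        await asyncio.sleep(0.01)
    elapsed = time.monotonic() - last
    await agg.close()
    return agg.pushed, elapsed


if __name__ == "__main__":
    pushed, elapsed = asyncio.run(main())
    print(pushed, f"{elapsed:.2f} s after the last transcription")
```

Both fragments end up in one push, roughly one timeout after the last one. With Azure there is presumably only one final per utterance, so nothing is merged and the user only pays the delay.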
I guess this makes sense for most providers, but with Azure the transcription only arrives once the user has stopped speaking, so we should be able to call push_aggregation() as soon as we receive it.
As a local fix (which only works with Azure), I call push_aggregation() directly instead of resetting the timer.
I also tried decreasing aggregation_timeout, which works as well.
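Both workarounds can be compared in the same kind of toy model. Again this is a hypothetical sketch, not pipecat code: the push_immediately flag is an invented stand-in for the Azure-only local fix (calling push_aggregation() instead of setting the event), and aggregation_timeout=0.2 stands in for a reduced timeout; the 0.2 s value is arbitrary.

```python
import asyncio
import time


class ToyUserAggregator:
    """Hypothetical, simplified timer-based aggregator (not pipecat's code)."""

    def __init__(self, aggregation_timeout: float = 1.0, push_immediately: bool = False):
        self._aggregation_timeout = aggregation_timeout
        # Hypothetical switch modelling the Azure-only local fix.
        self._push_immediately = push_immediately
        self._aggregation = ""
        self._aggregation_event = asyncio.Event()
        self.pushed_at: list[float] = []
        self._task = asyncio.create_task(self._aggregation_task_handler())

    async def handle_transcription(self, text: str) -> None:
        self._aggregation += f" {text}" if self._aggregation else text
        if self._push_immediately:
            # Workaround 1: treat the final transcription as end of turn.
            await self.push_aggregation()
        else:
            # Default: reset the timer and push only after it expires.
            self._aggregation_event.set()

    async def push_aggregation(self) -> None:
        if self._aggregation:
            self.pushed_at.append(time.monotonic())
            self._aggregation = ""

    async def _aggregation_task_handler(self) -> None:
        while True:
            try:
                await asyncio.wait_for(
                    self._aggregation_event.wait(), self._aggregation_timeout
                )
            except asyncio.TimeoutError:
                await self.push_aggregation()
            finally:
                self._aggregation_event.clear()

    async def close(self) -> None:
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass


async def latency(**kwargs) -> float:
    """Seconds from a single final transcription to the push."""
    agg = ToyUserAggregator(**kwargs)
    start = time.monotonic()
    await agg.handle_transcription("What's the weather like?")
    while not agg.pushed_at:
        await asyncio.sleep(0.005)
    await agg.close()
    return agg.pushed_at[0] - start


async def main() -> dict[str, float]:
    return {
        "default (1.0 s timer)": await latency(aggregation_timeout=1.0),
        "shorter timer (0.2 s)": await latency(aggregation_timeout=0.2),
        "push immediately": await latency(push_immediately=True),
    }


if __name__ == "__main__":
    for name, seconds in asyncio.run(main()).items():
        print(f"{name}: {seconds:.2f} s")
```

Under these assumptions, pushing immediately removes the delay entirely but would split a turn for any provider that emits several final transcriptions per utterance, while a shorter timeout still merges finals that arrive close together and only shrinks the wait.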
I'm not sure either of these solutions is valid project-wide.
I'd be happy to help, but I need some guidance first.