
Azure STT transcription does not update the context aggregator instantly after speech fully recognized. #1440

Open
aurelien-ldp opened this issue Mar 24, 2025 · 2 comments

@aurelien-ldp

The AzureSTTService pushes the transcribed text (TranscriptionFrame) after the user has finished speaking.
It does not push any InterimTranscriptionFrame.

In the LLMUserContextAggregator code, when the TranscriptionFrame is received, we set _aggregation_event, which resets a roughly 1 s aggregation timer before push_aggregation() is called.

# llm_response.py (LLMUserContextAggregator)
async def _handle_transcription(self, frame: TranscriptionFrame):
    self._aggregation += f" {frame.text}" if self._aggregation else frame.text
    # We just got a final result, so let's reset interim results.
    self._seen_interim_results = False
    # Reset aggregation timer.
    self._aggregation_event.set()

While this makes sense for most providers, it seems that with Azure we should be able to call push_aggregation() immediately when the transcription is received.

To fix this locally (it only works with Azure), I call push_aggregation() directly instead of resetting the timer.
I also tried decreasing aggregation_timeout, which also works.
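
For concreteness, a minimal sketch of the first workaround (the class name is made up, the import paths are my best guess, and it reuses the private _handle_transcription() / push_aggregation() hooks quoted above, so treat it as illustrative only):

# Hypothetical Azure-only subclass that pushes the aggregation immediately.
from pipecat.frames.frames import TranscriptionFrame
from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator

class AzureEagerUserContextAggregator(LLMUserContextAggregator):
    async def _handle_transcription(self, frame: TranscriptionFrame):
        self._aggregation += f" {frame.text}" if self._aggregation else frame.text
        self._seen_interim_results = False
        # Azure sends a single final transcript per utterance, so push the
        # aggregation right away instead of arming the 1s timer.
        await self.push_aggregation()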

I'm not sure either of these solutions is valid project-wide.
I'd be happy to help, but I need some guidance first.

@markbackman
Contributor

I just wrote this long reply to another user explaining how the system works and why the 1 second timer exists. Hopefully this helps explain things:


Some background:

  • The VAD and STT are two decoupled services that work in conjunction to represent when a user speaks and what they said.
  • The VAD runs locally and detects speech very fast; this is important for triggering an interruption quickly.
  • The STT service runs remotely and requires a network roundtrip in addition to generation; this usually runs more slowly.
  • The STT service can emit any number of interim and final transcripts for a given utterance. This is variable and uncontrollable.

The current system is designed to handle a wide variety of cases where transcription frames arrive at different times relative to the user's speaking status. It's optimized to avoid triggering multiple completions back to back when two consecutive final transcripts arrive close together.

All of this means that it's not uncommon to see:

UserStartedSpeakingFrame
UserStoppedSpeakingFrame
TranscriptionFrame

In fact, because the VAD runs locally, it's very uncommon to get the TranscriptionFrame before the UserStoppedSpeakingFrame, simply because of the network transit time required to receive it.

This isn't a problem though, as the current logic is set up to handle this case and a number of other more complex, but also common cases.

For the cases you linked, I'm not sure any of them are an issue.

Take the 1 second timeout that the user in #1440 mentions: this waiting period ensures that all TranscriptionFrames have been received after speech ends. Without it, if you received:

UserStartedSpeakingFrame
UserStoppedSpeakingFrame
TranscriptionFrame (t=0s)
TranscriptionFrame (t=0.5s)

This would result in the LLM generating two completions, which would result in two text outputs and two TTS outputs. Because the user was still speaking, these could be two related parts of the same sentence, e.g.:

Hi, I'm good.
How are you?

The LLM response in this case may be odd, and it may even repeat itself. In our experience, it's worth waiting a short amount of time to ensure the completion takes the user's full turn into account.
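
To make the timer's role concrete, here is a small self-contained sketch of the debounce pattern described above. It is not Pipecat's actual code (the class and names below are invented for illustration): each final transcript re-arms a quiet-period timer, and the aggregation is only pushed once no new transcript arrives within aggregation_timeout, so the two fragments above become a single completion.

# Illustrative debounce sketch (invented names, not Pipecat's implementation).
import asyncio

class DebouncedAggregator:
    def __init__(self, aggregation_timeout: float = 1.0):
        self._timeout = aggregation_timeout
        self._aggregation = ""
        self._event = asyncio.Event()
        self._task = None

    def handle_transcription(self, text: str):
        # Append the final transcript and (re)arm the quiet-period timer.
        self._aggregation += f" {text}" if self._aggregation else text
        self._event.set()
        if self._task is None:
            self._task = asyncio.create_task(self._wait_and_push())

    async def _wait_and_push(self):
        # Keep waiting as long as new transcripts arrive within the timeout.
        while True:
            self._event.clear()
            try:
                await asyncio.wait_for(self._event.wait(), self._timeout)
            except asyncio.TimeoutError:
                break
        print(f"push_aggregation: {self._aggregation!r}")
        self._aggregation = ""
        self._task = None

async def main():
    agg = DebouncedAggregator(aggregation_timeout=1.0)
    agg.handle_transcription("Hi, I'm good.")  # TranscriptionFrame (t=0s)
    await asyncio.sleep(0.5)
    agg.handle_transcription("How are you?")   # TranscriptionFrame (t=0.5s)
    await asyncio.sleep(2)                     # one merged completion is pushed

asyncio.run(main())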

@aurelien-ldp
Author

Thanks for the explanation. I think it makes sense for most use-cases.

I just want to emphasize that, given how an STT service like Azure is implemented, we will always receive the TranscriptionFrame after the UserStoppedSpeakingFrame.

I might be wrong, but if that's true, we're adding a flat 1 s of latency to every user turn.
In my experience this is always the case.

Locally, my fix is to call push_aggregation() when the TranscriptionFrame is received, but it only works for some STT providers.
I'd be happy to help if you think this is something that can be improved.
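
For illustration, the reduced-timeout workaround could look roughly like this. I'm assuming the aggregator exposes the aggregation_timeout mentioned above (in seconds) as a constructor argument; the exact parameter plumbing may differ between Pipecat versions, so check llm_response.py in your install first:

# Hypothetical configuration sketch for a shorter aggregation timeout.
from pipecat.processors.aggregators.llm_response import LLMUserContextAggregator
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext

context = OpenAILLMContext(
    messages=[{"role": "system", "content": "You are a helpful voice assistant."}]
)
# Azure emits one final transcript per utterance, so a much shorter quiet
# period (e.g. 0.2s) recovers most of the extra latency without changing
# the aggregation logic itself.
user_aggregator = LLMUserContextAggregator(context, aggregation_timeout=0.2)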
