[chore][pkg/stanza] Fix the bug where the log emitter might hang when the receiver retries indefinitely (open-telemetry#37159)
#### Description
I was exploring options for applying backpressure to the pipeline when the exporter
fails. Inspired by
open-telemetry#29410 (comment),
I realized that I could enable `retry_on_failure` on the receiver
side and have it retry indefinitely by setting `max_elapsed_time` to 0:
```yaml
receivers:
  filelog:
    include: [ input.log ]
    retry_on_failure:
      enabled: true
      max_elapsed_time: 0
```
With this config, the consumer blocks in the `ConsumeLogs` func
in `consumerretry` when the exporter fails to consume the logs:
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/12551d324375bd0c4647a8cdc7bd0f8c435c1034/internal/coreinternal/consumerretry/logs.go#L35
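For reference, here is a minimal, self-contained sketch of that blocking behavior. The names (`retryConsumer`, `failingConsumer`, `backoff`) are hypothetical and this is not the actual `consumerretry` code; it only shows the shape of an indefinite retry that returns solely when its context is cancelled:
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// failingConsumer stands in for a pipeline whose exporter keeps failing.
type failingConsumer struct{}

func (failingConsumer) ConsumeLogs(ctx context.Context) error {
	return errors.New("exporter unavailable")
}

// retryConsumer is a simplified stand-in for consumerretry with
// max_elapsed_time set to 0: it retries until the context is cancelled.
type retryConsumer struct {
	next    failingConsumer
	backoff time.Duration
}

func (c retryConsumer) ConsumeLogs(ctx context.Context) error {
	for {
		if err := c.next.ConsumeLogs(ctx); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			// Cancelling the context is the only way out of the loop.
			return ctx.Err()
		case <-time.After(c.backoff):
			// Wait out the backoff, then retry.
		}
	}
}

func main() {
	c := retryConsumer{backoff: 100 * time.Millisecond}
	// With context.Background() this call would never return; a
	// cancellable context is the only escape hatch.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(c.ConsumeLogs(ctx)) // prints "context deadline exceeded"
}
```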
The `flusher()` func of the `LogEmitter` starts a loop and calls the
`consumerFunc` with `context.Background()`. When `ConsumeLogs` is
blocked by the retry, there is no way to cancel that retry, so the
`LogEmitter` hangs when I try to shut down the collector.
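Roughly, the problematic pattern looks like this. The `emitter`, `consumerFunc`, `takeBatch`, and `closeChan` names are hypothetical stand-ins, not the real `LogEmitter` fields:
```go
package sketch

import (
	"context"
	"time"
)

// emitter is a hypothetical stand-in for the stanza LogEmitter; the field
// and method names are illustrative only.
type emitter struct {
	flushInterval time.Duration
	closeChan     chan struct{}
	consumerFunc  func(ctx context.Context, batch []string) error
	batch         []string
}

// flusher is roughly the pre-fix shape: every periodic flush uses
// context.Background(), so a flush stuck inside an indefinite retry has no
// cancellation signal and shutdown ends up waiting on this goroutine forever.
func (e *emitter) flusher() {
	ticker := time.NewTicker(e.flushInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = e.consumerFunc(context.Background(), e.takeBatch())
		case <-e.closeChan:
			return
		}
	}
}

func (e *emitter) takeBatch() []string {
	b := e.batch
	e.batch = nil
	return b
}
```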
In this PR, I create a context in the `Start` func and cancel it
later in the `Shutdown` func. That context is passed to the flusher and used
for the flush on every `flushInterval`. However, during shutdown I have to
swap it for another context with a timeout so the remaining batch can be
flushed one last time. That's the best approach I can think of for now, and
I'm open to other suggestions.
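As a rough outline of that approach (again with hypothetical names, not the actual `LogEmitter` code from this PR):
```go
package sketch

import (
	"context"
	"sync"
	"time"
)

// emitter is still a hypothetical stand-in for the LogEmitter, now with a
// stored cancel func and a WaitGroup for the flusher goroutine.
type emitter struct {
	flushInterval time.Duration
	consumerFunc  func(ctx context.Context, batch []string) error
	batch         []string
	cancel        context.CancelFunc
	wg            sync.WaitGroup
}

// Start creates a cancellable context and hands it to the flusher, so an
// in-flight retry can be interrupted when the collector shuts down.
func (e *emitter) Start() {
	ctx, cancel := context.WithCancel(context.Background())
	e.cancel = cancel
	e.wg.Add(1)
	go e.flusher(ctx)
}

func (e *emitter) flusher(ctx context.Context) {
	defer e.wg.Done()
	ticker := time.NewTicker(e.flushInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// Periodic flushes use the cancellable context from Start.
			_ = e.consumerFunc(ctx, e.takeBatch())
		case <-ctx.Done():
			return
		}
	}
}

// Shutdown cancels the flusher's context to unblock any retry in progress,
// waits for the goroutine, then flushes the remaining batch one last time
// with a separate bounded context so the final flush cannot hang either.
func (e *emitter) Shutdown(timeout time.Duration) error {
	e.cancel()
	e.wg.Wait()
	flushCtx, cancelFlush := context.WithTimeout(context.Background(), timeout)
	defer cancelFlush()
	return e.consumerFunc(flushCtx, e.takeBatch())
}

func (e *emitter) takeBatch() []string {
	b := e.batch
	e.batch = nil
	return b
}
```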
---------
Signed-off-by: Mengnan Gong <[email protected]>
Co-authored-by: Daniel Jaglowski <[email protected]>