[feature request] Add PartialActivityProcessor to provide additional visibility to very long-running processes #2682

Open
mladjan-gadzic opened this issue Apr 3, 2025 · 4 comments
Labels
comp:extensions Things related to OpenTelemetry.Extensions enhancement New feature or request

Comments

@mladjan-gadzic

mladjan-gadzic commented Apr 3, 2025

Component

OpenTelemetry.Extensions

Is your feature request related to a problem?

With very long-running processes, there is no visibility until a span ends. A process can run for hours, and in the meantime you don't know whether it is still active. Additionally, after an ungraceful shutdown the spans are never pushed to the collector, so crucial data goes missing.

What is the expected behavior?

We have implemented a PartialSpanProcessor that exports spans as logs over gRPC. The processor exports one log when a span starts, one log when the span ends, and heartbeat logs while the span is active. The heartbeat logs are exported at a configurable interval.

That way we get a log on the desired backend every 5, 10, or 15 minutes and stay aware of the process and span state. It has been tested and works as expected.
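
To make the start / heartbeat / end behaviour concrete, here is a minimal, language-agnostic sketch in Python. It is not the actual PartialSpanProcessor code: `Span`, `emit_log`, `heartbeat_interval_s` and `run_until` are hypothetical names used only for illustration, and the gRPC log export is replaced by a plain callback.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Span:
    """Stand-in for an SDK span (hypothetical, not a real SDK type)."""
    span_id: str
    name: str


class PartialSpanProcessor:
    """Emits one log at span start, one per heartbeat, and one at span end."""

    def __init__(self, emit_log: Callable[[dict], None],
                 heartbeat_interval_s: float = 300.0):
        # emit_log is an assumed callback standing in for a log exporter.
        self._emit_log = emit_log
        self._interval = heartbeat_interval_s
        self._active: Dict[str, Span] = {}
        self._lock = threading.Lock()

    def on_start(self, span: Span) -> None:
        with self._lock:
            self._active[span.span_id] = span
        self._emit(span, "started")

    def on_end(self, span: Span) -> None:
        with self._lock:
            self._active.pop(span.span_id, None)
        self._emit(span, "ended")

    def heartbeat(self) -> None:
        # Snapshot under the lock, emit outside it.
        with self._lock:
            snapshot = list(self._active.values())
        for span in snapshot:
            self._emit(span, "heartbeat")

    def run_until(self, stop: threading.Event) -> None:
        # Emit heartbeats every interval until `stop` is set.
        while not stop.wait(self._interval):
            self.heartbeat()

    def _emit(self, span: Span, state: str) -> None:
        self._emit_log(
            {"span_id": span.span_id, "name": span.name, "state": state}
        )
```

In a real process, `run_until` would run on a background thread, so a span that never reaches `on_end` (for example after an ungraceful shutdown) still leaves a trail of heartbeat logs behind it.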

Which alternative solutions or features have you considered?

No response

Additional context

There is a fairly old issue on the OTel spec about periodically exporting active spans to gain some visibility into very long-running processes.

open-telemetry/opentelemetry-specification#373

There are many problems with exporting partial spans and the issue remains open.

We discussed the issue on the Spec community call a couple of months ago, and the suggested solution was to periodically export logs or events. Also, java-contrib has a package with processors that have been doing data conversions.

For more info, check this comment and the ones below it:

open-telemetry/opentelemetry-specification#373 (comment)

It was suggested that the individual SDK contrib repos are a good place to start contributing such a processor, so that people can use it, experiment with it, and provide feedback. If it turns out to be useful, it might end up in the spec and the main repos.

At G-Research, we've also implemented a custom collector that acts as both a receiver and an exporter so that we can gather the logs, filter them and query the results.
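
As a rough illustration of the kind of query this enables (not the actual collector logic), the Python sketch below finds spans whose most recent log is not an end log and is older than a threshold, which is one way to spot processes that probably died. The record fields (`span_id`, `state`, `timestamp`) and the function name are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def stale_active_spans(records: Iterable[dict], now: float,
                       stale_after_s: float) -> List[str]:
    """Return ids of spans whose latest record is not 'ended' and is stale."""
    latest: Dict[str, dict] = {}
    for rec in records:
        prev = latest.get(rec["span_id"])
        # Keep only the most recent record per span.
        if prev is None or rec["timestamp"] >= prev["timestamp"]:
            latest[rec["span_id"]] = rec
    return sorted(
        span_id
        for span_id, rec in latest.items()
        if rec["state"] != "ended" and now - rec["timestamp"] > stale_after_s
    )
```

The threshold would typically be set to a multiple of the heartbeat interval, so that a single delayed heartbeat does not flag a healthy span.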

We already have this implemented as a standalone extension of the dotnet SDK, but we'd like to contribute it back to the community. We're open to suggestions and further discussion regarding our PartialActivityProcessor.

@mladjan-gadzic mladjan-gadzic added the enhancement New feature or request label Apr 3, 2025
@github-actions github-actions bot added the comp:extensions Things related to OpenTelemetry.Extensions label Apr 3, 2025
Contributor

github-actions bot commented Apr 3, 2025

Tagging component owner(s).

@CodeBlanch @MikeGoldsmith

@martinjt
Member

martinjt commented Apr 4, 2025

I'm not sure that this is something that we should be presenting as a generalised approach and providing tooling around.

It's a known mode of operation that spans are only generated at the end (due to the requirement for start/end times).

This would be a good use case for the new Events sub-signal, but I wouldn't advise building artificial spans from the events; it would be up to the observability backend to represent that data in a useful way.

@mladjan-gadzic mladjan-gadzic changed the title [feature request] Solution to ungraceful shutdown of a process - PartialActivityProcessor [feature request] PartialActivityProcessor to add additional visibility to very long-running processes Apr 8, 2025
@mladjan-gadzic mladjan-gadzic changed the title [feature request] PartialActivityProcessor to add additional visibility to very long-running processes [feature request] Add PartialActivityProcessor to provide additional visibility to very long-running processes Apr 8, 2025
@xBis7

xBis7 commented Apr 9, 2025

We raised this issue on the dotnet community call yesterday, April 8. Here is a summary of what was discussed.

  • Issue title and description
    • The initial title and description didn't fully explain the scope or the intent of this issue.
    • Both of them have been updated accordingly. The main problem that we are trying to address is gaining visibility on active spans of very long-running processes (many hours or even days).
  • Action items
    • Contribute the processor under the Extensions package in dotnet-contrib.
    • Add documentation explaining how people can use it and how they can implement something similar.
    • Bring it up again with the OTel spec to determine whether it can be added to the main repo, depending on how useful it turns out to be to users.

@mladjan-gadzic
Author

PartialActivityProcessor standalone repo: https://github.com/G-Research/otel-partial-dotnet
Custom collector that uses the logs from the processor: https://github.com/G-Research/otel-partial-collector
