Skip to content

Metrics Stop Prematurely at Program Shutdown #15912

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dmuth opened this issue Jan 11, 2023 · 4 comments
Closed

Metrics Stop Prematurely at Program Shutdown #15912

dmuth opened this issue Jan 11, 2023 · 4 comments
Labels
domain: shutdown Anything related to the shutdown of Vector domain: topology Anything related to Vector's topology code type: bug A code related bug.

Comments

@dmuth
Copy link

dmuth commented Jan 11, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When I run Vector from the command line with a single data source of demo data for a few seconds, then terminate the process with ctrl-C, the values for Metrics stop being updated prematurely while events are still written.

This is my first ever Issue with Vector, so if I missed something really obvious or more troubleshooting needs to be done on my end please do let me know! :-)

Configuration

[sources.in]
type = "internal_metrics"
scrape_interval_secs = 1

[sources.in_demo]
type = "demo_logs"
format = "apache_common"
interval = 1
#interval = 5
#interval = 10


[transforms.tf]
inputs = [ "in" ]
type = "metric_to_log"


[sinks.json]
inputs = ["tf"]
type = "file"
path = "output-metrics.json"
encoding.codec = "json"


[sinks.out_demo]
inputs = ["in_demo"]
type = "file"
path = "output-demo.json"
encoding.codec = "json"

Version

vector 0.23.3 (x86_64-unknown-linux-musl af8c9e1 2022-08-10)

Debug Output

https://gist.github.com/dmuth/736ce3e8df596de6eff41376a0896556

Example Data

N/A

Additional Context

From a sample run (vector -c ./internal-metrics.toml), with interval = 1:

$ wc -l output-demo.json 
8 output-demo.json

$ cat output-metrics.json |jq -C '. | select(.name == "events_in_total" and .tags.component_name == "in_demo")
[...]
{
  "counter": {
    "value": 7
  },
  "kind": "absolute",
  "name": "events_in_total",
  "namespace": "vector",
  "tags": {
    "component_id": "in_demo",
    "component_kind": "source",
    "component_name": "in_demo",
    "component_type": "demo_logs"
  },
  "timestamp": "2023-01-11T20:52:38.202574632Z"
}

Expected value for the metrics count would be 8, matching the number of events in the output file.

When I set interval = 5 and run Vector for 6 seconds, here's what I see:

$ wc -l output-demo.json 
3 output-demo.json


$ cat output-metrics.json |jq -C '. | select(.name == "events_in_total" and .tags.component_name == "in_demo")
[...]
{
  "counter": {
    "value": 2
  },
  "kind": "absolute",
  "name": "events_in_total",
  "namespace": "vector",
  "tags": {
    "component_id": "in_demo",
    "component_kind": "source",
    "component_name": "in_demo",
    "component_type": "demo_logs"
  },
  "timestamp": "2023-01-11T20:54:29.671992619Z"
}

Watching the execution in real-time provides a clue: when I hit ctrl-C about 6 seconds in, Vector runs for another 4 seconds or so, and the third event is written to the output file. This suggests that when ctrl-C is received, metrics immediately stop being generated, buffers are flushed/pending events are processed, and then the Vector process exits.

References

None that I could find after doing a few searches here.

Thank you for your time, and all the effort that's gone into Vector so far!

-- Doug

@dmuth dmuth added the type: bug A code related bug. label Jan 11, 2023
@spencergilbert
Copy link
Contributor

Hey @dmuth - looks legit! Thanks for the thorough report. I suspect what's happening (basically what you suggested) is all of the components attempt to finish and shutdown at the same time, and the internal_metrics source happens to do it faster than the demo_logs source.

I can envision a tiered shutdown where the "actual event streams" are finished, then the "internal event streams", and then finally cleanly shutdown - to not lose any of the telemetry. It could get complicated identifying/separating components into tiers though.

@spencergilbert spencergilbert added domain: shutdown Anything related to the shutdown of Vector domain: topology Anything related to Vector's topology code labels Jan 11, 2023
@zamazan4ik
Copy link
Contributor

@spencergilbert as an ad-hoc solution instead of implementing "true" tiering, I could suggest just dropping "internal_metrics" source as the last one. Not a pretty solution but should work :) Is it worth it? I don't know, honestly.

@jszwedko
Copy link
Member

This likely affects the internal_logs source as well as both this and internal_metrics will shut down right away when receiving the signal.

github-merge-queue bot pushed a commit that referenced this issue Jun 30, 2023
…ternal ones (#17741)

<!--
**Your PR title must conform to the conventional commit spec!**

  <type>(<scope>)!: <description>

  * `type` = chore, enhancement, feat, fix, docs
  * `!` = OPTIONAL: signals a breaking change
* `scope` = Optional when `type` is "chore" or "docs", available scopes
https://github.com/vectordotdev/vector/blob/master/.github/semantic.yml#L20
  * `description` = short description of the change

Examples:

  * enhancement(file source): Add `sort` option to sort discovered files
  * feat(new source): Initial `statsd` source
  * fix(file source): Fix a bug discovering new files
  * chore(external docs): Clarify `batch_size` option
-->

We would like to close the internal logs, metrics, and trace sources
sent from Vector as late as possible during shutdown to facilitate
debugging. In this PR, we wait until all other sources are shut down
before shutting down internal telemetry sources. This means that
shutdown may be a bit longer, but we will have better observability on
the shutdown process.

issue: #15912
@jszwedko
Copy link
Member

jszwedko commented Aug 2, 2023

Closed by #17741

@jszwedko jszwedko closed this as completed Aug 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: shutdown Anything related to the shutdown of Vector domain: topology Anything related to Vector's topology code type: bug A code related bug.
Projects
None yet
Development

No branches or pull requests

4 participants