-
Notifications
You must be signed in to change notification settings - Fork 347
Remove metrics
listener type in favor of http
listener with metrics
resource
#18584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
@@ -0,0 +1 @@ | |||
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinions on calling out the upgrade notes from the changelog entry itself? It looks like usually we prefer to call it out at the top of the changelog section for the version. Doesn't seem like it would hurt to have it in both places. The only problem being the chicken and egg problem of the upgrade notes being written later in order to have the link filled in.
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`). | |
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`). Please check the [relevant section in the upgrade notes](TODO). |
…etrics` is available See #18584 (comment)
# `prometheus_client.metrics` was added in 0.5.0, so we require that too. | ||
# We chose 0.6.0 as that is the current version in Debian Buster (oldstable). | ||
prometheus-client = ">=0.6.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only change is here.
The rest is auto-formatting which I can remove from the diff if prompted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted this file in favor of re-organizing the remaining pieces in more relevant file names.
class PrometheusMetricsHackTestCase(unittest.HomeserverTestCase): | ||
if parse_version(metadata.version("prometheus_client")) < parse_version("0.14.0"): | ||
skip = "prometheus-client too old" | ||
|
||
def test_created_metrics_disabled(self) -> None: | ||
""" | ||
Tests that a brittle hack, to disable `_created` metrics, works. | ||
This involves poking at the internals of prometheus-client. | ||
It's not the end of the world if this doesn't work. | ||
|
||
This test gives us a way to notice if prometheus-client changes | ||
their internals. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this test as it claims to "gives us a way to notice if prometheus-client changes their internals." but we already have that benefit from the hack function _set_prometheus_client_use_created_metrics
itself (it will yell if the _use_created
attribute is no longer used).
The test provides no further benefit beside asserting that the attribute was set which feels pretty useless. A proper test would ensure that before running the utility, we see the legacy _created
metrics and after we don't.
from prometheus_client import generate_latest | ||
|
||
from synapse.metrics import REGISTRY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray thing that I noticed. Making it consistent that we pull from synapse.metrics
for generate_latest
since it's something we're explicitly exporting
synapse/synapse/metrics/__init__.py
Lines 477 to 480 in 3cabaa8
__all__ = [ | |
"Collector", | |
"MetricsResource", | |
"generate_latest", |
@@ -0,0 +1 @@ | |||
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear to me why metrics
was its own listener type before. The only difference I can see is that they are served on different paths: /metrics
vs /_synapse/metrics
. Feels like we just need to include this in the upgrade notes (drafted in the PR description).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that @erikjohnston pointed out is that the dedicated metrics
listener uses it's own thread and doesn't rely on the Twisted reactor so it still works regardless of how well Synapse is working. It's unclear if we really care about this benefit. It does have a theoretical benefit of being able to monitor a really bogged down Synapse.
Looking at the git history, the threading aspect just seems like the way it was originally implemented and not something explicitly called out for the benefit described above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with the updated conclusion from #18592 (comment) which most likely means we will continue to use the global REGISTRY
, I think it's still worth distilling down to a single way to setup metrics; for the config clarity sake and for the codebase single source of truth sake.
(assuming we can't come up with a benefit/compelling reason for why we want both)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about it a bit: I think it is worth keeping the separate metrics
listener. I can't find where (and I looked), but I believe we have ported stuff to use the separate metrics
listener in the past to get metrics from the process when its under extreme load (to the extent where twisted can't process new requests).
Even if we keep the separate listener, I think we should still make them render from the same source and remove any differences. If needed its easy enough to change the metrics
listener to not use the inbuilt prometheus one, but instead replace it by spawning a thread and starting a basic web server that responds to /metrics
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate
metrics
listener in the past to get metrics from the process when its under extreme load (to the extent where twisted can't process new requests).
How sure are we that we care about this? Feels like a use case that we're just holding onto especially when we can't find any evidence.
From what I'm reading, threading.Thread
(which prometheus_client.start_http_server_prometheus(...)
uses), shares the same process and memory space so it doesn't even help us in the scenarios where Synapse is under extreme load.
[...] the threading module operates within a single process, meaning that all threads share the same memory space. However, the GIL limits the performance gains of threading when it comes to CPU-bound tasks, as only one thread can execute Python bytecode at a time.
-- https://docs.python.org/3/library/threading.html#gil-and-performance-considerations
Even if we keep the separate listener, I think we should still make them render from the same source and remove any differences.
I don't see an easy way forward as I don't know how to start another Twisted server in another thread in order to share the MetricsResource
. It's a pretty simple resource so trying to abstract it any further isn't useful and we might as well keep the status-quo of disparate handling 🤷
I can pull out some of the clean-up from this PR to ship separately.
Conflicts: poetry.lock
Remove
metrics
listener type in favor ofhttp
listener withmetrics
resource (which already exists).This is spawning from wanting to refactor which registry the metrics are pulled from and noticing two sources of truth. The refactor itself is spawning from running multiple Synapse in the same Python process and wanting separate metrics (see #18592)
This will require homeserver admins to update their homeserver configs in some cases. I've drafted upgrade notes for the release below.
Split out from #18443
Part of #18592
Why did we have a separate
metrics
listener type?See discussion below: #18584 (comment)
Draft upgrade notes
Drafted to make it easy to add to
docs/upgrade.md
for whoever is making the release.Upgrade notes
Upgrading to vUPCOMING
Dedicated
metrics
listener removed in favor ofmetrics
resource in thehttp
listenerThe dedicated
type: metrics
listener has been removed. Use themetrics
resource in antype: http
listener instead.Example before:
homeserver.yaml
Example after:
homeserver.yaml
Note: The endpoint path changes from
/metrics
to/_synapse/metrics
- update your metrics scraper (e.g., Prometheus) accordingly.See the Metrics How-to page for more information on how to setup and configure metrics.
Testing strategy
See behavior of previous
metrics
listenermetrics
listener in yourhomeserver.yaml
poetry run synapse_homeserver --config-path homeserver.yaml
http://localhost:9323/metrics
See behavior of the
http
metrics
resourcemetrics
resource to a new or existinghttp
listeners in yourhomeserver.yaml
poetry run synapse_homeserver --config-path homeserver.yaml
http://localhost:9322/_synapse/metrics
(it's just aGET
request so you can even do in the browser)develop
There is no difference in the response from this PR to
develop
or from this resource to themetrics
listener being removed.Dev notes
metrics
listener useslisten_metrics
http
listener withmetrics
resource usesMetricsResource
Related PRs:
_set_prometheus_client_use_created_metrics
_set_prometheus_client_use_created_metrics()
was added in matrix-org/synapse#13540 (2022-08-16)But now this functionality has dedicated functions in
prometheus_client
viadisable_created_metrics()
/enable_created_metrics()
which was added in prometheus/client_python#973 (2023-10-26). Part of[email protected]
(2023-10-30).Our deprecation policy states that we support whatever is provided by Debian oldstable. Which for
python-prometheus-client
is currently0.6.0-1
(too old).Todo
_set_prometheus_client_use_created_metrics
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.