Skip to content

Remove metrics listener type in favor of http listener with metrics resource #18584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
bb31f60
Remove `metrics` listener type in favor of `http` listener with `metr…
MadLittleMods Jun 20, 2025
1fe3fa4
Disable `_created` metrics via dedicated built-in function
MadLittleMods Jun 20, 2025
6e47458
Fix lints
MadLittleMods Jun 20, 2025
3e16b88
Remove stray file
MadLittleMods Jun 20, 2025
3b05b5e
Add changelog
MadLittleMods Jun 20, 2025
54b836c
Remove tests for `_set_prometheus_client_use_created_metrics` (which …
MadLittleMods Jun 20, 2025
8ccd52c
Fix lints
MadLittleMods Jun 20, 2025
bb3904c
Restore `_set_prometheus_client_use_created_metrics`
MadLittleMods Jun 20, 2025
adaf5b0
Update with new state of reality
MadLittleMods Jun 20, 2025
cfdacb8
Fix lints
MadLittleMods Jun 20, 2025
16e87ed
Restore `_set_prometheus_client_use_created_metrics` tests
MadLittleMods Jun 20, 2025
07c6806
Update minimum version of `prometheus-client` so `prometheus_client.m…
MadLittleMods Jun 20, 2025
d060446
Remove tests which don't assert anything beyond what the util already…
MadLittleMods Jun 23, 2025
04084ae
Re-organize metrics resource
MadLittleMods Jun 23, 2025
b849bed
Make broken hack more obvious when it happens
MadLittleMods Jun 23, 2025
6c662e0
Fix lints
MadLittleMods Jun 23, 2025
03ae604
Remove extra newline
MadLittleMods Jun 23, 2025
1ce74d4
Log an error for old versions
MadLittleMods Jun 23, 2025
467084d
Merge branch 'develop' into madlittlemods/remove-metrics-listener
MadLittleMods Jun 23, 2025
25585b4
Merge branch 'develop' into madlittlemods/remove-metrics-listener
MadLittleMods Jun 24, 2025
95eab21
Merge branch 'develop' into madlittlemods/remove-metrics-listener
MadLittleMods Jun 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/18584.removal
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`).
Copy link
Contributor Author

@MadLittleMods MadLittleMods Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opinions on calling out the upgrade notes from the changelog entry itself? It looks like usually we prefer to call it out at the top of the changelog section for the version. Doesn't seem like it would hurt to have it in both places. The only problem being the chicken and egg problem of the upgrade notes being written later in order to have the link filled in.

Suggested change
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`).
Remove `metrics` listener (serves metrics at `/metrics`) type in favor of `http` listener with `metrics` resource (serves metrics at `/_synapse/metrics`). Please check the [relevant section in the upgrade notes](TODO).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me why metrics was its own listener type before. The only difference I can see is that they are served on different paths: /metrics vs /_synapse/metrics. Feels like we just need to include this in the upgrade notes (drafted in the PR description).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that @erikjohnston pointed out is that the dedicated metrics listener uses it's own thread and doesn't rely on the Twisted reactor so it still works regardless of how well Synapse is working. It's unclear if we really care about this benefit. It does have a theoretical benefit of being able to monitor a really bogged down Synapse.

Looking at the git history, the threading aspect just seems like the way it was originally implemented and not something explicitly called out for the benefit described above.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the updated conclusion from #18592 (comment) which most likely means we will continue to use the global REGISTRY, I think it's still worth distilling down to a single way to setup metrics; for the config clarity sake and for the codebase single source of truth sake.

(assuming we can't come up with a benefit/compelling reason for why we want both)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about it a bit: I think it is worth keeping the separate metrics listener. I can't find where (and I looked), but I believe we have ported stuff to use the separate metrics listener in the past to get metrics from the process when its under extreme load (to the extent where twisted can't process new requests).

Even if we keep the separate listener, I think we should still make them render from the same source and remove any differences. If needed its easy enough to change the metrics listener to not use the inbuilt prometheus one, but instead replace it by spawning a thread and starting a basic web server that responds to /metrics.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jun 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate metrics listener in the past to get metrics from the process when its under extreme load (to the extent where twisted can't process new requests).

How sure are we that we care about this? Feels like a use case that we're just holding onto especially when we can't find any evidence.

From what I'm reading, threading.Thread (which prometheus_client.start_http_server_prometheus(...) uses), shares the same process and memory space so it doesn't even help us in the scenarios where Synapse is under extreme load.

[...] the threading module operates within a single process, meaning that all threads share the same memory space. However, the GIL limits the performance gains of threading when it comes to CPU-bound tasks, as only one thread can execute Python bytecode at a time.

-- https://docs.python.org/3/library/threading.html#gil-and-performance-considerations


Even if we keep the separate listener, I think we should still make them render from the same source and remove any differences.

I don't see an easy way forward as I don't know how to start another Twisted server in another thread in order to share the MetricsResource. It's a pretty simple resource so trying to abstract it any further isn't useful and we might as well keep the status-quo of disparate handling 🤷

I can pull out some of the clean-up from this PR to ship separately.

5 changes: 4 additions & 1 deletion docs/metrics-howto.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,11 @@

# beginning of the new metrics listener
- port: 9000
type: metrics
type: http
bind_addresses: ['::1', '127.0.0.1']
resources:
- names: [metrics]
compress: false
```

1. Restart Synapse.
Expand Down
2 changes: 1 addition & 1 deletion docs/usage/configuration/config_documentation.md
Original file line number Diff line number Diff line change
Expand Up @@ -494,7 +494,7 @@ Options for each entry include:

* `bind_addresses` (array|null): A list of local addresses to listen on. The default is "all local interfaces".

* `type` (string): The type of listener. Normally `http`, but other valid options are [`manhole`](../../manhole.md) and [`metrics`](../../metrics-howto.md).
* `type` (string): The type of listener. Normally `http`, but other valid options are [`manhole`](../../manhole.md).

* `tls` (boolean): Set to true to enable TLS for this listener. Will use the TLS key/cert specified in tls_private_key_path/tls_certificate_path.

Expand Down
5 changes: 3 additions & 2 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

114 changes: 60 additions & 54 deletions pyproject.toml
Original file line number Diff line number Diff line change
@@ -1,38 +1,38 @@
[tool.towncrier]
package = "synapse"
filename = "CHANGES.md"
directory = "changelog.d"
issue_format = "[\\#{issue}](https://github.com/element-hq/synapse/issues/{issue})"

[[tool.towncrier.type]]
directory = "feature"
name = "Features"
showcontent = true

[[tool.towncrier.type]]
directory = "bugfix"
name = "Bugfixes"
showcontent = true

[[tool.towncrier.type]]
directory = "docker"
name = "Updates to the Docker image"
showcontent = true

[[tool.towncrier.type]]
directory = "doc"
name = "Improved Documentation"
showcontent = true

[[tool.towncrier.type]]
directory = "removal"
name = "Deprecations and Removals"
showcontent = true

[[tool.towncrier.type]]
directory = "misc"
name = "Internal Changes"
showcontent = true
package = "synapse"
filename = "CHANGES.md"
directory = "changelog.d"
issue_format = "[\\#{issue}](https://github.com/element-hq/synapse/issues/{issue})"

[[tool.towncrier.type]]
directory = "feature"
name = "Features"
showcontent = true

[[tool.towncrier.type]]
directory = "bugfix"
name = "Bugfixes"
showcontent = true

[[tool.towncrier.type]]
directory = "docker"
name = "Updates to the Docker image"
showcontent = true

[[tool.towncrier.type]]
directory = "doc"
name = "Improved Documentation"
showcontent = true

[[tool.towncrier.type]]
directory = "removal"
name = "Deprecations and Removals"
showcontent = true

[[tool.towncrier.type]]
directory = "misc"
name = "Internal Changes"
showcontent = true

[tool.ruff]
line-length = 88
Expand All @@ -47,11 +47,7 @@ target-version = "py39"
# flake8-bugbear compatible checks. Its error codes are described at
# https://beta.ruff.rs/docs/rules/#flake8-bugbear-b
# B023: Functions defined inside a loop must not use variables redefined in the loop
ignore = [
"B023",
"E501",
"E731",
]
ignore = ["B023", "E501", "E731"]
select = [
# pycodestyle
"E",
Expand Down Expand Up @@ -82,7 +78,15 @@ select = [

[tool.ruff.lint.isort]
combine-as-imports = true
section-order = ["future", "standard-library", "third-party", "twisted", "first-party", "testing", "local-folder"]
section-order = [
"future",
"standard-library",
"third-party",
"twisted",
"first-party",
"testing",
"local-folder",
]
known-first-party = ["synapse"]

[tool.ruff.lint.isort.sections]
Expand All @@ -107,9 +111,7 @@ authors = ["Matrix.org Team and Contributors <[email protected]>"]
license = "AGPL-3.0-or-later"
readme = "README.rst"
repository = "https://github.com/element-hq/synapse"
packages = [
{ include = "synapse" },
]
packages = [{ include = "synapse" }]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Topic :: Communications :: Chat",
Expand All @@ -125,7 +127,7 @@ include = [
{ path = "INSTALL.md", format = "sdist" },
{ path = "mypy.ini", format = "sdist" },
{ path = "scripts-dev", format = "sdist" },
{ path = "synmark", format="sdist" },
{ path = "synmark", format = "sdist" },
{ path = "sytest-blacklist", format = "sdist" },
{ path = "tests", format = "sdist" },
{ path = "UPGRADE.rst", format = "sdist" },
Expand All @@ -135,9 +137,7 @@ include = [
{ path = "rust/build.rs", format = "sdist" },
{ path = "rust/src/**", format = "sdist" },
]
exclude = [
{ path = "synapse/*.so", format = "sdist"}
]
exclude = [{ path = "synapse/*.so", format = "sdist" }]

[tool.poetry.build]
script = "build_rust.py"
Expand Down Expand Up @@ -178,7 +178,7 @@ signedjson = "^1.1.0"
service-identity = ">=18.1.0"
# Twisted 18.9 introduces some logger improvements that the structured
# logger utilises
Twisted = {extras = ["tls"], version = ">=18.9.0"}
Twisted = { extras = ["tls"], version = ">=18.9.0" }
treq = ">=15.1"
# Twisted has required pyopenssl 16.0 since about Twisted 16.6.
pyOpenSSL = ">=16.0.0"
Expand All @@ -195,7 +195,9 @@ pymacaroons = ">=0.13.0"
msgpack = ">=0.5.2"
phonenumbers = ">=8.2.0"
# we use GaugeHistogramMetric, which was added in prom-client 0.4.0.
prometheus-client = ">=0.4.0"
# `prometheus_client.metrics` was added in 0.5.0, so we require that too.
# We chose 0.6.0 as that is the current version in Debian Buster (oldstable).
prometheus-client = ">=0.6.0"
Comment on lines +198 to +200
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change is here.

The rest is auto-formatting which I can remove from the diff if prompted.

# we use `order`, which arrived in attrs 19.2.0.
# Note: 21.1.0 broke `/sync`, see https://github.com/matrix-org/synapse/issues/9936
attrs = ">=19.2.0,!=21.1.0"
Expand Down Expand Up @@ -297,7 +299,9 @@ all = [
# matrix-synapse-ldap3
"matrix-synapse-ldap3",
# postgres
"psycopg2", "psycopg2cffi", "psycopg2cffi-compat",
"psycopg2",
"psycopg2cffi",
"psycopg2cffi-compat",
# saml2
"pysaml2",
# oidc and jwt
Expand All @@ -307,9 +311,11 @@ all = [
# sentry
"sentry-sdk",
# opentracing
"jaeger-client", "opentracing",
"jaeger-client",
"opentracing",
# redis
"txredisapi", "hiredis",
"txredisapi",
"hiredis",
# cache-memory
"pympler",
# improved user search
Expand Down Expand Up @@ -397,8 +403,8 @@ enable = "pypy"
#
# We temporarily pin Rust to 1.82.0 to work around
# https://github.com/element-hq/synapse/issues/17988
before-all = "sh .ci/before_build_wheel.sh"
environment= { PATH = "$PATH:$HOME/.cargo/bin" }
before-all = "sh .ci/before_build_wheel.sh"
environment = { PATH = "$PATH:$HOME/.cargo/bin" }

# For some reason if we don't manually clean the build directory we
# can end up polluting the next build with a .so that is for the wrong
Expand Down
3 changes: 1 addition & 2 deletions schema/synapse-config.schema.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -451,8 +451,7 @@ properties:
type: string
description: >-
The type of listener. Normally `http`, but other valid options are
[`manhole`](../../manhole.md) and
[`metrics`](../../metrics-howto.md).
[`manhole`](../../manhole.md).
enum:
- http
- manhole
Expand Down
36 changes: 0 additions & 36 deletions synapse/app/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -283,42 +283,6 @@ async def wrapper() -> None:
reactor.callWhenRunning(lambda: defer.ensureDeferred(wrapper()))


def listen_metrics(bind_addresses: StrCollection, port: int) -> None:
"""
Start Prometheus metrics server.
"""
from prometheus_client import start_http_server as start_http_server_prometheus

from synapse.metrics import RegistryProxy

for host in bind_addresses:
logger.info("Starting metrics listener on %s:%d", host, port)
_set_prometheus_client_use_created_metrics(False)
start_http_server_prometheus(port, addr=host, registry=RegistryProxy)


def _set_prometheus_client_use_created_metrics(new_value: bool) -> None:
"""
Sets whether prometheus_client should expose `_created`-suffixed metrics for
all gauges, histograms and summaries.
There is no programmatic way to disable this without poking at internals;
the proper way is to use an environment variable which prometheus_client
loads at import time.

The motivation for disabling these `_created` metrics is that they're
a waste of space as they're not useful but they take up space in Prometheus.
"""

import prometheus_client.metrics

if hasattr(prometheus_client.metrics, "_use_created"):
prometheus_client.metrics._use_created = new_value
else:
logger.error(
"Can't disable `_created` metrics in prometheus_client (brittle hack broken?)"
)


def listen_manhole(
bind_addresses: StrCollection,
port: int,
Expand Down
17 changes: 0 additions & 17 deletions synapse/app/generic_worker.py
Original file line number Diff line number Diff line change
Expand Up @@ -283,23 +283,6 @@ def start_listening(self) -> None:
raise ConfigError(
"Can not using a unix socket for manhole at this time."
)

elif listener.type == "metrics":
if not self.config.metrics.enable_metrics:
logger.warning(
"Metrics listener configured, but enable_metrics is not True!"
)
else:
if isinstance(listener, TCPListenerConfig):
_base.listen_metrics(
listener.bind_addresses,
listener.port,
)
else:
raise ConfigError(
"Can not use a unix socket for metrics at this time."
)

else:
logger.warning("Unsupported listener type: %s", listener.type)

Expand Down
16 changes: 0 additions & 16 deletions synapse/app/homeserver.py
Original file line number Diff line number Diff line change
Expand Up @@ -286,22 +286,6 @@ def start_listening(self) -> None:
raise ConfigError(
"Can not use a unix socket for manhole at this time."
)
elif listener.type == "metrics":
if not self.config.metrics.enable_metrics:
logger.warning(
"Metrics listener configured, but enable_metrics is not True!"
)
else:
if isinstance(listener, TCPListenerConfig):
_base.listen_metrics(
listener.bind_addresses,
listener.port,
)
else:
raise ConfigError(
"Can not use a unix socket for metrics at this time."
)

else:
# this shouldn't happen, as the listener type should have been checked
# during parsing
Expand Down
Loading
Loading