Skip to content

Refactor Measure block metrics to be homeserver-scoped #18591

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

MadLittleMods
Copy link
Contributor

@MadLittleMods MadLittleMods commented Jun 23, 2025

Refactor Measure block metrics to be homeserver-scoped.

Spawning from #18443

Part of #18592

Testing strategy

See behavior of previous metrics listener

  1. Add the metrics listener in your homeserver.yaml
    listeners:
      - port: 9323
        type: metrics
        bind_addresses: ['127.0.0.1']
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9323/metrics
  4. Observe response includes the block metrics (synapse_util_metrics_block_count, synapse_util_metrics_block_in_flight, etc)

See behavior of the http metrics resource

  1. Add the metrics resource to a new or existing http listeners in your homeserver.yaml
    listeners:
      - port: 9322
        type: http
        bind_addresses: ['127.0.0.1']
        resources:
          - names: [metrics]
            compress: false
  2. Start the homeserver: poetry run synapse_homeserver --config-path homeserver.yaml
  3. Fetch http://localhost:9322/_synapse/metrics (it's just a GET request so you can even do in the browser)
  4. Observe response includes the block metrics (synapse_util_metrics_block_count, synapse_util_metrics_block_in_flight, etc)

Dev notes

HomeserverMetricsManager

Todo

  • Fix the usages of Measure

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Fix `HomeserverMetricsManager` lints

Fix `metrics_manager` missing in `Measure` `__slots__`
@MadLittleMods MadLittleMods force-pushed the madlittlemods/per-hs-metrics-measure branch 2 times, most recently from 1962fc8 to 5c5090f Compare June 24, 2025 21:34
…round

Fix mypy complaints

```
synapse/handlers/delayed_events.py:266: error: Cannot determine type of "validator"  [has-type]
synapse/handlers/delayed_events.py:267: error: Cannot determine type of "event_builder_factory"  [has-type]
```
@MadLittleMods MadLittleMods force-pushed the madlittlemods/per-hs-metrics-measure branch from 5c5090f to e46690c Compare June 24, 2025 21:35
This matches the pattern of other homeserver attributes like `hs.get_clock()`
@MadLittleMods MadLittleMods changed the title Refactor Measure block metrics to be homeserver-scoped Refactor Measure block metrics to be be ready to be homeserver-scoped Jun 24, 2025
Comment on lines +287 to +313
def listen_metrics(
bind_addresses: StrCollection, port: int, metrics_manager: HomeserverMetricsManager
) -> None:
"""
Start Prometheus metrics server.
"""
from prometheus_client import start_http_server as start_http_server_prometheus
from prometheus_client import (
REGISTRY,
CollectorRegistry,
start_http_server as start_http_server_prometheus,
)

from synapse.metrics import RegistryProxy
from synapse.metrics import CombinedRegistryProxy

combined_registry_proxy = CombinedRegistryProxy(
[
# TODO: Remove `REGISTRY` once all metrics have been migrated to the
# homeserver specific metrics collector registry, see
# https://github.com/element-hq/synapse/issues/18592
REGISTRY,
metrics_manager.metrics_collector_registry,
]
)
# Cheeky cast but matches the signature of a `CollectorRegistry` instance enough
# for it to be usable in the contexts in which we use it.
# TODO Do something nicer about this.
registry = cast(CollectorRegistry, combined_registry_proxy)
Copy link
Contributor Author

@MadLittleMods MadLittleMods Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of this work can be removed if we decide that it's okay to drop the type: metrics listener (part of #18584)

@MadLittleMods MadLittleMods changed the title Refactor Measure block metrics to be be ready to be homeserver-scoped Refactor Measure block metrics to be homeserver-scoped Jun 24, 2025
@staticmethod
def collect() -> Iterable[Metric]:
for metric in REGISTRY.collect():
if not metric.name.startswith("__"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any metrics that start with __ so I've removed this logic

for metric in REGISTRY.collect():
if not metric.name.startswith("__"):
yield metric
class CombinedRegistryProxy:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CombinedRegistryProxy can be removed if we decide that it's okay to drop the type: metrics listener (part of #18584) since it's the only usage.

Comment on lines +48 to +57
# While we're in the middle of the refactor of metrics in Synapse, we need to
# merge the metrics from the global registry and the homeserver specific metrics
# collector registry.
#
# TODO: Remove `generate_latest(REGISTRY)` once all homeserver metrics have been
# migrated to the homeserver specific metrics collector registry, see
# https://github.com/element-hq/synapse/issues/18592
response = generate_latest(REGISTRY) + generate_latest(
self.metrics_manager.metrics_collector_registry
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using CombinedRegistryProxy here, I've decided to do something way more straight-forward and avoid the cheeky casting.

We have to use CombinedRegistryProxy in the other spot because we don't control how it collects metrics.

Comment on lines +98 to +104
self.in_flight: InFlightGauge[BlockInFlightMetric] = InFlightGauge(
"synapse_util_metrics_block_in_flight",
"",
labels=["block_name"],
# Matches the fields in the `BlockInFlightMetric`
sub_metrics=["real_time_max", "real_time_sum"],
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this specific metric doesn't use the homeserver metrics_collector_registry. This is planned for a future PR where we update InFlightGauge to support a custom registry option. Tracked in #18592

Comment on lines +310 to +313
# Cheeky cast but matches the signature of a `CollectorRegistry` instance enough
# for it to be usable in the contexts in which we use it.
# TODO Do something nicer about this.
registry = cast(CollectorRegistry, combined_registry_proxy)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast is the same thing we were doing before with the RegistryProxy, nothing new. I'm not very keen on thinking of something better here.

@MadLittleMods MadLittleMods marked this pull request as ready for review June 25, 2025 01:30
@MadLittleMods MadLittleMods requested a review from a team as a code owner June 25, 2025 01:30
@MadLittleMods
Copy link
Contributor Author

Per our decision in #18592 (comment) to add instance labels to the metrics directly instead of using a separate reigstry, closing in favor of #18601

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant