
Add response time metrics (keep track of when the page times out) #45

Open
@MadLittleMods

Description


Add a metric for when the page times out, recording which Matrix API request was still running and how long the request took.

Things to record in each event:

  • Response status code we ended up sending
  • Total time spent on the server rendering the request (for a timed-out request, this will just be the configured timeout)
  • Homeserver
  • Room ID
    • Since this has very high cardinality (lots of possible values), we might not be able to index it, but it would be good to have on each metric event for inspection.
    • These extra details are nice if we want to investigate why a particular room/homeserver combo is timing out.
  • Path of the Matrix API endpoint that was still running when we timed out (like /join, /messages)
    • Is this useful? It would be nice to know where most requests get stuck.
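A minimal sketch of what one timeout event could carry, assuming Matrix API calls go through a wrapper that can mark them as in flight. The names createMatrixCallTracker and buildTimeoutEvent are hypothetical, not existing code in this project.

```javascript
// Hypothetical helpers (not existing project code): track in-flight Matrix
// API calls for one incoming request and build an event when it times out.

function createMatrixCallTracker() {
  // call id -> endpoint path of a call that has not settled yet
  const pending = new Map();
  let nextId = 0;

  return {
    // Wrap a promise-returning Matrix API call (e.g. the request to /join)
    async track(path, callFn) {
      const id = nextId++;
      pending.set(id, path);
      try {
        return await callFn();
      } finally {
        // Settled (resolved or rejected), so it is no longer pending
        pending.delete(id);
      }
    },
    // Paths of the calls that are still running right now
    pendingPaths() {
      return [...pending.values()];
    },
  };
}

// Builds one structured event for a request that hit the render timeout
function buildTimeoutEvent({ statusCode, requestStartedAt, homeserver, roomId, tracker, now = Date.now() }) {
  return {
    statusCode,
    // For a timeout this ends up being roughly the configured timeout
    durationMs: now - requestStartedAt,
    homeserver,
    // High cardinality: keep it on the event for inspection, but don't index it
    roomId,
    pendingMatrixEndpoints: tracker.pendingPaths(),
  };
}
```

Passing a short endpoint name (like /join) to track, rather than the full request URL, keeps that field low-cardinality enough to aggregate on.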

We can also send a success metric with the response time, so we can compare how many requests we fail to serve against total traffic.
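The success/timeout comparison could come from a response-time histogram labeled by outcome. This dependency-free sketch only illustrates the bookkeeping that prom-client's Histogram would do for us; the class name, the outcome labels, and the bucket bounds (in seconds) are assumptions.

```javascript
// Sketch of a response-time histogram split by outcome ('success', 'timeout').
// Bucket bounds are hypothetical, in seconds; the top bucket should cover the
// configured render timeout.
const BUCKETS = [0.1, 0.5, 1, 2.5, 5, 10, 30];

class ResponseTimeHistogram {
  constructor(buckets = BUCKETS) {
    this.buckets = buckets;
    // outcome label -> { bucketCounts, count, sum }
    this.series = new Map();
  }

  observe(outcome, seconds) {
    if (!this.series.has(outcome)) {
      this.series.set(outcome, {
        bucketCounts: new Array(this.buckets.length).fill(0),
        count: 0,
        sum: 0,
      });
    }
    const entry = this.series.get(outcome);
    // Prometheus-style cumulative buckets: every bucket with le >= value gets +1
    this.buckets.forEach((le, i) => {
      if (seconds <= le) entry.bucketCounts[i] += 1;
    });
    entry.count += 1;
    entry.sum += seconds;
  }

  // Fraction of all observed requests that ended with the given outcome
  ratio(outcome) {
    let total = 0;
    for (const entry of this.series.values()) total += entry.count;
    const match = this.series.get(outcome);
    return total === 0 || !match ? 0 : match.count / total;
  }
}
```

In Prometheus itself the same comparison would usually be a query over the histogram's per-outcome _count series rather than something computed inside the app.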

Dev notes

We probably just need to add something like prom-client, expose a Prometheus /metrics scrape endpoint that serves the output of await register.metrics(), and then add a scrape annotation to the K8s service (which is still being finalized).
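For reference, the rough shape of a /metrics handler for Node's http module. With prom-client it would set Content-Type to register.contentType and send await register.metrics(); to keep this sketch runnable without dependencies, it hand-writes the Prometheus text exposition format for a single counter instead. The metric name page_timeouts_total and its labels are hypothetical, and room ID is deliberately left out of the labels given the cardinality concern above.

```javascript
// Dependency-free sketch of a Prometheus scrape endpoint (hypothetical metric).

// JSON-encoded label set -> count
const timeoutCounts = new Map();

function recordTimeout(labels) {
  const key = JSON.stringify(labels);
  timeoutCounts.set(key, (timeoutCounts.get(key) || 0) + 1);
}

// Escape label values per the text exposition format: backslash, quote, newline
function escapeLabelValue(value) {
  return String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

function renderMetrics() {
  const lines = [
    '# HELP page_timeouts_total Requests that hit the page render timeout',
    '# TYPE page_timeouts_total counter',
  ];
  for (const [key, count] of timeoutCounts) {
    const labels = Object.entries(JSON.parse(key))
      .map(([name, value]) => `${name}="${escapeLabelValue(value)}"`)
      .join(',');
    lines.push(`page_timeouts_total{${labels}} ${count}`);
  }
  return lines.join('\n') + '\n';
}

// Request handler usable as http.createServer(metricsHandler)
function metricsHandler(req, res) {
  if (req.method !== 'GET' || req.url !== '/metrics') {
    res.statusCode = 404;
    res.end();
    return;
  }
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain; version=0.0.4; charset=utf-8');
  res.end(renderMetrics());
}
```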

Adjacent: here is an example middleware from the Gitter webapp that logs and records a metric when a request has been pending for more than 60 seconds: https://gitlab.com/gitterHQ/webapp/-/blob/676fadc3693260c8c51f448a0ca4c3e180d1b4a2/server/web/middlewares/pending-request.js#L50-84
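A rough sketch of the same idea as Express-style middleware, written from the description here rather than from the linked Gitter code: if the response has not finished after thresholdMs, it calls onPending, which is where a log line or metric increment would go. pendingRequestMiddleware and onPending are hypothetical names.

```javascript
// Hypothetical Express-style middleware: report requests still pending after
// thresholdMs (defaults to 60 seconds, like the Gitter example).
function pendingRequestMiddleware({ thresholdMs = 60_000, onPending }) {
  return function pendingRequest(req, res, next) {
    const startedAt = Date.now();
    const timer = setTimeout(() => {
      onPending({
        method: req.method,
        // Express sets originalUrl; fall back to the raw Node url
        url: req.originalUrl || req.url,
        pendingMs: Date.now() - startedAt,
      });
    }, thresholdMs);
    // Don't let this timer alone keep the process running
    timer.unref();

    const clear = () => clearTimeout(timer);
    // 'finish': response handed off; 'close': connection closed (including aborts)
    res.once('finish', clear);
    res.once('close', clear);

    next();
  };
}
```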

Metadata


Assignees

No one assigned

Labels

  • A-metrics: stats, metrics, dashboards
  • A-tracing: OpenTelemetry tracing (spans, timing, observability)
  • T-Enhancement: New feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
