Description
Extracted from ipfs/kubo#9927 (comment)
Currently, we have basic request counts and durations for the gateway=ipfs and gateway=ipns namespaces in the form of boxo/gateway/metrics.go metrics:
# HELP ipfs_http_gw_get_duration_seconds The time to GET a successful response to a request (all content types).
# TYPE ipfs_http_gw_get_duration_seconds histogram
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="0.05"} 8
[..]
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="1920"} 11
ipfs_http_gw_get_duration_seconds_bucket{gateway="ipfs",le="+Inf"} 11
ipfs_http_gw_get_duration_seconds_sum{gateway="ipfs"} 1.185360469
Problem
/ipns supports both DNSLink and signed IPNS records, but:
- we have no visibility into what percentage of requests each type accounts for
- we measure success only, so we have no visibility into the percentage of IPNS record failures vs DNSLink failures
Solution
Requirements
TBD, initial requirements
- we need a dedicated metric for each type of /ipns/ request
  - signed_ipns
  - dnslink
- we need to be able to tell:
  - how many requests were sent by clients
  - how many requests were successful vs errored
  - how long successes and errors take (could be precomputed P50/P95)
- we need to make sure this is visible in Thunderdome testing so we can catch regressions here during the release phase
Open questions
- do we have separate metrics for success/failure, or a single one with a success/error attribute?
- do we use a histogram with predefined duration buckets and an implicit counter (like ipfs_http_gw_get_duration_seconds)?
  - or maybe, instead of picking arbitrary duration buckets (like we have in legacy metrics), we should have P50, P75, P95, P99 Objectives, like we do here?