First cut at metrics for alertmanager sharding operation. (#4149)
* First cut at metrics for alertmanager sharding operation.
This commit adds a number of metrics to help track the operation of the
alertmanager sharding, specifically around the handling of the state.
- `cortex_alertmanager_fetch_replica_state_total`
- `cortex_alertmanager_fetch_replica_state_failed_total`
- `cortex_alertmanager_state_initial_sync_total`
- `cortex_alertmanager_state_initial_sync_completed_total`
- `cortex_alertmanager_state_initial_sync_duration_seconds`
- `cortex_alertmanager_state_persist_total`
- `cortex_alertmanager_state_persist_failed_total`
Note this complements the already existing metrics:
- `cortex_alertmanager_partial_state_merges_total`
- `cortex_alertmanager_partial_state_merges_failed_total`
- `cortex_alertmanager_state_replication_total`
- `cortex_alertmanager_state_replication_failed_total`
Overly detailed timing metrics have not been included, instead opting
for just a single (non per-user) histogram for the duration of the
initial state operation. Timings for storage read/write are not
included as they are already provided from the bucket client.
Signed-off-by: Steve Simpson <[email protected]>
* Update Changelog.
Signed-off-by: Steve Simpson <[email protected]>
* Review comments.
Signed-off-by: Steve Simpson <[email protected]>
* Review comments.
Signed-off-by: Steve Simpson <[email protected]>
* Update Changelog.
Signed-off-by: Steve Simpson <[email protected]>
* Review comments.
Signed-off-by: Steve Simpson <[email protected]>
CHANGELOG.md
@@ -6,6 +6,14 @@
 * [CHANGE] Alertmanager: allowed to configure the experimental receivers firewall on a per-tenant basis. The following CLI flags (and their respective YAML config options) have been changed and moved to the limits config section: #4143
   - `-alertmanager.receivers-firewall.block.cidr-networks` renamed to `-alertmanager.receivers-firewall-block-cidr-networks`
   - `-alertmanager.receivers-firewall.block.private-addresses` renamed to `-alertmanager.receivers-firewall-block-private-addresses`
+* [ENHANCEMENT] Alertmanager: introduced new metrics to monitor operation when using `-alertmanager.sharding-enabled`: #4149
+# HELP cortex_alertmanager_state_fetch_replica_state_failed_total Number of times we have failed to read and merge the full state from another replica.
+# TYPE cortex_alertmanager_state_fetch_replica_state_failed_total counter

+# HELP cortex_alertmanager_state_fetch_replica_state_failed_total Number of times we have failed to read and merge the full state from another replica.
+# TYPE cortex_alertmanager_state_fetch_replica_state_failed_total counter

+# HELP cortex_alertmanager_state_fetch_replica_state_failed_total Number of times we have failed to read and merge the full state from another replica.
+# TYPE cortex_alertmanager_state_fetch_replica_state_failed_total counter
 // newStatePersister creates a new state persister.
-func newStatePersister(cfg PersisterConfig, userID string, state PersistableState, store alertstore.AlertStore, l log.Logger) *statePersister {
+func newStatePersister(cfg PersisterConfig, userID string, state PersistableState, store alertstore.AlertStore, l log.Logger, r prometheus.Registerer) *statePersister {