Extend alertmanager limits to cover all integrations. #4163

Merged
merged 8 commits into from
May 19, 2021
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -7,7 +7,7 @@
- `-alertmanager.receivers-firewall.block.cidr-networks` renamed to `-alertmanager.receivers-firewall-block-cidr-networks`
- `-alertmanager.receivers-firewall.block.private-addresses` renamed to `-alertmanager.receivers-firewall-block-private-addresses`
* [CHANGE] Change default value of `-server.grpc.keepalive.min-time-between-pings` to `10s` and `-server.grpc.keepalive.ping-without-stream-allowed` to `true`. #4168
* [FEATURE] Alertmanager: Added rate-limits to email notifier. Rate limits can be configured using `-alertmanager.email-notification-rate-limit` and `-alertmanager.email-notification-burst-size`. These limits are applied on individual alertmanagers. Rate-limited email notifications are failed notifications. It is possible to monitor rate-limited notifications via new `cortex_alertmanager_notification_rate_limited_total` metric. #4135
* [FEATURE] Alertmanager: Added rate-limits to notifiers. Rate limits used by all integrations can be configured using `-alertmanager.notification-rate-limit`, while per-integration rate limits can be specified via the `-alertmanager.notification-rate-limit-per-integration` parameter. Both shared and per-integration limits can be overridden using the overrides mechanism. These limits are applied on individual (per-tenant) alertmanagers. Rate-limited notifications are failed notifications. It is possible to monitor rate-limited notifications via the new `cortex_alertmanager_notification_rate_limited_total` metric. #4135 #4163
* [ENHANCEMENT] Alertmanager: introduced new metrics to monitor operation when using `-alertmanager.sharding-enabled`: #4149
* `cortex_alertmanager_state_fetch_replica_state_total`
* `cortex_alertmanager_state_fetch_replica_state_failed_total`
24 changes: 14 additions & 10 deletions docs/configuration/config-file-reference.md
@@ -4107,16 +4107,20 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# CLI flag: -alertmanager.receivers-firewall-block-private-addresses
[alertmanager_receivers_firewall_block_private_addresses: <boolean> | default = false]

# Per-user rate limit for sending email notifications from Alertmanager in
# emails/sec. 0 = rate limit disabled. Negative value = no emails are allowed.
# CLI flag: -alertmanager.email-notification-rate-limit
[alertmanager_email_notification_rate_limit: <float> | default = 0]

# Per-user burst size for email notifications. If set to 0, no email
# notifications will be sent, unless rate-limit is disabled, in which case all
# email notifications are allowed.
# CLI flag: -alertmanager.email-notification-burst-size
[alertmanager_email_notification_burst_size: <int> | default = 1]
# Per-user rate limit for sending notifications from Alertmanager in
# notifications/sec. 0 = rate limit disabled. Negative value = no notifications
# are allowed.
# CLI flag: -alertmanager.notification-rate-limit
[alertmanager_notification_rate_limit: <float> | default = 0]

# Per-integration notification rate limits. Value is a map, where each key is
# integration name and value is a rate-limit (float). On command line, this map
# is given in JSON format. Rate limit has the same meaning as
# -alertmanager.notification-rate-limit, but only applies for specific
# integration. Allowed integration names: webhook, email, pagerduty, opsgenie,
# wechat, slack, victorops, pushover.
# CLI flag: -alertmanager.notification-rate-limit-per-integration
[alertmanager_notification_rate_limit_per_integration: <map of string to float64> | default = {}]
```
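
The flag description above says the per-integration map is passed as JSON on the command line. A minimal sketch of how such a flag could be parsed with the standard `flag.Value` interface follows. The `rateLimitMap` type, the `parseArgs` helper, and the name validation are assumptions made for illustration; this is not the `NotificationRateLimitMap` implementation from the PR, which is not shown in this diff.

```go
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"io"
)

// allowedIntegrations mirrors the integration names listed in the flag description.
var allowedIntegrations = map[string]bool{
	"webhook": true, "email": true, "pagerduty": true, "opsgenie": true,
	"wechat": true, "slack": true, "victorops": true, "pushover": true,
}

// rateLimitMap is a hypothetical stand-in for NotificationRateLimitMap.
type rateLimitMap map[string]float64

// String implements flag.Value by rendering the map as JSON.
func (m rateLimitMap) String() string {
	out, err := json.Marshal(map[string]float64(m))
	if err != nil {
		return "{}"
	}
	return string(out)
}

// Set implements flag.Value. It parses a JSON object and rejects unknown
// integration names before storing anything (assumed behaviour).
func (m rateLimitMap) Set(s string) error {
	parsed := map[string]float64{}
	if err := json.Unmarshal([]byte(s), &parsed); err != nil {
		return err
	}
	for name := range parsed {
		if !allowedIntegrations[name] {
			return fmt.Errorf("unknown integration name: %s", name)
		}
	}
	for name, limit := range parsed {
		m[name] = limit
	}
	return nil
}

// parseArgs is a hypothetical helper that registers the flag and parses args.
func parseArgs(args []string) (rateLimitMap, error) {
	limits := rateLimitMap{}
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way on errors
	fs.Var(limits, "alertmanager.notification-rate-limit-per-integration", "JSON map of integration name to rate limit.")
	err := fs.Parse(args)
	return limits, err
}

func main() {
	limits, err := parseArgs([]string{
		`-alertmanager.notification-rate-limit-per-integration={"email": 0.5, "webhook": 10}`,
	})
	fmt.Println(limits, err)
}
```

In a shell the JSON value would typically need quoting, for example `'-alertmanager.notification-rate-limit-per-integration={"email": 0.5}'`.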

### `redis_config`
1 change: 1 addition & 0 deletions docs/configuration/v1-guarantees.md
@@ -71,3 +71,4 @@ Currently experimental features are:
- `-ingester_stream_chunks_when_using_blocks` (boolean) field in runtime config file
- Instance limits in ingester and distributor
- Exemplar storage, currently in-memory only within the Ingester based on Prometheus exemplar storage (`-blocks-storage.tsdb.max-exemplars`)
- Alertmanager: notification rate limits. (`-alertmanager.notification-rate-limit` and `-alertmanager.notification-rate-limit-per-integration`)
17 changes: 10 additions & 7 deletions pkg/alertmanager/alertmanager.go
@@ -346,10 +346,11 @@ func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config, rawCfg s
firewallDialer := util_net.NewFirewallDialer(newFirewallDialerConfigProvider(userID, am.cfg.Limits))

integrationsMap, err := buildIntegrationsMap(conf.Receivers, tmpl, firewallDialer, am.logger, func(integrationName string, notifier notify.Notifier) notify.Notifier {
if integrationName == "email" && am.cfg.Limits != nil {
if am.cfg.Limits != nil {
rl := &tenantRateLimits{
tenant: userID,
limits: am.cfg.Limits,
tenant: userID,
limits: am.cfg.Limits,
integration: integrationName,
}

return newRateLimitedNotifier(notifier, rl, 10*time.Second, am.rateLimitedNotifications.WithLabelValues(integrationName))
@@ -507,6 +508,7 @@ func buildReceiverIntegrations(nc *config.Receiver, tmpl *template.Template, fir
for i, c := range nc.PushoverConfigs {
add("pushover", i, c, func(l log.Logger) (notify.Notifier, error) { return pushover.New(c, tmpl, l, httpOps...) })
}
// If we add support for more integrations, we need to add them to validation as well. See validation.allowedIntegrationNames field.
if errs.Len() > 0 {
return nil, &errs
}
@@ -560,14 +562,15 @@ func (p firewallDialerConfigProvider) BlockPrivateAddresses() bool {
}

type tenantRateLimits struct {
tenant string
limits Limits
tenant string
integration string
limits Limits
}

func (t *tenantRateLimits) RateLimit() rate.Limit {
return t.limits.EmailNotificationRateLimit(t.tenant)
return t.limits.NotificationRateLimit(t.tenant, t.integration)
}

func (t *tenantRateLimits) Burst() int {
return t.limits.EmailNotificationBurst(t.tenant)
return t.limits.NotificationBurstSize(t.tenant, t.integration)
}
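
The `ApplyConfig` hunk above wraps every integration's notifier in a rate-limited notifier, and the changelog notes that rate-limited notifications count as failed ones. The decorator idea can be sketched roughly as below. All types here (`notifier`, `limiter`, `rateLimited`, `budget`, `recorder`) are simplified, hypothetical stand-ins: the real `notify.Notifier` has a different signature, and the real wrapper takes its rate and burst from `tenantRateLimits` rather than a fixed budget.

```go
package main

import (
	"errors"
	"fmt"
)

// notifier is a simplified, hypothetical stand-in for notify.Notifier.
type notifier interface {
	Notify(msg string) error
}

// limiter is a hypothetical stand-in for whatever decides if a send is allowed.
type limiter interface {
	Allow() bool
}

var errRateLimited = errors.New("notification rate limited")

// rateLimited decorates a notifier: rejected sends fail and are counted.
type rateLimited struct {
	next     notifier
	lim      limiter
	rejected int // stand-in for the rate-limited notifications counter
}

func (r *rateLimited) Notify(msg string) error {
	if !r.lim.Allow() {
		r.rejected++
		return errRateLimited
	}
	return r.next.Notify(msg)
}

// budget allows a fixed number of sends, then rejects everything.
type budget struct{ remaining int }

func (b *budget) Allow() bool {
	if b.remaining <= 0 {
		return false
	}
	b.remaining--
	return true
}

// recorder remembers every message it was asked to deliver.
type recorder struct{ sent []string }

func (r *recorder) Notify(msg string) error {
	r.sent = append(r.sent, msg)
	return nil
}

func main() {
	rec := &recorder{}
	n := &rateLimited{next: rec, lim: &budget{remaining: 2}}
	for _, msg := range []string{"a", "b", "c"} {
		fmt.Println(msg, n.Notify(msg))
	}
	fmt.Println("delivered:", len(rec.sent), "rejected:", n.rejected)
}
```

Because the wrapper is applied per integration name, each integration of a tenant can be given its own limiter instance.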
13 changes: 7 additions & 6 deletions pkg/alertmanager/multitenant.go
@@ -235,16 +235,17 @@ type Limits interface {
// in the Alertmanager receivers for the given user.
AlertmanagerReceiversBlockPrivateAddresses(user string) bool

// EmailNotificationRateLimit returns limit used by rate-limiter. If set to 0, no emails are allowed.
// rate.Inf = all emails are allowed.
// NotificationRateLimit methods return limit used by rate-limiter for given integration.
// If set to 0, no notifications are allowed.
// rate.Inf = all notifications are allowed.
//
// Note that when negative or zero values specified by user are translated to rate.Limit by Overrides,
// Note that when negative or zero values specified by user are translated to rate.Limit by Overrides,
// and may have different meaning there.
EmailNotificationRateLimit(tenant string) rate.Limit
NotificationRateLimit(tenant string, integration string) rate.Limit

// EmailNotificationBurst returns burst-size for rate limiter. If 0, no notifications are allowed except
// NotificationBurstSize returns burst-size for rate limiter for given integration type. If 0, no notifications are allowed except
// when limit == rate.Inf.
EmailNotificationBurst(tenant string) int
NotificationBurstSize(tenant string, integration string) int
}

// A MultitenantAlertmanager manages Alertmanager instances for multiple
4 changes: 2 additions & 2 deletions pkg/alertmanager/multitenant_test.go
@@ -1857,10 +1857,10 @@ func (m mockAlertManagerLimits) AlertmanagerReceiversBlockPrivateAddresses(user
panic("implement me")
}

func (m mockAlertManagerLimits) EmailNotificationRateLimit(_ string) rate.Limit {
func (m mockAlertManagerLimits) NotificationRateLimit(_ string, integration string) rate.Limit {
return m.emailNotificationRateLimit
}

func (m mockAlertManagerLimits) EmailNotificationBurst(_ string) int {
func (m mockAlertManagerLimits) NotificationBurstSize(_ string, integration string) int {
return m.emailNotificationBurst
}
69 changes: 57 additions & 12 deletions pkg/util/validation/limits.go
@@ -5,6 +5,7 @@ import (
"errors"
"flag"
"math"
"strings"
"time"

"github.com/prometheus/common/model"
@@ -103,8 +104,8 @@ type Limits struct {
AlertmanagerReceiversBlockPrivateAddresses bool `yaml:"alertmanager_receivers_firewall_block_private_addresses" json:"alertmanager_receivers_firewall_block_private_addresses"`

// Alertmanager limits
EmailNotificationRateLimit float64 `yaml:"alertmanager_email_notification_rate_limit" json:"alertmanager_email_notification_rate_limit"`
EmailNotificationBurstSize int `yaml:"alertmanager_email_notification_burst_size" json:"alertmanager_email_notification_burst_size"`
NotificationRateLimit float64 `yaml:"alertmanager_notification_rate_limit" json:"alertmanager_notification_rate_limit"`
NotificationRateLimitPerIntegration NotificationRateLimitMap `yaml:"alertmanager_notification_rate_limit_per_integration" json:"alertmanager_notification_rate_limit_per_integration"`
}

// RegisterFlags adds the flags required to config this to the given FlagSet
@@ -165,8 +166,13 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
// Alertmanager.
f.Var(&l.AlertmanagerReceiversBlockCIDRNetworks, "alertmanager.receivers-firewall-block-cidr-networks", "Comma-separated list of network CIDRs to block in Alertmanager receiver integrations.")
f.BoolVar(&l.AlertmanagerReceiversBlockPrivateAddresses, "alertmanager.receivers-firewall-block-private-addresses", false, "True to block private and local addresses in Alertmanager receiver integrations. It blocks private addresses defined by RFC 1918 (IPv4 addresses) and RFC 4193 (IPv6 addresses), as well as loopback, local unicast and local multicast addresses.")
f.Float64Var(&l.EmailNotificationRateLimit, "alertmanager.email-notification-rate-limit", 0, "Per-user rate limit for sending email notifications from Alertmanager in emails/sec. 0 = rate limit disabled. Negative value = no emails are allowed.")
f.IntVar(&l.EmailNotificationBurstSize, "alertmanager.email-notification-burst-size", 1, "Per-user burst size for email notifications. If set to 0, no email notifications will be sent, unless rate-limit is disabled, in which case all email notifications are allowed.")

f.Float64Var(&l.NotificationRateLimit, "alertmanager.notification-rate-limit", 0, "Per-user rate limit for sending notifications from Alertmanager in notifications/sec. 0 = rate limit disabled. Negative value = no notifications are allowed.")

if l.NotificationRateLimitPerIntegration == nil {
l.NotificationRateLimitPerIntegration = NotificationRateLimitMap{}
}
f.Var(&l.NotificationRateLimitPerIntegration, "alertmanager.notification-rate-limit-per-integration", "Per-integration notification rate limits. Value is a map, where each key is integration name and value is a rate-limit (float). On command line, this map is given in JSON format. Rate limit has the same meaning as -alertmanager.notification-rate-limit, but only applies for specific integration. Allowed integration names: "+strings.Join(allowedIntegrationNames, ", ")+".")
}

// Validate the limits config and returns an error if the validation
@@ -190,6 +196,8 @@ func (l *Limits) UnmarshalYAML(unmarshal func(interface{}) error) error {
// During startup we won't have a default value so we don't want to overwrite them
if defaultLimits != nil {
*l = *defaultLimits
// Make copy of default limits. Otherwise unmarshalling would modify map in default limits.
l.copyNotificationIntegrationLimits(defaultLimits.NotificationRateLimitPerIntegration)
}
type plain Limits
return unmarshal((*plain)(l))
@@ -202,12 +210,21 @@ func (l *Limits) UnmarshalJSON(data []byte) error {
// behind type indirection.
if defaultLimits != nil {
*l = *defaultLimits
// Make copy of default limits. Otherwise unmarshalling would modify map in default limits.
l.copyNotificationIntegrationLimits(defaultLimits.NotificationRateLimitPerIntegration)
}

type plain Limits
return json.Unmarshal(data, (*plain)(l))
}

func (l *Limits) copyNotificationIntegrationLimits(defaults NotificationRateLimitMap) {
l.NotificationRateLimitPerIntegration = make(map[string]float64, len(defaults))
for k, v := range defaults {
l.NotificationRateLimitPerIntegration[k] = v
}
}
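
The comment "Otherwise unmarshalling would modify map in default limits" refers to Go's map aliasing: assigning a struct copies the map header, not the entries, so both structs share one map. A small sketch of the problem and of the copy that avoids it, using a hypothetical trimmed-down `limits` struct and `cloneMap` helper:

```go
package main

import "fmt"

// limits is a hypothetical, trimmed-down stand-in for validation.Limits.
type limits struct {
	rate           float64
	perIntegration map[string]float64
}

// cloneMap returns an independent copy of src.
func cloneMap(src map[string]float64) map[string]float64 {
	dst := make(map[string]float64, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

func main() {
	defaults := limits{rate: 1, perIntegration: map[string]float64{"email": 1}}

	// Struct assignment copies the fields, but both structs point at one map.
	shallow := defaults
	shallow.perIntegration["email"] = 99
	fmt.Println(defaults.perIntegration["email"]) // 99: the defaults were modified

	// Reset, then copy the map explicitly before writing to it.
	defaults.perIntegration["email"] = 1
	deep := defaults
	deep.perIntegration = cloneMap(defaults.perIntegration)
	deep.perIntegration["email"] = 99
	fmt.Println(defaults.perIntegration["email"]) // 1: the defaults are untouched
}
```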

// When we load YAML from disk, we want the various per-customer limits
// to default to any values specified on the command line, not default
// command line values. This global contains those values. I (Tom) cannot
@@ -508,24 +525,52 @@ func (o *Overrides) AlertmanagerReceiversBlockPrivateAddresses(user string) bool
return o.getOverridesForUser(user).AlertmanagerReceiversBlockPrivateAddresses
}

func (o *Overrides) EmailNotificationRateLimit(user string) rate.Limit {
l := o.getOverridesForUser(user).EmailNotificationRateLimit
// Notification limits are special. Limits are returned in the following order:
// 1. per-tenant limits for given integration
// 2. default limits for given integration
// 3. per-tenant limits
// 4. default limits
func (o *Overrides) getNotificationLimitForUser(user, integration string) float64 {
u := o.getOverridesForUser(user)
if n, ok := u.NotificationRateLimitPerIntegration[integration]; ok {
return n
}

return u.NotificationRateLimit
}
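
The lookup above checks only two places, the per-integration map and then the shared limit. The four-step order in the comment appears to rely on tenant limits being seeded from the defaults during unmarshalling (see the `UnmarshalYAML` and `UnmarshalJSON` hunks). A rough sketch of the lookup itself, with a hypothetical `tenantLimits` struct and `limitFor` function standing in for the real types:

```go
package main

import "fmt"

// tenantLimits is a hypothetical, trimmed-down stand-in for validation.Limits,
// assumed to be already merged with the defaults.
type tenantLimits struct {
	shared         float64
	perIntegration map[string]float64
}

// limitFor returns the per-integration value when the key is present
// (even if it is 0), otherwise the shared value.
func limitFor(l tenantLimits, integration string) float64 {
	if v, ok := l.perIntegration[integration]; ok {
		return v
	}
	return l.shared
}

func main() {
	l := tenantLimits{
		shared:         5,
		perIntegration: map[string]float64{"email": 0.5, "webhook": 0},
	}
	for _, name := range []string{"email", "webhook", "slack"} {
		fmt.Println(name, limitFor(l, name))
	}
	// Prints:
	// email 0.5
	// webhook 0
	// slack 5
}
```

Note that an explicit per-integration `0` wins over the shared limit, and since `0` means "rate limit disabled", it would lift the limit for that one integration.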

func (o *Overrides) NotificationRateLimit(user string, integration string) rate.Limit {
l := o.getNotificationLimitForUser(user, integration)
if l == 0 || math.IsInf(l, 1) {
return rate.Inf // No rate limit.
}

if l < 0 {
l = 0 // No emails will be sent.
l = 0 // No notifications will be sent.
}
return rate.Limit(l)
}

func (o *Overrides) EmailNotificationBurst(user string) int {
b := o.getOverridesForUser(user).EmailNotificationBurstSize
if b < 0 {
b = 0
const maxInt = int(^uint(0) >> 1)

func (o *Overrides) NotificationBurstSize(user string, integration string) int {
// Burst size is computed from rate limit. Rate limit is already normalized to [0, rate.Inf], where 0 means that no notifications are allowed.
l := o.NotificationRateLimit(user, integration)
if l == 0 {
return 0
}

// floats can be larger than max int. This also handles case where l == rate.Inf.
if float64(l) >= float64(maxInt) {
return maxInt
}

// For limits in (0, 1), use a burst of one notification (that is, one notification every 1/limit seconds).
if l < 1 {
return 1
}
return b

return int(l)
}
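
To see how configured values map to an effective rate and burst size, the two functions above can be mirrored in a stdlib-only sketch. The `limit` type and `inf` constant are local stand-ins for `rate.Limit` and `rate.Inf` from `golang.org/x/time/rate` (where `Inf` is defined as `math.MaxFloat64`); `normalize` and `burstFor` are hypothetical names, not functions from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// limit and inf are local stand-ins for rate.Limit and rate.Inf.
type limit float64

const inf = limit(math.MaxFloat64)

const maxInt = int(^uint(0) >> 1)

// normalize maps a configured value to an effective limit:
// 0 or +Inf disables rate limiting, negative values block all notifications.
func normalize(configured float64) limit {
	if configured == 0 || math.IsInf(configured, 1) {
		return inf
	}
	if configured < 0 {
		return 0
	}
	return limit(configured)
}

// burstFor derives the burst size from an already normalized limit.
func burstFor(l limit) int {
	if l == 0 {
		return 0 // nothing is allowed
	}
	if float64(l) >= float64(maxInt) {
		return maxInt // also covers l == inf
	}
	if l < 1 {
		return 1 // fractional rates still need a burst of one to send anything
	}
	return int(l) // truncates, e.g. 2.7 becomes 2
}

func main() {
	for _, configured := range []float64{-1, 0, 0.5, 2.7} {
		l := normalize(configured)
		fmt.Printf("configured=%v unlimited=%v burst=%d\n", configured, l == inf, burstFor(l))
	}
}
```

With this derivation the burst size no longer needs its own flag, which matches the removal of `-alertmanager.email-notification-burst-size` in this PR.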

func (o *Overrides) getOverridesForUser(userID string) *Limits {