Allow to override Alertmanager receivers firewall settings on a per-tenant basis #4143

Merged
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,9 @@
## master / unreleased

* [CHANGE] Querier / ruler: deprecated `-store.query-chunk-limit` CLI flag (and its respective YAML config option `max_chunks_per_query`) in favour of `-querier.max-fetched-chunks-per-query` (and its respective YAML config option `max_fetched_chunks_per_query`). The new limit specifies the maximum number of chunks that can be fetched in a single query from ingesters and long-term storage: the total number of actual fetched chunks could be 2x the limit, being independently applied when querying ingesters and long-term storage. #4125
* [CHANGE] Alertmanager: the experimental receivers firewall can now be configured on a per-tenant basis. The following CLI flags (and their respective YAML config options) have been renamed and moved to the limits config section: #4143
  - `-alertmanager.receivers-firewall.block.cidr-networks` renamed to `-alertmanager.receivers-firewall-block-cidr-networks`
  - `-alertmanager.receivers-firewall.block.private-addresses` renamed to `-alertmanager.receivers-firewall-block-private-addresses`

## 1.9.0 in progress

6 changes: 4 additions & 2 deletions docs/blocks-storage/production-tips.md
@@ -114,5 +114,7 @@ If the Alertmanager API is enabled, users with access to Cortex can autonomously

Although hardening the system is out of scope for Cortex, Cortex provides a basic built-in firewall to block connections created by Alertmanager receiver integrations:

- `-alertmanager.receivers-firewall.block.cidr-networks`
- `-alertmanager.receivers-firewall.block.private-addresses`
- `-alertmanager.receivers-firewall-block-cidr-networks`
- `-alertmanager.receivers-firewall-block-private-addresses`

_These settings can also be overridden on a per-tenant basis via overrides specified in the [runtime config](../configuration/arguments.md#runtime-configuration-file)._
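The per-tenant override behaviour described above can be pictured with a minimal Go sketch. It is not the Cortex overrides implementation: the `firewallLimits` and `overrides` types and the `limitsFor` method are hypothetical names, and the sketch assumes that a tenant entry, when present, replaces the defaults wholesale.

```go
package main

import "fmt"

// firewallLimits is a hypothetical stand-in for the two receivers firewall
// settings that now live in the limits config section.
type firewallLimits struct {
	BlockCIDRNetworks     string // comma-separated list of CIDRs
	BlockPrivateAddresses bool
}

// overrides holds the global defaults plus optional per-tenant entries,
// loosely modelling what a runtime config file could provide.
type overrides struct {
	defaults  firewallLimits
	perTenant map[string]*firewallLimits
}

// limitsFor returns the tenant-specific settings when an override exists,
// and the defaults otherwise.
func (o overrides) limitsFor(userID string) firewallLimits {
	if l, ok := o.perTenant[userID]; ok && l != nil {
		return *l
	}
	return o.defaults
}

func main() {
	o := overrides{
		defaults: firewallLimits{BlockPrivateAddresses: false},
		perTenant: map[string]*firewallLimits{
			"tenant-a": {BlockCIDRNetworks: "10.0.0.0/8", BlockPrivateAddresses: true},
		},
	}

	// tenant-a has an override, tenant-b falls back to the defaults.
	fmt.Println("tenant-a:", o.limitsFor("tenant-a").BlockPrivateAddresses)
	fmt.Println("tenant-b:", o.limitsFor("tenant-b").BlockPrivateAddresses)
}
```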
26 changes: 12 additions & 14 deletions docs/configuration/config-file-reference.md
@@ -1849,20 +1849,6 @@ The `alertmanager_config` configures the Cortex alertmanager.
# CLI flag: -alertmanager.max-recv-msg-size
[max_recv_msg_size: <int> | default = 16777216]

receivers_firewall:
block:
# Comma-separated list of network CIDRs to block in Alertmanager receiver
# integrations.
# CLI flag: -alertmanager.receivers-firewall.block.cidr-networks
[cidr_networks: <string> | default = ""]

# True to block private and local addresses in Alertmanager receiver
# integrations. It blocks private addresses defined by RFC 1918 (IPv4
# addresses) and RFC 4193 (IPv6 addresses), as well as loopback, local
# unicast and local multicast addresses.
# CLI flag: -alertmanager.receivers-firewall.block.private-addresses
[private_addresses: <boolean> | default = false]

# Shard tenants across multiple alertmanager instances.
# CLI flag: -alertmanager.sharding-enabled
[sharding_enabled: <boolean> | default = false]
@@ -4108,6 +4094,18 @@ The `limits_config` configures default and per-tenant limits imposed by Cortex s
# override is set, the encryption context will not be provided to S3. Ignored if
# the SSE type override is not set.
[s3_sse_kms_encryption_context: <string> | default = ""]

# Comma-separated list of network CIDRs to block in Alertmanager receiver
# integrations.
# CLI flag: -alertmanager.receivers-firewall-block-cidr-networks
[alertmanager_receivers_firewall_block_cidr_networks: <string> | default = ""]

# True to block private and local addresses in Alertmanager receiver
# integrations. It blocks private addresses defined by RFC 1918 (IPv4
# addresses) and RFC 4193 (IPv6 addresses), as well as loopback, local unicast
# and local multicast addresses.
# CLI flag: -alertmanager.receivers-firewall-block-private-addresses
[alertmanager_receivers_firewall_block_private_addresses: <boolean> | default = false]
```

### `redis_config`
45 changes: 32 additions & 13 deletions pkg/alertmanager/alertmanager.go
@@ -45,6 +45,7 @@ import (
"github.com/prometheus/common/route"

"github.com/cortexproject/cortex/pkg/alertmanager/alertstore"
"github.com/cortexproject/cortex/pkg/util/flagext"
util_net "github.com/cortexproject/cortex/pkg/util/net"
"github.com/cortexproject/cortex/pkg/util/services"
)
@@ -61,13 +62,13 @@ const (

// Config configures an Alertmanager.
type Config struct {
UserID string
Logger log.Logger
Peer *cluster.Peer
PeerTimeout time.Duration
Retention time.Duration
ExternalURL *url.URL
ReceiversFirewall FirewallConfig
UserID string
Logger log.Logger
Peer *cluster.Peer
PeerTimeout time.Duration
Retention time.Duration
ExternalURL *url.URL
Limits Limits

// Tenant-specific local directory where AM can store its state (notifications, silences, templates). When AM is stopped, entire dir is removed.
TenantDataDir string
@@ -97,7 +98,6 @@ type Alertmanager struct {
wg sync.WaitGroup
mux *http.ServeMux
registry *prometheus.Registry
firewallDialer *util_net.FirewallDialer

// The Dispatcher is the only component we need to recreate when we call ApplyConfig.
// Given its metrics don't have any variable labels we need to re-use the same metrics.
@@ -151,10 +151,6 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) {
cfg: cfg,
logger: log.With(cfg.Logger, "user", cfg.UserID),
stop: make(chan struct{}),
firewallDialer: util_net.NewFirewallDialer(util_net.FirewallDialerConfig{
BlockCIDRNetworks: cfg.ReceiversFirewall.Block.CIDRNetworks,
BlockPrivateAddresses: cfg.ReceiversFirewall.Block.PrivateAddresses,
}),
configHashMetric: promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "alertmanager_config_hash",
Help: "Hash of the currently loaded alertmanager configuration.",
@@ -326,7 +322,10 @@ func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config, rawCfg s
return d + waitFunc()
}

integrationsMap, err := buildIntegrationsMap(conf.Receivers, tmpl, am.firewallDialer, am.logger)
// Create a firewall bound to the per-tenant config.
firewallDialer := util_net.NewFirewallDialer(newFirewallDialerConfigProvider(userID, am.cfg.Limits))

integrationsMap, err := buildIntegrationsMap(conf.Receivers, tmpl, firewallDialer, am.logger)
if err != nil {
return nil
}
@@ -507,3 +506,23 @@ func (p *NilPeer) AddState(string, cluster.State, prometheus.Registerer) cluster
type NilChannel struct{}

func (c *NilChannel) Broadcast([]byte) {}

type firewallDialerConfigProvider struct {
userID string
limits Limits
}

func newFirewallDialerConfigProvider(userID string, limits Limits) firewallDialerConfigProvider {
return firewallDialerConfigProvider{
userID: userID,
limits: limits,
}
}

func (p firewallDialerConfigProvider) BlockCIDRNetworks() []flagext.CIDR {
return p.limits.AlertmanagerReceiversBlockCIDRNetworks(p.userID)
}

func (p firewallDialerConfigProvider) BlockPrivateAddresses() bool {
return p.limits.AlertmanagerReceiversBlockPrivateAddresses(p.userID)
}
2 changes: 1 addition & 1 deletion pkg/alertmanager/api_test.go
@@ -546,7 +546,7 @@ receivers:
// Create the Multitenant Alertmanager.
reg := prometheus.NewPedanticRegistry()
cfg := mockAlertmanagerConfig(t)
am, err := createMultitenantAlertmanager(cfg, nil, nil, alertStore, nil, log.NewNopLogger(), reg)
am, err := createMultitenantAlertmanager(cfg, nil, nil, alertStore, nil, nil, log.NewNopLogger(), reg)
require.NoError(t, err)
require.NoError(t, services.StartAndAwaitRunning(context.Background(), am))
defer services.StopAndAwaitTerminated(context.Background(), am) //nolint:errcheck
26 changes: 0 additions & 26 deletions pkg/alertmanager/firewall.go

This file was deleted.

34 changes: 23 additions & 11 deletions pkg/alertmanager/multitenant.go
@@ -101,12 +101,11 @@ func init() {

// MultitenantAlertmanagerConfig is the configuration for a multitenant Alertmanager.
type MultitenantAlertmanagerConfig struct {
DataDir string `yaml:"data_dir"`
Retention time.Duration `yaml:"retention"`
ExternalURL flagext.URLValue `yaml:"external_url"`
PollInterval time.Duration `yaml:"poll_interval"`
MaxRecvMsgSize int64 `yaml:"max_recv_msg_size"`
ReceiversFirewall FirewallConfig `yaml:"receivers_firewall"`
DataDir string `yaml:"data_dir"`
Retention time.Duration `yaml:"retention"`
ExternalURL flagext.URLValue `yaml:"external_url"`
PollInterval time.Duration `yaml:"poll_interval"`
MaxRecvMsgSize int64 `yaml:"max_recv_msg_size"`

// Enable sharding for the Alertmanager
ShardingEnabled bool `yaml:"sharding_enabled"`
@@ -159,7 +158,6 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet) {

cfg.AlertmanagerClient.RegisterFlagsWithPrefix("alertmanager.alertmanager-client", f)
cfg.Persister.RegisterFlagsWithPrefix("alertmanager", f)
cfg.ReceiversFirewall.RegisterFlagsWithPrefix("alertmanager.receivers-firewall", f)
cfg.ShardingRing.RegisterFlags(f)
cfg.Store.RegisterFlags(f)
cfg.Cluster.RegisterFlags(f)
@@ -215,6 +213,17 @@ func newMultitenantAlertmanagerMetrics(reg prometheus.Registerer) *multitenantAl
return m
}

// Limits defines limits used by Alertmanager.
type Limits interface {
// AlertmanagerReceiversBlockCIDRNetworks returns the list of network CIDRs that should be blocked
// in the Alertmanager receivers for the given user.
AlertmanagerReceiversBlockCIDRNetworks(user string) []flagext.CIDR

// AlertmanagerReceiversBlockPrivateAddresses returns true if private addresses should be blocked
// in the Alertmanager receivers for the given user.
AlertmanagerReceiversBlockPrivateAddresses(user string) bool
}
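The interface above returns CIDRs that are already parsed, while the flag and YAML option take a comma-separated string. A minimal sketch of that conversion follows, using only the standard library. `parseCIDRList` is a hypothetical helper, `*net.IPNet` stands in for `flagext.CIDR`, and skipping empty entries is an assumption rather than documented behaviour.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseCIDRList turns a comma-separated list such as
// "10.0.0.0/8,192.168.0.0/16" into parsed networks.
// Empty entries are skipped; any invalid entry aborts the whole parse.
func parseCIDRList(s string) ([]*net.IPNet, error) {
	var out []*net.IPNet
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		_, n, err := net.ParseCIDR(part)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	nets, err := parseCIDRList("10.0.0.0/8, 192.168.0.0/16")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range nets {
		fmt.Println(n.String())
	}
}
```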

// A MultitenantAlertmanager manages Alertmanager instances for multiple
// organizations.
type MultitenantAlertmanager struct {
@@ -257,6 +266,8 @@ type MultitenantAlertmanager struct {
peer *cluster.Peer
alertmanagerClientsPool ClientsPool

limits Limits

registry prometheus.Registerer
ringCheckErrors prometheus.Counter
tenantsOwned prometheus.Gauge
@@ -266,7 +277,7 @@ type MultitenantAlertmanager struct {
}

// NewMultitenantAlertmanager creates a new MultitenantAlertmanager.
func NewMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, store alertstore.AlertStore, logger log.Logger, registerer prometheus.Registerer) (*MultitenantAlertmanager, error) {
func NewMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, store alertstore.AlertStore, limits Limits, logger log.Logger, registerer prometheus.Registerer) (*MultitenantAlertmanager, error) {
err := os.MkdirAll(cfg.DataDir, 0777)
if err != nil {
return nil, fmt.Errorf("unable to create Alertmanager data directory %q: %s", cfg.DataDir, err)
@@ -326,10 +337,10 @@ func NewMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, store alerts
}
}

return createMultitenantAlertmanager(cfg, fallbackConfig, peer, store, ringStore, logger, registerer)
return createMultitenantAlertmanager(cfg, fallbackConfig, peer, store, ringStore, limits, logger, registerer)
}

func createMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, fallbackConfig []byte, peer *cluster.Peer, store alertstore.AlertStore, ringStore kv.Client, logger log.Logger, registerer prometheus.Registerer) (*MultitenantAlertmanager, error) {
func createMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, fallbackConfig []byte, peer *cluster.Peer, store alertstore.AlertStore, ringStore kv.Client, limits Limits, logger log.Logger, registerer prometheus.Registerer) (*MultitenantAlertmanager, error) {
am := &MultitenantAlertmanager{
cfg: cfg,
fallbackConfig: string(fallbackConfig),
@@ -341,6 +352,7 @@ func createMultitenantAlertmanager(cfg *MultitenantAlertmanagerConfig, fallbackC
store: store,
logger: log.With(logger, "component", "MultiTenantAlertmanager"),
registry: registerer,
limits: limits,
ringCheckErrors: promauto.With(registerer).NewCounter(prometheus.CounterOpts{
Name: "cortex_alertmanager_ring_check_errors_total",
Help: "Number of errors that have occurred when checking the ring for ownership.",
@@ -877,7 +889,7 @@ func (am *MultitenantAlertmanager) newAlertmanager(userID string, amConfig *amco
ReplicationFactor: am.cfg.ShardingRing.ReplicationFactor,
Store: am.store,
PersisterConfig: am.cfg.Persister,
ReceiversFirewall: am.cfg.ReceiversFirewall,
Limits: am.limits,
}, reg)
if err != nil {
return nil, fmt.Errorf("unable to start Alertmanager for user %v: %v", userID, err)