Add firewall support to http based alertmanager receiver integrations #4085

Merged
Changes from 5 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -53,6 +53,7 @@
* [ENHANCEMENT] Ruler: Added `-ruler.enabled-tenants` and `-ruler.disabled-tenants` to explicitly enable or disable rules processing for specific tenants. #4074
* [ENHANCEMENT] Block Storage Ingester: `/flush` now accepts two new parameters: `tenant` to specify tenant to flush and `wait=true` to make call synchronous. Multiple tenants can be specified by repeating `tenant` parameter. If no `tenant` is specified, all tenants are flushed, as before. #4073
* [ENHANCEMENT] Alertmanager: validate configured `-alertmanager.web.external-url` and fail if ends with `/`. #4081
* [ENHANCEMENT] Alertmanager: added `-alertmanager.receivers-firewall.block.cidr-networks` and `-alertmanager.receivers-firewall.block.private-addresses` to block specific network addresses in HTTP-based Alertmanager receiver integrations. #4085
* [ENHANCEMENT] Allow configuration of Cassandra's host selection policy. #4069
* [ENHANCEMENT] Store-gateway: retry synching blocks if a per-tenant sync fails. #3975 #4088
* [ENHANCEMENT] Add metric `cortex_tcp_connections` exposing the current number of accepted TCP connections. #4099
11 changes: 11 additions & 0 deletions docs/blocks-storage/production-tips.md
@@ -105,3 +105,14 @@ You can see that the initial migration is done by looking for the following mess
The rule of thumb to ensure memcached is properly scaled is to make sure evictions happen infrequently. When that's not the case and they affect query performances, the suggestion is to scale out the memcached cluster adding more nodes or increasing the memory limit of existing ones.

We also recommend to run a different memcached cluster for each cache type (metadata, index, chunks). It's not required, but suggested to not worry about the effect of memory pressure on a cache type against others.

## Alertmanager

### Ensure Alertmanager networking is hardened

If the Alertmanager API is enabled, users with access to Cortex can autonomously configure the Alertmanager, including receiver integrations that issue network requests to a configured URL (e.g. webhook). If the Alertmanager network is not hardened, Cortex users may be able to issue network requests to any network endpoint, including services running in the local network that are reachable from the Alertmanager itself.

Although hardening the system is out of scope for Cortex, Cortex provides a basic built-in firewall to block connections created by Alertmanager receiver integrations:

- `-alertmanager.receivers-firewall.block.cidr-networks`
- `-alertmanager.receivers-firewall.block.private-addresses`
12 changes: 12 additions & 0 deletions docs/configuration/config-file-reference.md
@@ -1849,6 +1849,18 @@ The `alertmanager_config` configures the Cortex alertmanager.
# CLI flag: -alertmanager.max-recv-msg-size
[max_recv_msg_size: <int> | default = 16777216]

receivers_firewall:
  block:
    # Comma-separated list of network CIDRs to block in Alertmanager receiver
    # integrations.
    # CLI flag: -alertmanager.receivers-firewall.block.cidr-networks
    [cidr_networks: <string> | default = ""]

    # True to block private and local addresses in Alertmanager receiver
    # integrations.
    # CLI flag: -alertmanager.receivers-firewall.block.private-addresses
    [private_addresses: <boolean> | default = false]

# Shard tenants across multiple alertmanager instances.
# CLI flag: -alertmanager.sharding-enabled
[sharding_enabled: <boolean> | default = false]
6 changes: 4 additions & 2 deletions docs/configuration/v1-guarantees.md
@@ -41,7 +41,10 @@ Currently experimental features are:
- Azure blob storage.
- Zone awareness based replication.
- Ruler API (to PUT rules).
-- Alertmanager API
+- Alertmanager:
+  - API (enabled via `-experimental.alertmanager.enable-api`)
+  - Sharding of tenants across multiple instances (enabled via `-alertmanager.sharding-enabled`)
+  - Receiver integrations firewall (configured via `-alertmanager.receivers-firewall.*`)
- Memcached client DNS-based service discovery.
- Delete series APIs.
- In-memory (FIFO) and Redis cache.
@@ -61,7 +64,6 @@ Currently experimental features are:
- The bucket index support in the querier and store-gateway (enabled via `-blocks-storage.bucket-store.bucket-index.enabled=true`) is experimental
- The block deletion marks migration support in the compactor (`-compactor.block-deletion-marks-migration-enabled`) is temporary and will be removed in future versions
- Querier: tenant federation
-- Alertmanager: Sharding of tenants across multiple instances
- The thanosconvert tool for converting Thanos block metadata to Cortex
- HA Tracker: cleanup of old replicas from KV Store.
- Flags for configuring whether blocks-ingester streams samples or chunks are temporary, and will be removed when feature is tested:
47 changes: 30 additions & 17 deletions pkg/alertmanager/alertmanager.go
@@ -40,10 +40,12 @@ import (
 	"github.com/prometheus/alertmanager/ui"
 	"github.com/prometheus/client_golang/prometheus"
 	"github.com/prometheus/client_golang/prometheus/promauto"
+	commoncfg "github.com/prometheus/common/config"
 	"github.com/prometheus/common/model"
 	"github.com/prometheus/common/route"
 
 	"github.com/cortexproject/cortex/pkg/alertmanager/alertstore"
+	util_net "github.com/cortexproject/cortex/pkg/util/net"
 	"github.com/cortexproject/cortex/pkg/util/services"
 )

@@ -59,12 +61,13 @@ const (

// Config configures an Alertmanager.
type Config struct {
-	UserID      string
-	Logger      log.Logger
-	Peer        *cluster.Peer
-	PeerTimeout time.Duration
-	Retention   time.Duration
-	ExternalURL *url.URL
+	UserID            string
+	Logger            log.Logger
+	Peer              *cluster.Peer
+	PeerTimeout       time.Duration
+	Retention         time.Duration
+	ExternalURL       *url.URL
+	ReceiversFirewall FirewallConfig

// Tenant-specific local directory where AM can store its state (notifications, silences, templates). When AM is stopped, entire dir is removed.
TenantDataDir string
@@ -94,6 +97,7 @@ type Alertmanager struct {
wg sync.WaitGroup
mux *http.ServeMux
registry *prometheus.Registry
+	firewallDialer *util_net.FirewallDialer

// The Dispatcher is the only component we need to recreate when we call ApplyConfig.
// Given its metrics don't have any variable labels we need to re-use the same metrics.
Expand Down Expand Up @@ -147,6 +151,10 @@ func New(cfg *Config, reg *prometheus.Registry) (*Alertmanager, error) {
cfg: cfg,
logger: log.With(cfg.Logger, "user", cfg.UserID),
stop: make(chan struct{}),
+		firewallDialer: util_net.NewFirewallDialer(util_net.FirewallDialerConfig{
+			BlockCIDRNetworks:     cfg.ReceiversFirewall.Block.CIDRNetworks,
+			BlockPrivateAddresses: cfg.ReceiversFirewall.Block.PrivateAddresses,
+		}),
configHashMetric: promauto.With(reg).NewGauge(prometheus.GaugeOpts{
Name: "alertmanager_config_hash",
Help: "Hash of the currently loaded alertmanager configuration.",
@@ -315,7 +323,7 @@ func (am *Alertmanager) ApplyConfig(userID string, conf *config.Config, rawCfg s
return d + waitFunc()
}

-	integrationsMap, err := buildIntegrationsMap(conf.Receivers, tmpl, am.logger)
+	integrationsMap, err := buildIntegrationsMap(conf.Receivers, tmpl, am.firewallDialer, am.logger)
if err != nil {
return nil
}
@@ -407,10 +415,10 @@ func (am *Alertmanager) getFullState() (*clusterpb.FullState, error) {

// buildIntegrationsMap builds a map of name to the list of integration notifiers off of a
// list of receiver config.
-func buildIntegrationsMap(nc []*config.Receiver, tmpl *template.Template, logger log.Logger) (map[string][]notify.Integration, error) {
+func buildIntegrationsMap(nc []*config.Receiver, tmpl *template.Template, firewallDialer *util_net.FirewallDialer, logger log.Logger) (map[string][]notify.Integration, error) {
 	integrationsMap := make(map[string][]notify.Integration, len(nc))
 	for _, rcv := range nc {
-		integrations, err := buildReceiverIntegrations(rcv, tmpl, logger)
+		integrations, err := buildReceiverIntegrations(rcv, tmpl, firewallDialer, logger)
if err != nil {
return nil, err
}
@@ -422,7 +430,7 @@ func buildIntegrationsMap(nc []*config.Receiver, tmpl *template.Template, logger
// buildReceiverIntegrations builds a list of integration notifiers off of a
// receiver config.
// Taken from https://github.com/prometheus/alertmanager/blob/94d875f1227b29abece661db1a68c001122d1da5/cmd/alertmanager/main.go#L112-L159.
-func buildReceiverIntegrations(nc *config.Receiver, tmpl *template.Template, logger log.Logger) ([]notify.Integration, error) {
+func buildReceiverIntegrations(nc *config.Receiver, tmpl *template.Template, firewallDialer *util_net.FirewallDialer, logger log.Logger) ([]notify.Integration, error) {
var (
errs types.MultiError
integrations []notify.Integration
@@ -436,29 +444,34 @@ func buildReceiverIntegrations(nc *config.Receiver, tmpl *template.Template, log
}
)

+	// Inject the firewall to any receiver integration supporting it.
+	httpOps := []commoncfg.HTTPClientOption{
+		commoncfg.WithDialContextFunc(firewallDialer.DialContext),
+	}
+
 	for i, c := range nc.WebhookConfigs {
-		add("webhook", i, c, func(l log.Logger) (notify.Notifier, error) { return webhook.New(c, tmpl, l) })
+		add("webhook", i, c, func(l log.Logger) (notify.Notifier, error) { return webhook.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.EmailConfigs {
 		add("email", i, c, func(l log.Logger) (notify.Notifier, error) { return email.New(c, tmpl, l), nil })
 	}
 	for i, c := range nc.PagerdutyConfigs {
-		add("pagerduty", i, c, func(l log.Logger) (notify.Notifier, error) { return pagerduty.New(c, tmpl, l) })
+		add("pagerduty", i, c, func(l log.Logger) (notify.Notifier, error) { return pagerduty.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.OpsGenieConfigs {
-		add("opsgenie", i, c, func(l log.Logger) (notify.Notifier, error) { return opsgenie.New(c, tmpl, l) })
+		add("opsgenie", i, c, func(l log.Logger) (notify.Notifier, error) { return opsgenie.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.WechatConfigs {
-		add("wechat", i, c, func(l log.Logger) (notify.Notifier, error) { return wechat.New(c, tmpl, l) })
+		add("wechat", i, c, func(l log.Logger) (notify.Notifier, error) { return wechat.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.SlackConfigs {
-		add("slack", i, c, func(l log.Logger) (notify.Notifier, error) { return slack.New(c, tmpl, l) })
+		add("slack", i, c, func(l log.Logger) (notify.Notifier, error) { return slack.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.VictorOpsConfigs {
-		add("victorops", i, c, func(l log.Logger) (notify.Notifier, error) { return victorops.New(c, tmpl, l) })
+		add("victorops", i, c, func(l log.Logger) (notify.Notifier, error) { return victorops.New(c, tmpl, l, httpOps...) })
 	}
 	for i, c := range nc.PushoverConfigs {
-		add("pushover", i, c, func(l log.Logger) (notify.Notifier, error) { return pushover.New(c, tmpl, l) })
+		add("pushover", i, c, func(l log.Logger) (notify.Notifier, error) { return pushover.New(c, tmpl, l, httpOps...) })
 	}
 	if errs.Len() > 0 {
 		return nil, &errs
26 changes: 26 additions & 0 deletions pkg/alertmanager/firewall.go
@@ -0,0 +1,26 @@
package alertmanager

import (
	"flag"
	"fmt"

	"github.com/cortexproject/cortex/pkg/util/flagext"
)

type FirewallConfig struct {
	Block FirewallHostsSpec `yaml:"block"`
}

func (cfg *FirewallConfig) RegisterFlagsWithPrefix(prefix string, f *flag.FlagSet) {
	cfg.Block.RegisterFlagsWithPrefix(prefix+".block", "block", f)
}

type FirewallHostsSpec struct {
	CIDRNetworks     flagext.CIDRSliceCSV `yaml:"cidr_networks"`
	PrivateAddresses bool                 `yaml:"private_addresses"`
}

func (cfg *FirewallHostsSpec) RegisterFlagsWithPrefix(prefix, action string, f *flag.FlagSet) {
	f.Var(&cfg.CIDRNetworks, prefix+".cidr-networks", fmt.Sprintf("Comma-separated list of network CIDRs to %s in Alertmanager receiver integrations.", action))
	f.BoolVar(&cfg.PrivateAddresses, prefix+".private-addresses", false, fmt.Sprintf("True to %s private and local addresses in Alertmanager receiver integrations.", action))
}
15 changes: 8 additions & 7 deletions pkg/alertmanager/multitenant.go
@@ -102,11 +102,12 @@ func init() {

// MultitenantAlertmanagerConfig is the configuration for a multitenant Alertmanager.
type MultitenantAlertmanagerConfig struct {
-	DataDir        string           `yaml:"data_dir"`
-	Retention      time.Duration    `yaml:"retention"`
-	ExternalURL    flagext.URLValue `yaml:"external_url"`
-	PollInterval   time.Duration    `yaml:"poll_interval"`
-	MaxRecvMsgSize int64            `yaml:"max_recv_msg_size"`
+	DataDir           string           `yaml:"data_dir"`
+	Retention         time.Duration    `yaml:"retention"`
+	ExternalURL       flagext.URLValue `yaml:"external_url"`
+	PollInterval      time.Duration    `yaml:"poll_interval"`
+	MaxRecvMsgSize    int64            `yaml:"max_recv_msg_size"`
+	ReceiversFirewall FirewallConfig   `yaml:"receivers_firewall"`

// Enable sharding for the Alertmanager
ShardingEnabled bool `yaml:"sharding_enabled"`
@@ -158,9 +159,8 @@ func (cfg *MultitenantAlertmanagerConfig) RegisterFlags(f *flag.FlagSet) {
f.BoolVar(&cfg.ShardingEnabled, "alertmanager.sharding-enabled", false, "Shard tenants across multiple alertmanager instances.")

 	cfg.AlertmanagerClient.RegisterFlagsWithPrefix("alertmanager.alertmanager-client", f)
-
 	cfg.Persister.RegisterFlagsWithPrefix("alertmanager", f)
-
+	cfg.ReceiversFirewall.RegisterFlagsWithPrefix("alertmanager.receivers-firewall", f)
cfg.ShardingRing.RegisterFlags(f)
cfg.Store.RegisterFlags(f)
cfg.Cluster.RegisterFlags(f)
@@ -873,6 +873,7 @@ func (am *MultitenantAlertmanager) newAlertmanager(userID string, amConfig *amco
ReplicationFactor: am.cfg.ShardingRing.ReplicationFactor,
Store: am.store,
PersisterConfig: am.cfg.Persister,
+		ReceiversFirewall: am.cfg.ReceiversFirewall,
}, reg)
if err != nil {
return nil, fmt.Errorf("unable to start Alertmanager for user %v: %v", userID, err)