Listen for AlertManagerConfig CRDs and configure Alertmanager in Grafana Mimir/Cortex #504

nicoche opened this issue Jan 14, 2023 · 16 comments · May be fixed by #3448
Labels
enhancement New feature or request


@nicoche

nicoche commented Jan 14, 2023

Similarly to grafana/agent#1544, the agent could discover AlertmanagerConfig CRDs and configure Mimir's Alertmanager via its API.

@tpaschalis
Member

Hey there! 👋 Yes, that sounds like an interesting idea!

Would you like to take a shot at creating an Agent Flow component for it? We can provide guidance around it.

@nicoche
Author

nicoche commented Jan 27, 2023

Hey! I've been planning to 🙂 However, I haven't been able to find the time yet; I'll try to in the coming weeks.

Thanks for the offer to help! I believe we can do something similar to grafana/agent#2604.

@ilia-medvedev-codefresh

This would be so useful to us!

@rafilkmp3

I would like to see an example of that using the Grafana Agent Operator CRDs.

@rafilkmp3

@tpaschalis, can you give me some guidance? I have a lot of Prometheus rules in my cluster, and since I moved to prometheus-agent I've lost my whole alerting setup, because the Mimir ruler doesn't pick up those rules at all.

@tpaschalis
Member

@rafilkmp3 Apologies, I totally missed your initial message. In case you're still interested, you can take some inspiration from grafana/agent#2604 and implement a new mimir.alertmanager.kubernetes component.

@LeszekBlazewski

I just wanted to bump this issue, since the feature would be an absolute game-changer. I haven't missed any other implementation that would solve this request, right?

Thanks for the amazing tools you guys develop!

@bcrisp4

bcrisp4 commented Mar 12, 2024

Another +1 from me! The integration with Mimir Ruler is super useful! Thanks for implementing that 😊 Alertmanager is the final piece of the puzzle!

@reda-ayoub

+1 for this feature

@rfratto rfratto transferred this issue from grafana/agent Apr 11, 2024
@jeremych1000

Could we help get this over the line please? :)
So close

@Daniel-Vaz

Plus one for me here also 🤞

@fculpo

fculpo commented Jul 18, 2024

+1

How are you managing this right now?
We are migrating to Alloy with the Loki and Mimir Kubernetes rules components to replicate kube-prometheus (prometheus-operator). That is working fine and everything is visible in Grafana, but the Alertmanager config is still missing (mainly routing to Grafana OnCall integrations depending on labels, etc.).

@lieberlois

+1

Any updates on this? If you need support with the implementation I'd be happy to help if I find the time; I'd just need some pointers on how to get started ^^

@ptodev ptodev self-assigned this Mar 3, 2025
@ptodev
Collaborator

ptodev commented Mar 7, 2025

I had a look at implementing this but, just like the related proposal states, I personally don't know how to merge the AlertmanagerConfig CRDs. For example:

  • If two CRDs have receivers with the same name, should they be merged together? Or should we change their names?
  • How would we merge route and inhibitRules? They don't even have name identifiers.
  • What if two overlapping CRDs get merged, but then one of them is deleted? Removing the deleted CRD must not strip configuration that the remaining one still needs.

I suppose we'd have to store all active CRDs in memory and then re-merge all of them any time a CRD is added, modified, or deleted. That way we avoid a deleted CRD affecting the config derived from the remaining ones.

I'm still not sure how the merging should actually be done, though. I'm quite new to this topic, so I'd be grateful if anyone with more real-world experience could shed some light on it :)
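
The "store everything and re-merge on every change" idea above can be sketched roughly like this. This is a hypothetical illustration, not the component's actual code: the `Store` class, its method names, and the dict-based CRD specs are all made up for the example. The one behavior it does mirror from the output shown later in this thread is prefixing receiver names with `<namespace>/<crd name>/` to avoid collisions between CRDs.

```python
# Hypothetical sketch of the re-merge approach: keep every live
# AlertmanagerConfig in memory and rebuild the whole merged config from
# scratch on each add/update/delete, so a deleted CRD can never leave
# stale receivers behind.

class Store:
    def __init__(self):
        # (namespace, name) -> CRD spec (a plain dict in this sketch)
        self.crds = {}

    def upsert(self, namespace, name, spec):
        self.crds[(namespace, name)] = spec
        return self.merge()

    def delete(self, namespace, name):
        self.crds.pop((namespace, name), None)
        return self.merge()

    def merge(self):
        # Re-derive everything from the live CRDs only. Receiver names
        # are prefixed with "<namespace>/<crd name>/" so two CRDs can
        # both define e.g. a "null" receiver without clashing.
        receivers = []
        for (ns, crd_name), spec in sorted(self.crds.items()):
            for r in spec.get("receivers", []):
                receivers.append(f"{ns}/{crd_name}/{r['name']}")
        return {"receivers": receivers}

store = Store()
store.upsert("default", "config1", {"receivers": [{"name": "null"}]})
cfg = store.upsert("default", "config2", {"receivers": [{"name": "pager"}]})
# cfg["receivers"] == ["default/config1/null", "default/config2/pager"]
cfg = store.delete("default", "config1")
# cfg["receivers"] == ["default/config2/pager"]
```

Because `merge()` never looks at anything except the currently stored CRDs, deleting one CRD automatically drops only its own contributions, which addresses the third bullet above.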

@pdf

pdf commented Mar 7, 2025

I think the way to go for Alertmanager would be to crib the prometheus-operator config generation code, then load the resulting complete config into mimir as a single op, similarly to how mimirtool would be used to load an existing alertmanager configuration file.
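
To illustrate the "single op" part of this suggestion: Mimir's Alertmanager API takes the whole configuration in one request to the same `/api/v1/alerts` endpoint queried later in this thread, with the tenant in the `X-Scope-OrgID` header. The sketch below only builds and describes that request; `build_payload` and its minimal YAML emission are illustrative, not any real library's API.

```python
# Hypothetical sketch: upload one complete, pre-merged Alertmanager config
# to Mimir in a single call, similar to `mimirtool alertmanager load`.
import json
import urllib.request

def build_payload(alertmanager_config: str, template_files: dict) -> bytes:
    # Mimir expects a YAML body with `alertmanager_config` (itself a YAML
    # string) and `template_files`. A hand-rolled emitter is enough here:
    # an empty dict renders as {} and the config is nested as a block scalar.
    lines = ["template_files: " + json.dumps(template_files),
             "alertmanager_config: |"]
    lines += ["  " + line for line in alertmanager_config.splitlines()]
    return "\n".join(lines).encode()

def upload(base_url: str, tenant: str, payload: bytes):
    # Not executed here: posts the full config as one atomic operation.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/alerts",
        data=payload,
        method="POST",
        headers={"X-Scope-OrgID": tenant, "Content-Type": "application/yaml"},
    )
    return urllib.request.urlopen(req)

payload = build_payload("route:\n  receiver: 'null'", {})
```

Loading the merged result in one shot sidesteps partial-update states: Mimir either has the old complete config or the new complete config, never a half-applied mix.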

@ptodev
Collaborator

ptodev commented Apr 22, 2025

Would it be OK if the Alloy component did a few things differently from the operator?

  • The global Alertmanager config would come from a string instead of the Alertmanager CRD. That string could be set in Alloy's config directly or it could come from a component like remote.kubernetes.configmap. Even if we use a CRD, most of the parameters in it would be unused.
  • The template_files option in the Mimir API would also be set via a string in the Alloy component's config.

I've been testing a component locally with the following inputs:

Alloy config:

```alloy
remote.kubernetes.configmap "default" {
    namespace = "default"
    name = "alertmgr-global"
}

mimir.alerts.kubernetes "default" {
    address = "http://mimir-nginx.mimir-test.svc:80"
    global_config = remote.kubernetes.configmap.default.data["glbl"]
}
```
Global Alertmanager config:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmgr-global
  namespace: default
  labels:
    agent: "yes"
data:
  glbl: |
    global:
      resolve_timeout: 5m
      http_config:
        follow_redirects: true
        enable_http2: true
      smtp_hello: localhost
      smtp_require_tls: true
    route:
      receiver: "null"
    receivers:
    - name: "null"
    - name: "alloy-namespace/global-config/myreceiver"
    templates: []
```
AlertmanagerConfig CRD 1:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmgr-config1
  namespace: default
  labels:
    agent: "yes"
spec:
  route:
    receiver: "null"
    routes:
    - receiver: myamc
      continue: true
  receivers:
  - name: "null"
  - name: myamc
    webhookConfigs:
    - url: http://test.url
      httpConfig:
        followRedirects: true
```
AlertmanagerConfig CRD 2:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmgr-config2
  namespace: default
  labels:
    agent: "yes"
spec:
  route:
    receiver: "null"
    routes:
    - receiver: 'database-pager'
      group_wait: 10s
      matchers:
      - name: service
        value: webapp
  receivers:
  - name: "null"
  - name: "database-pager"
```
When I query `curl http://mimir-nginx.mimir-test.svc:80/api/v1/alerts`, I get this output:

Output config:

```yaml
template_files: {}
alertmanager_config: |
    global:
      resolve_timeout: 5m
      http_config:
        follow_redirects: true
        enable_http2: true
      smtp_hello: localhost
      smtp_require_tls: true
    route:
      receiver: "null"
      continue: false
      routes:
      - receiver: default/alertmgr-config1/null
        matchers:
        - namespace="default"
        continue: true
        routes:
        - receiver: default/alertmgr-config1/myamc
          continue: true
      - receiver: default/alertmgr-config2/null
        matchers:
        - namespace="default"
        continue: true
        routes:
        - receiver: default/alertmgr-config2/database-pager
          match:
            service: webapp
          continue: false
    receivers:
    - name: "null"
    - name: alloy-namespace/global-config/myreceiver
    - name: default/alertmgr-config1/null
    - name: default/alertmgr-config1/myamc
      webhook_configs:
      - send_resolved: false
        http_config:
          follow_redirects: true
          enable_http2: true
        url: <secret>
        url_file: ""
        max_alerts: 0
    - name: default/alertmgr-config2/null
    - name: default/alertmgr-config2/database-pager
    templates: []
```
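
The transformation visible in this output can be sketched as follows. This is an illustrative reconstruction, not the component's actual code: each CRD's route tree is grafted under the global route as one sub-route guarded by a `namespace="<ns>"` matcher, with every receiver reference rewritten to the prefixed `<namespace>/<crd name>/<receiver>` form. The `graft` function and its flat-dict route representation are made up for the example.

```python
# Hypothetical sketch of how each CRD's route appears to be merged into
# the global route in the output above.

def graft(global_route, crds):
    """crds: iterable of (namespace, crd_name, crd_route) tuples."""
    routes = list(global_route.get("routes", []))
    for ns, name, crd_route in crds:
        routes.append({
            # The CRD's top-level receiver, rewritten to its prefixed name.
            "receiver": f"{ns}/{name}/{crd_route['receiver']}",
            # Guard so this subtree only matches alerts from the CRD's
            # own namespace.
            "matchers": [f'namespace="{ns}"'],
            "continue": True,
            # Child routes keep their other fields; only the receiver
            # reference is rewritten.
            "routes": [
                {**r, "receiver": f"{ns}/{name}/{r['receiver']}"}
                for r in crd_route.get("routes", [])
            ],
        })
    return {**global_route, "routes": routes}

merged = graft(
    {"receiver": "null"},
    [("default", "alertmgr-config2",
      {"receiver": "null",
       "routes": [{"receiver": "database-pager", "continue": False}]})],
)
# merged["routes"][0]["receiver"] == "default/alertmgr-config2/null"
```

The `continue: true` on each grafted sub-route matters: without it, the first CRD's subtree that matched an alert would stop evaluation, and later CRDs would silently never fire.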

When I remove the alertmgr-config1 CRD, the Mimir config becomes:

Output config 2:

```yaml
template_files: {}
alertmanager_config: |
    global:
      resolve_timeout: 5m
      http_config:
        follow_redirects: true
        enable_http2: true
      smtp_hello: localhost
      smtp_require_tls: true
    route:
      receiver: "null"
      continue: false
      routes:
      - receiver: default/alertmgr-config2/null
        matchers:
        - namespace="default"
        continue: true
        routes:
        - receiver: default/alertmgr-config2/database-pager
          match:
            service: webapp
          continue: false
    receivers:
    - name: "null"
    - name: alloy-namespace/global-config/myreceiver
    - name: default/alertmgr-config2/null
    - name: default/alertmgr-config2/database-pager
    templates: []
```

I need to add a few more tests and I will open a PR to add it in as an experimental component.

@ptodev ptodev linked a pull request Apr 25, 2025 that will close this issue