Listen for AlertManagerConfig CRDs and configure Alertmanager in Grafana Mimir/Cortex #504
Comments
Hey there! 👋 Yes, that sounds like an interesting idea! Would you like to take a shot at creating an Agent Flow component for it? We can provide guidance around it. |
Hey! I've been planning to 🙂 However, I haven't been able to find the time yet; I'll try to in the coming weeks. Thanks for the offer to help! I believe we can do something similar to grafana/agent#2604 |
This would be so useful to us! |
I would like to see an example of that using grafana agent operator CRDs |
@tpaschalis, can you give me some guidance? I have a lot of Prometheus rules in my cluster, and since I moved to prometheus-agent I've lost all of my alerting, because the Mimir ruler doesn't pick up those rules at all |
@rafilkmp3 Apologies, I totally missed your initial message. In case you're still interested, you can take some inspiration from grafana/agent#2604 for implementing a new component |
I just wanted to bump this issue, since the feature would be an absolute bomber. I haven't missed any other implementation that would solve this request, right? Thanks for the amazing tools you guys develop! |
Another +1 from me! The integration with Mimir Ruler is super useful! Thanks for implementing that 😊 Alertmanager is the final piece of the puzzle! |
+1 for this feature |
Could we help get this over the line please? :) |
Plus one for me here also 🤞 |
+1 How are you managing this right now? |
+1 Any updates on this? If you need support with the implementation I'd be happy to help if I find the time - just would need some pointers on how to get started ^^ |
I had a look at implementing this but, just as the related proposal states, I personally don't know how the AlertmanagerConfig CRDs should be merged. For example:
I suppose we'd have to store all active CRDs in memory and then re-merge all of them any time a CRD is added, modified, or deleted. That way we can avoid deleted CRDs affecting the config generated from the ones that remain. I'm still not sure how the merging should actually be done, though. I'm quite new to this topic, so I'd be grateful if anyone with more real-world experience could shed some light on it :) |
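A rough Go sketch of that idea, assuming the prometheus-operator `monitoring/v1alpha1` types; the `store` type and the `mergeConfigs` helper are hypothetical placeholders for whatever the component ends up doing, not an existing Alloy or operator API:

```go
package alertmanagerconfigs

import (
	"sync"

	promv1alpha1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1alpha1"
	"k8s.io/apimachinery/pkg/types"
)

// store keeps every active AlertmanagerConfig in memory so the full
// Alertmanager configuration can be rebuilt from scratch on any change.
type store struct {
	mut  sync.Mutex
	crds map[types.NamespacedName]*promv1alpha1.AlertmanagerConfig
}

func newStore() *store {
	return &store{crds: map[types.NamespacedName]*promv1alpha1.AlertmanagerConfig{}}
}

// upsert covers both "added" and "modified" events and returns the
// freshly merged configuration.
func (s *store) upsert(cfg *promv1alpha1.AlertmanagerConfig) string {
	s.mut.Lock()
	defer s.mut.Unlock()
	s.crds[types.NamespacedName{Namespace: cfg.Namespace, Name: cfg.Name}] = cfg
	return s.rebuild()
}

// remove drops a CRD and re-merges the remaining ones, so a deleted CRD
// can never leave stale routes or receivers behind.
func (s *store) remove(key types.NamespacedName) string {
	s.mut.Lock()
	defer s.mut.Unlock()
	delete(s.crds, key)
	return s.rebuild()
}

// rebuild re-merges every stored CRD into a single Alertmanager config.
func (s *store) rebuild() string {
	all := make([]*promv1alpha1.AlertmanagerConfig, 0, len(s.crds))
	for _, cfg := range s.crds {
		all = append(all, cfg)
	}
	return mergeConfigs(all)
}

// mergeConfigs is a hypothetical placeholder for the actual merge strategy,
// e.g. nesting each CRD's routes under a namespace matcher and prefixing
// receiver names with "<namespace>/<name>/" like the output shown below.
func mergeConfigs(all []*promv1alpha1.AlertmanagerConfig) string {
	return "" // merge logic to be decided
}
```

Rebuilding from scratch on every event is simpler than patching the merged config incrementally, and it guarantees a deleted CRD leaves no trace behind.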
I think the way to go for Alertmanager would be to crib the prometheus-operator config generation code, then load the resulting complete config into mimir as a single op, similarly to how mimirtool would be used to load an existing alertmanager configuration file. |
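A minimal sketch of the "single op" upload, assuming Mimir's "set Alertmanager configuration" endpoint at `POST /api/v1/alerts` with the `X-Scope-OrgID` tenant header; the `loadConfig` function, the `anonymous` tenant, and the sample config are placeholders, and the address is just the one from the example further down the thread:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"

	"gopkg.in/yaml.v3"
)

// configPayload mirrors the body accepted by Mimir's "set Alertmanager
// configuration" endpoint: the complete Alertmanager config as a single
// YAML string plus any notification template files.
type configPayload struct {
	TemplateFiles      map[string]string `yaml:"template_files"`
	AlertmanagerConfig string            `yaml:"alertmanager_config"`
}

// loadConfig uploads the merged configuration in one operation.
func loadConfig(address, tenant, merged string) error {
	body, err := yaml.Marshal(configPayload{
		TemplateFiles:      map[string]string{},
		AlertmanagerConfig: merged,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost, address+"/api/v1/alerts", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("X-Scope-OrgID", tenant) // per-tenant header used by Mimir/Cortex
	req.Header.Set("Content-Type", "application/yaml")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	merged := "route:\n  receiver: \"null\"\nreceivers:\n  - name: \"null\"\n"
	if err := loadConfig("http://mimir-nginx.mimir-test.svc:80", "anonymous", merged); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

If I understand it correctly, this is essentially what `mimirtool alertmanager load` does from the command line.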
Would it be OK if the Alloy component did a few things differently from the operator:
I've been testing a component locally with the following inputs:

Alloy config:

```
remote.kubernetes.configmap "default" {
namespace = "default"
name = "alertmgr-global"
}
mimir.alerts.kubernetes "default" {
address = "http://mimir-nginx.mimir-test.svc:80"
global_config = remote.kubernetes.configmap.default.data["glbl"]
}
```

Global Alertmanager config:

```
apiVersion: v1
kind: ConfigMap
metadata:
name: alertmgr-global
namespace: default
labels:
agent: "yes"
data:
glbl: |
global:
resolve_timeout: 5m
http_config:
follow_redirects: true
enable_http2: true
smtp_hello: localhost
smtp_require_tls: true
route:
receiver: "null"
receivers:
- name: "null"
- name: "alloy-namespace/global-config/myreceiver"
templates: []
```

AlertmanagerConfig CRD 1:

```
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmgr-config1
namespace: default
labels:
agent: "yes"
spec:
route:
receiver: "null"
routes:
- receiver: myamc
continue: true
receivers:
- name: "null"
- name: myamc
webhookConfigs:
- url: http://test.url
httpConfig:
followRedirects: true
```

AlertmanagerConfig CRD 2:

```
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmgr-config2
namespace: default
labels:
agent: "yes"
spec:
route:
receiver: "null"
routes:
- receiver: 'database-pager'
group_wait: 10s
matchers:
- name: service
value: webapp
receivers:
- name: "null"
- name: "database-pager" When I query Output configtemplate_files: {}
alertmanager_config: |
global:
resolve_timeout: 5m
http_config:
follow_redirects: true
enable_http2: true
smtp_hello: localhost
smtp_require_tls: true
route:
receiver: "null"
continue: false
routes:
- receiver: default/alertmgr-config1/null
matchers:
- namespace="default"
continue: true
routes:
- receiver: default/alertmgr-config1/myamc
continue: true
- receiver: default/alertmgr-config2/null
matchers:
- namespace="default"
continue: true
routes:
- receiver: default/alertmgr-config2/database-pager
match:
service: webapp
continue: false
receivers:
- name: "null"
- name: alloy-namespace/global-config/myreceiver
- name: default/alertmgr-config1/null
- name: default/alertmgr-config1/myamc
webhook_configs:
- send_resolved: false
http_config:
follow_redirects: true
enable_http2: true
url: <secret>
url_file: ""
max_alerts: 0
- name: default/alertmgr-config2/null
- name: default/alertmgr-config2/database-pager
templates: []
```

When I remove the first AlertmanagerConfig CRD (alertmgr-config1):

Output config 2:

```
template_files: {}
alertmanager_config: |
global:
resolve_timeout: 5m
http_config:
follow_redirects: true
enable_http2: true
smtp_hello: localhost
smtp_require_tls: true
route:
receiver: "null"
continue: false
routes:
- receiver: default/alertmgr-config2/null
matchers:
- namespace="default"
continue: true
routes:
- receiver: default/alertmgr-config2/database-pager
match:
service: webapp
continue: false
receivers:
- name: "null"
- name: alloy-namespace/global-config/myreceiver
- name: default/alertmgr-config2/null
- name: default/alertmgr-config2/database-pager
templates: []
```

I need to add a few more tests, and then I'll open a PR to add this as an experimental component. |
Similarly to grafana/agent#1544, the agent could discover AlertManagerConfig CRDs to configure Mimir's Alertmanager via its API.
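On the discovery side, a minimal sketch, assuming the prometheus-operator generated Go client (`pkg/client/versioned`) and in-cluster credentials; the `agent=yes` label selector simply mirrors the example manifests earlier in the thread:

```go
package main

import (
	"context"
	"fmt"
	"log"

	promclient "github.com/prometheus-operator/prometheus-operator/pkg/client/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes the component runs inside the cluster; a kubeconfig-based
	// rest.Config would work the same way.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := promclient.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// List AlertmanagerConfig CRDs across all namespaces, filtered by the
	// same label used in the example manifests above.
	list, err := client.MonitoringV1alpha1().
		AlertmanagerConfigs(metav1.NamespaceAll).
		List(context.Background(), metav1.ListOptions{LabelSelector: "agent=yes"})
	if err != nil {
		log.Fatal(err)
	}
	for _, amc := range list.Items {
		fmt.Printf("found AlertmanagerConfig %s/%s\n", amc.Namespace, amc.Name)
	}
}
```

A real component would presumably use an informer or watch rather than a one-shot list, feeding add/update/delete events into an in-memory store like the one sketched earlier in the thread.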