Add alertmanager alerts.yaml #47

Merged
merged 1 commit into from
Dec 3, 2024

hollanbm commented Nov 27, 2024

Big fan of this project, and your other exporters.

I wanted to give back, so I'm sharing one of the rules I have setup for alertmanager as an example.

Thanks!

@pchang388

Hey @hollanbm, thanks very much for your contribution!

Let me take a look and then push it on through.

pchang388 commented Dec 3, 2024

There are some cases where Tdarr may fail a transcode and that is still acceptable (e.g. a plugin doesn't work well with a file type). In those cases the error count will always be non-zero, since it's a running count from Tdarr. So it may be better to use rate + sum in the query and reduce the threshold time. Users with that use case would then be unaffected, but would still be alerted if there's an increase in errored transcodes.

Something like this as the query I'm thinking:

sum(rate(tdarr_library_transcodes{status="error"}[5m])) by (tdarr_instance) > 0

This isn't perfect either: I think the alert may auto-resolve (depending on config) once the error period covered by the rate window has passed, but at least the alert is sent out so the user is aware.
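
For illustration, the query above could sit in a Prometheus alerting rule roughly like the sketch below. Only the `expr` comes from this thread; the group name, alert name, `for` duration, severity label, and annotation text are hypothetical placeholders and are not taken from the merged alerts.yaml.

```yaml
groups:
  - name: tdarr-example            # hypothetical group name
    rules:
      - alert: TdarrTranscodeErrorsIncreasing   # hypothetical alert name
        # Fires only while the error count is growing within the 5m window,
        # so a long-standing non-zero error total does not keep it firing.
        expr: sum(rate(tdarr_library_transcodes{status="error"}[5m])) by (tdarr_instance) > 0
        for: 1m                    # assumed short threshold time; tune to taste
        labels:
          severity: warning        # assumed severity
        annotations:
          summary: "Tdarr transcode errors are increasing on {{ $labels.tdarr_instance }}"
```

With a rule shaped like this, the expression stops matching once no new errors fall inside the 5m window, which is the auto-resolve behaviour mentioned above.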

hollanbm commented Dec 3, 2024

Since this is just documentation and examples, we could add both, and an explanation :)

@pchang388

@hollanbm you're right, we can definitely do that. I was just thinking out loud, hah. Thank you again for the contribution! Pushing this through.

pchang388 merged commit 8b7e35e into homeylab:main on Dec 3, 2024
1 check passed