-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alertmanager state is not restored from remote storage if replicationFactor == 1 #2245
Labels
Comments
Looking at the change that introduced this, it seems like a bug to me too: 7896c43 |
Actually, looks like it was a combination of the following PRs:
I'll have something that should fix this up for review shortly. |
Good spot - the bug is that we should jump here if RF==1. |
2 tasks
56quarters
added a commit
that referenced
this issue
Jun 30, 2022
In cortexproject/cortex#3925 the ability to restore alertmanager state from peer alertmanagers was added, short-circuiting if there is only a single replica of the alertmanager. In cortexproject/cortex#4021 a fallback to read state from storage was added in case reading from peers failed. However, the short-circuiting if there is only a single peer was not removed. This has the effect of never restoring state in an alertmanager if only running a single replica. Fixes #2245 Signed-off-by: Nick Pillitteri <[email protected]>
56quarters
added a commit
that referenced
this issue
Jul 5, 2022
In cortexproject/cortex#3925 the ability to restore alertmanager state from peer alertmanagers was added, short-circuiting if there is only a single replica of the alertmanager. In cortexproject/cortex#4021 a fallback to read state from storage was added in case reading from peers failed. However, the short-circuiting if there is only a single peer was not removed. This has the effect of never restoring state in an alertmanager if only running a single replica. Fixes #2245 Signed-off-by: Nick Pillitteri <[email protected]>
pracucci
pushed a commit
that referenced
this issue
Jul 6, 2022
* Restore alertmanager state from storage as fallback In cortexproject/cortex#3925 the ability to restore alertmanager state from peer alertmanagers was added, short-circuiting if there is only a single replica of the alertmanager. In cortexproject/cortex#4021 a fallback to read state from storage was added in case reading from peers failed. However, the short-circuiting if there is only a single peer was not removed. This has the effect of never restoring state in an alertmanager if only running a single replica. Fixes #2245 Signed-off-by: Nick Pillitteri <[email protected]> * Code review changes Signed-off-by: Nick Pillitteri <[email protected]>
masonmei
pushed a commit
to udmire/mimir
that referenced
this issue
Jul 11, 2022
* Restore alertmanager state from storage as fallback In cortexproject/cortex#3925 the ability to restore alertmanager state from peer alertmanagers was added, short-circuiting if there is only a single replica of the alertmanager. In cortexproject/cortex#4021 a fallback to read state from storage was added in case reading from peers failed. However, the short-circuiting if there is only a single peer was not removed. This has the effect of never restoring state in an alertmanager if only running a single replica. Fixes grafana#2245 Signed-off-by: Nick Pillitteri <[email protected]> * Code review changes Signed-off-by: Nick Pillitteri <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the bug
Alertmanager state is not restored from remote storage if replicationFactor == 1: https://github.com/grafana/mimir/blob/main/pkg/alertmanager/state_replication.go#L205
I understand that running replicationFactor at 1 is a bad idea but for dev or staging environments it usually makes sense to keep things simple or cut down on costs.
I was wondering if this is intentional that we skip reading from storage in these cases?
To Reproduce
Steps to reproduce the behavior:
Expected behavior
State should be read from storage if no replication option is available
Environment
Additional Context
The text was updated successfully, but these errors were encountered: