-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[querier] honor querier mint,maxt if no SelectHints are passed to Select #4413
[querier] honor querier mint,maxt if no SelectHints are passed to Select #4413
Conversation
c0acb4f
to
7c4676e
Compare
Thanks for this. What you say seems plausible, but can we go a bit further and remove all assumptions about what the caller meant? |
c1aee58
to
1ce0169
Compare
thank you for taking a look, @bboreham. I dug into this a little bit and saw that the "kludge" comment was first introduced via a prometheus upgrade done via PR: #802 . Looking at the Gopkg dep list, that PR brought in the prometheus commit, prometheus/prometheus@c47fbcb, which was right before the prom 2.3.0 release. The next comment, "later versions of prometheus", was added via PR: to specifically bring in the change where prometheus now submits a select hint with The prometheus version that's pinned against Cortex now does inject a SelectHint.func="series" when submitting a series search. I am unable to find any other instances where prometheus submits a |
3fa3565
to
da6cc2e
Compare
Sorry, I think I have to retract my previous point - looking at prometheus/prometheus#8050 it talks about remote endpoints, so probably we can receive the request and its hints from different versions of Prometheus via remote-read. That said, should the Rules Manager be changed to pass the series hint? |
c30ef07
to
74d0fdb
Compare
Correct me if I'm wrong - Even if cortex can receive a read request and its hints from different versions of Prometheus via remote-read, isn't the entrypoint still gated via the
I don't disagree that the Rules Manager should pass a series hint as well - but in this instance, rather than manipulating the core Rules Manager logic, this PR attempts to match the behavior of the querier implementations in Prometheus itself. The two main Prometheus querier implementations, blockQuerier and blockChunkQuerier, both will set the minT and maxT to the querier itself before utilizing the selectHints, should they exist. Meanwhile, currently there are three Cortex querier implementations (with the fourth being the top-level querier that will fan out to the three queriers), the The blocksStoreQuerier will honor the Prometheus behavior where it will first set the minT,maxT to the querier and utilize the SelectHints, should they exist. The chunkStoreQuerier will only utilize the time range in the SelectHints. Should they not exist (or if the func="series"), the querier will return an empty set. The distributorQuerier is similar to the chunkStoreQuerier in that it will only utilize the time range in the SelectHints. However, should they not exist (or if func="series"), the querier will process the query as a labels search instead of a time series search. What do you think, @bboreham? |
7db9fe2
to
d1f80a3
Compare
Very interesting! I tend to agree with @aallawala here. I think we should match the current Prometheus logic and so when That being said, the place where you changed the logic is not the only place where we check if |
eb6815d
to
b91f37e
Compare
Thank you for taking a look, @pracucci. I went ahead and updated the remaining two base querier implementations, chunksStoreQuerier and distributorQueryable. The blocksStoreQuerier already matches the Prometheus behavior where it will first set the minT,maxT to the querier and utilize the SelectHints, should they exist. Would there happen to be any other places that should be updated? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Note there are a couple of merge conflicts, which I would typically rebase for but you can try doing a merge if you like. If you do it in the GitHub UI it may lack the DCO sign-off.
b91f37e
to
aacbfd4
Compare
Thank you @bboreham for taking another look! I rebased to address the conflicts upstream. PTAL and let me know if there are other changes to be made |
aacbfd4
to
2282c4f
Compare
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
Signed-off-by: Abdurrahman J. Allawala <[email protected]>
2282c4f
to
347ba9a
Compare
Hi, could I get some eyes on this please? |
Thanks! LGTM. |
…ect (cortexproject#4413) * [querier] honor querier mint,maxt if no SelectHints are passed to Select Signed-off-by: Abdurrahman J. Allawala <[email protected]> * export mockDistributor for later use in the ruler tests Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add test for restoring FOR state in the ruler via distributors Signed-off-by: Abdurrahman J. Allawala <[email protected]> * test for always active alert Signed-off-by: Abdurrahman J. Allawala <[email protected]> * move querier/testutils to querier pkg Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add empty chunk store to ruler test Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add changelog entry Signed-off-by: Abdurrahman J. Allawala <[email protected]> * lint Signed-off-by: Abdurrahman J. Allawala <[email protected]> * honor mint,maxt if sp is null for other querier implementations Signed-off-by: Abdurrahman J. Allawala <[email protected]> * missed addition in rebase Signed-off-by: Abdurrahman J. Allawala <[email protected]> Co-authored-by: Alvin Lin <[email protected]> Signed-off-by: Manish Kumar Gupta <[email protected]>
…ect (cortexproject#4413) * [querier] honor querier mint,maxt if no SelectHints are passed to Select Signed-off-by: Abdurrahman J. Allawala <[email protected]> * export mockDistributor for later use in the ruler tests Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add test for restoring FOR state in the ruler via distributors Signed-off-by: Abdurrahman J. Allawala <[email protected]> * test for always active alert Signed-off-by: Abdurrahman J. Allawala <[email protected]> * move querier/testutils to querier pkg Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add empty chunk store to ruler test Signed-off-by: Abdurrahman J. Allawala <[email protected]> * add changelog entry Signed-off-by: Abdurrahman J. Allawala <[email protected]> * lint Signed-off-by: Abdurrahman J. Allawala <[email protected]> * honor mint,maxt if sp is null for other querier implementations Signed-off-by: Abdurrahman J. Allawala <[email protected]> * missed addition in rebase Signed-off-by: Abdurrahman J. Allawala <[email protected]> Co-authored-by: Alvin Lin <[email protected]> Signed-off-by: Alvin Lin <[email protected]>
What this PR does:
This PR attempts to address an issue our org is seeing where each user's Rules Manager is unable to restore the FOR state, as intended via this Prometheus PR prometheus/prometheus#4061
What happens instead is, since the Rules Manager initiates a
Select
query with noSelectHints
(even though the querier that is initialized via https://github.com/prometheus/prometheus/blob/b1ed4a0a663d/rules/manager.go#L710 bounds itself down to just the OutageTolerance window set within the ManagerOpts), the Cortex querier treats this as aseries
search instead of a metrics search. The current Cortex querier implementation looks for a nil SelectHints OR a SelectHints.Func == "series" to apply a Labels search instead of a time series search. This Labels search then strips out any samples via series.MetricsToSeriesSet before sending the result back to the caller.The Ruler's RestoreForState expects a time series search as the samples are later used to restore the FOR state even though a
nil
is passed for SelectHints. Since the querier will return the MetricSeries without samples, the RestoreForState restores to the current timestamp, effectively resetting all progress that was done by the prior owning Ruler instance.I believe this change is alright given that the time bounds of the Select query are checked further down in the querier's Select block. https://github.com/cortexproject/cortex/blob/master/pkg/querier/querier.go#L320-L323
I'm not entirely familiar with this code base, so there will probably be something I'm missing. All feedback is appreciated!
cc'ing the maintainers whom I've seen commits from WRT this, @pracucci @gouthamve @codesome
Which issue(s) this PR fixes:
Fixes #I didn't find a corresponding issue but please do update if there is one.Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]