-
Notifications
You must be signed in to change notification settings - Fork 812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate internal store errors to prometheus API errors. #2571
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pstibrany for working on it! It's a bit tricky this change.
@@ -286,12 +289,12 @@ func (c *baseStore) validateQueryTimeRange(ctx context.Context, userID string, f | |||
defer log.Span.Finish() | |||
|
|||
if *through < *from { | |||
return false, httpgrpc.Errorf(http.StatusBadRequest, "invalid query, through < from (%s < %s)", through, from) | |||
return false, QueryError(fmt.Sprintf("invalid query, through < from (%s < %s)", through, from)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not using httpgrpc.Errorf()
in validateQueryTimeRange()
changes the behaviour for LabelValuesForMetricName()
and LabelNamesForMetricName()
where we want to return an httpgrpc
error. An option could be always wrapping the error there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LabelValuesForMetricName
and LabelNamesForMetricName
are not even used in Cortex :) (but they are in Loki).
I think Loki should recognize QueryError and convert it to appropriate error in the layer it is using. Store itself should not return gRPC errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make sure that it is handled correctly in Loki, if this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (with a nit)
@@ -145,6 +145,14 @@ func runQueryFrontendTest(t *testing.T, setup queryFrontendSetup) { | |||
c, err := e2ecortex.NewClient("", queryFrontend.HTTPEndpoint(), "", "", fmt.Sprintf("user-%d", userID)) | |||
require.NoError(t, err) | |||
|
|||
// No need to repeat this test for each user. | |||
if userID == 0 && testMissingMetricName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[thinking loudly] I'm a bit dubious about adding this further test here. The query-frontend integration test is become the place where everyone adds further cases and we risk to make it just more and more complicated. This case is already covered by the assertion you added to TestQuerierWithChunksStorage
, so do we really need this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally I suspected that status code is mangled by query-frontend and wanted to make sure that it is propagated properly. Repeating some of the tests between querier and query-frontend makes sense imho, and we should perhaps refactor the code to avoid duplicated code, but repeat those tests.
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
If users will get different error codes, doesn't this need a CHANGELOG entry? |
Yes. I've sent #2773 to fix that. |
This PR modifies how errors are reported from chunk store. Instead of using grpc errors with status codes, this PR introduces new QueryError type, which is then recognized by
chunkStoreQuerier
and translated to the appropriate error returned to promQL engine and API wrapper.Prometheus API wrapper doesn't recognize grpc errors with status codes, but it does recognize some errors and adjusts status code based on them. Result of this PR is that
QueryError
s are translated to 422 (best that Prometheus has to offer).Previously, our
httpgrpc.Errorf(http.StatusBadRequest, ...)
were actually translated to status code 500, despite the attempt to useStatusBadRequest
.Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]