-
Notifications
You must be signed in to change notification settings - Fork 816
QueryFrontend sets incorrect status code on HTTP timeout #2483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't understand how this could could work. It will write the HTTP header but it has already been written above. Am I missing anything @dmitsh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
server.WriteError
callsw.WriteHeader
as well https://github.com/weaveworks/common/blob/8f725fc52d8defd2b0f7eecab55a86f33421c08f/httpgrpc/server/server.go#L163-L175 which I thought overrides the previous header set.But I was wrong:
We can only call
WriteHeader
once.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a chicken and egg situation. We have to set the status before calling
Write
, but theWrite
can fail.With this change,
server.WriteError
will override the status, and we are logging the correct information:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify:
Before this PR, if
io.Copy()
has failed, cortex was returning an empty response (probably because the write was aborted in the middle). But the log was indicating successful status code (200).With my PR, cortex still returns an empty response if
io.Copy()
fails (for the same reason), but we have the correct status code (500) and the error message in the log and metricsThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, just went through the code again. The problem is
io.Copy
will fail ifw.Write
fails. And we shouldn't try to attempt aw.Write
again (which is whatserver.WriteError(w, err)
does internally). I am not sure why the log is printing 200, I would say this is a logging middleware issue and this is not the right fix. But I haven't looked at the logging middleware yet, will do now.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/weaveworks/common/blob/master/middleware/logging.go#L40-L42
This is only because we had a bad implementation of the logging middleware. https://github.com/weaveworks/common/blob/master/middleware/response.go#L42-L54 <-- We should ideally capture the error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine the most likely scenario for
Write()
to fail is that the client has disconnected.In that case the context should be cancelled; I tried to detect and log those in weaveworks/common#180 but perhaps ineffectively.
Note it is somewhat unwarranted to count a user action as a Cortex failure - if you're trying to apply an SLI you don't want them lumped in together. We could do like Nginx, and count them as code 499.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An easy way to reproduce this problem is to set
http_server_write_timeout
to a short duration (1 sec?), and then invoke HTTP request. If it takes more than 1 second to get the data, the request will time out, andio.Copy()
will fail.