-
Notifications
You must be signed in to change notification settings - Fork 95
fix: Always save the error #1749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Always save the error #1749
Conversation
The error should always be reported later
Converting this to draft - going to leave some comments on the internal bug we've been talking about |
…416236863-should-report-deadline-exceeded-errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be merged in its current state - we need to figure out why we are receiving a DEADLINE_EXCEEDED
from the test-application to fix these tests - matching on a buggy error message from the test-application server is an anti-pattern because we would be telling the test to accept broken behavior
const expectedMessage = 'Exceeded maximum number of retries retrying error Error: 14 UNAVAILABLE: 14 before any response was received : Previous errors : [{message: 14 UNAVAILABLE: 14, code: 14, details: , note: },{message: 14 UNAVAILABLE: 14, code: 14, details: , note: }]' | ||
assert.strictEqual((err as GoogleError).message, expectedMessage); | ||
// We test below to see that an UNAVAILABLE error was sent back, but not a deadline exceeded error. | ||
// We don't test for an exact match on the message because the DEADLINE_EXCEEDED time can vary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true for timeout based tests, but this is a maximum retries based test - the error message we should be seeing if we weren't having an underlying problem is found here:
gax-nodejs/gax/src/normalCalls/retries.ts
Lines 133 to 138 in bb11949
'Exceeded maximum number of retries ' + | |
(err ? `retrying error ${err} ` : '') + | |
'before any response was received' + | |
errorDetailsSuffix(errorsEncountered), | |
); | |
error.code = Status.DEADLINE_EXCEEDED; |
Please see b/414574369 comment 17 - the DEADLINE_EXCEEDED
we are seeing here, which comes with a "waiting for metadata filters" message - is coming from the server and is a problem we need to solve - we should not be matching on that message - we should be matching on the "Exceeded maximum number of retries" message
// TODO: This test needs to be fixed as createSequenceRequestFactory sends | ||
// back four UNAVAILABLE errors, but only an UNAVAILABLE error and a | ||
// DEADLINE_EXCEEDED error are encountered. The test needs to be corrected | ||
// later on to expect two UNAVAILABLE errors, but we'll keep it as is now | ||
// to unblock the CI pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not comfortable merging this with a broken test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. We can spend more time to make sure the test is correct, but the underlying reason the test produces the wrong result was there before the PR was in place. I decided to write technical debt to get this change through, but we can push through to make everything correct if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are certain kinds of tech debt exceptions I am willing to be more flexible on but unfortunately with this one, we really can't properly test your stuff. I am trying to submit a PR to update showcase to see if that automagically fixes things
/Exceeded maximum number of retries retrying error Error: 14 UNAVAILABLE: 14 before any response was received/, | ||
); | ||
assert.match((err as GoogleError).message, /Exceeded maximum number of retries retrying error Error: 4 DEADLINE_EXCEEDED: Deadline exceeded after/); | ||
assert.match((err as GoogleError).message, /waiting for metadata filters before any response was received : Previous errors : \[{message: 14 UNAVAILABLE: 14, code: 14, details: , note: },{message: 4 DEADLINE_EXCEEDED: Deadline exceeded after/); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment below - we should not be matching on this message - this DEADLINE_EXCEEDED
should be coming from
gax-nodejs/gax/src/normalCalls/retries.ts
Lines 133 to 138 in bb11949
'Exceeded maximum number of retries ' + | |
(err ? `retrying error ${err} ` : '') + | |
'before any response was received' + | |
errorDetailsSuffix(errorsEncountered), | |
); | |
error.code = Status.DEADLINE_EXCEEDED; |
Description
Here is why we can remove this code without any negative impact:
Some code was originally in place to ignore DEADLINE_EXCEEDED errors so that the error previous to the DEADLINE_EXCEEDED errors would be reported to users for debugging purposes. However, we actually want to know all of the errors that were encountered during retries for a method call so #1740 was added so that the user would have this information. In particular, we are interested in the very first error that is encountered, not just the last error before the DEADLINE_EXCEEDED errors as this usually gives us a lot of information about the root cause. Also, the last timeout DEADLINE_EXCEEDED error where we want share information about
lastError
actually gets sent back to the user later so the code that is currently here just ignores DEADLINE_EXCEEDED errors that were sent back during retries and not the final timeout DEADLINE_EXCEEDED error.Here is the positive impact of removing this code:
With the code before this PR change, if the original error is a DEADLINE_EXCEEDED error then it will not get added to the error message at https://github.com/googleapis/gax-nodejs/blob/main/gax/src/normalCalls/retries.ts#L119. This makes it impossible to identify root causes in our clients if the original root cause is a DEADLINE_EXCEEDED error because this check is hiding the error from the user.
Impact
This change will ensure that in our handwritten clients that error messages show the DEADLINE_EXCEEDED errors that were encountered which will enable further root cause analysis. See
Testing
section for more details.This will help us solve issues like googleapis/nodejs-datastore#1176.
Testing
Tested on 414574369-test-error-stack-end-to-end-DEADLINE-EXCEEDED using nodejs-datastore.
Without the code change we see:
With the code change we see:
After the code changes the
message
variable contains all the errors in the error stack.Additional Information
Some related PRs:
#1740
#1650