-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Fix missing attachment stuck actions #5355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix missing attachment stuck actions #5355
Conversation
Line 336 of
|
@rabbah you might be interested in this bug since you were on the original issue |
Codecov Report
@@ Coverage Diff @@
## master #5355 +/- ##
==========================================
- Coverage 81.46% 76.69% -4.77%
==========================================
Files 240 240
Lines 14407 14407
Branches 618 601 -17
==========================================
- Hits 11736 11050 -686
- Misses 2671 3357 +686
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
This is a critical bug that's gone unsolved for several years. Since attachments are considered secondary files to the main action document, if the attachment is lost for whatever reason (upload timeout, concurrent upload, bad database replication, etc.) the action document is therefore corrupted in a stuck state because the document factory is set up to do a get of the document first when performing a
putEntity
ordeleteEntity
. TheWhiskAction
type overrides theget
operation to include getting the attachment as a part of theget
operation. However once the root document is retrieved, the attachment is then tried to be read and returns a not found making theget
return aDocumentNotFoundException
to theputEntity
ordeleteEntity
. You can therefore never delete or update an action (that is broken since the attachment was lost) once in this state through the openwhisk api and must go through an openwhisk admin to delete the document directly from the artifact db.Worse, for updating an action it incorrectly returns a 409 to the user when a conflict isn't what's actually happening in this case. The
get
of theWhiskAction
correctly returns aDocumentNotFoundException
when the attachment is not found, however theputEntity
function then catches this and tries to put the document without a revision thinking it is a brand new document for the artifact db when the root document does in fact exist so the db returns a 409 when a put request is sent to it without a revision; which is what is then fed back to the user on any update action attempt.I have implemented a simple fix that finally resolves this issue for updating and deleting an action in this missing attachment state without impacting any other behavior of the system outside of the update and delete action api. The change is simple in that we recover in the
get
of theWhiskAction
type if getting the attachment fails for not being found only if an ignore boolean is set. This ignore boolean is only set whenget
is called withinputEntity
andgetEntity
, and only has an effect on theWhiskAction
type.I've reproduced and verified in my environment that this change resolves the issue. Pre change put action returns 409, delete action returns 404, and get action returns 404 if
code=true
otherwise 200. Post change put action succeeds, delete action succeeds, and get action keeps the same expected behavior of 404 ifcode=true
otherwise 200.Related issue and scope
My changes affect the following components
Types of changes
Checklist: