Source Facebook Marketing: Attempt to retry failing jobs that are already split to minimum size #12390
Conversation
/test connector=connectors/source-facebook-marketing
Codecov Report
@@           Coverage Diff            @@
##             master   #12390   +/- ##
=========================================
  Coverage          ?   90.74%
=========================================
  Files             ?       11
  Lines             ?      854
  Branches          ?        0
=========================================
  Hits              ?      775
  Misses            ?       79
  Partials          ?        0
=========================================

Continue to review full report at Codecov.
If we have no other retry logic anywhere, this makes sense.
...grations/connectors/source-facebook-marketing/source_facebook_marketing/streams/async_job.py
@@ -58,6 +58,9 @@ class Status(str, Enum):
class AsyncJob(ABC):
    """Abstract AsyncJob base class"""

    # max attempts for a job before erroring out
    max_attempts: int = 10  # TODO: verify a sane number for this
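For context, a minimal sketch of how a class-level cap like this is typically consumed; the attempt_number counter and attempt_limit_reached() helper are illustrative assumptions, not the connector's actual code:

```python
from abc import ABC, abstractmethod


class AsyncJob(ABC):
    """Abstract AsyncJob base class"""

    # max attempts for a job before erroring out
    max_attempts: int = 10

    def __init__(self) -> None:
        self.attempt_number = 1  # hypothetical counter, bumped on each restart

    @abstractmethod
    def restart(self) -> None:
        """Re-submit the job to the API."""

    def attempt_limit_reached(self) -> bool:
        # hypothetical helper: True once the job has been tried max_attempts times
        return self.attempt_number >= self.max_attempts
```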
Am I reading this correctly that we currently have zero retry logic for async jobs, and that at the moment we just keep splitting until the job fails? If so, this change makes a lot of sense.
I'd recommend setting this to something like 5; it seems like if something fails 9 times it's unlikely to succeed on the 10th? (But you never know when it comes to FB :)
There is retry logic in async_job_manager.py, but because we try to split_job() as soon as we hit attempt_number 2, we end up throwing the lowest-split-level error after retrying that call only once (I think), rather than the specified 20 times.
I'm going to confirm that is what's happening and, if so, refactor this PR so that rather than adding new retry logic in AsyncJob, it all ties together properly with the async_job_manager.
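To make the problem concrete, here is a rough sketch of the flow being described; the on_job_failure handler and its exact branching are assumptions for illustration, not the connector's actual async_job_manager code:

```python
MAX_ATTEMPTS = 20  # hypothetical cap mirroring "the specified 20 times"


def on_job_failure(job, jobs_queue: list) -> None:
    """Hypothetical handler illustrating why the split happens too early."""
    job.attempt_number += 1
    if job.attempt_number >= 2:
        # The split is attempted on the very first failure, so a job that is
        # already at the smallest split level raises RuntimeError after only
        # one retry...
        jobs_queue.extend(job.split_job())
    elif job.attempt_number <= MAX_ATTEMPTS:
        # ...and this branch, which would allow up to MAX_ATTEMPTS restarts,
        # is effectively never reached for such jobs.
        job.restart()
```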
...grations/connectors/source-facebook-marketing/source_facebook_marketing/streams/async_job.py
Updated PR description to match new logic.
/test connector=connectors/source-facebook-marketing
/publish connector=connectors/source-facebook-marketing
…eady split to minimum size (#12390)

* restart jobs that are already split to smallest size
* manager now fails on nested jobs hitting max attempts
* version bump
* auto-bump connector version

Co-authored-by: Octavia Squidington III <[email protected]>
What
From this oncall issue.
I haven't been able to replicate this locally, but I believe we are prematurely failing the sync when an async job that is already split to the smallest size fails: we try to split the job again, and splitting raises a RuntimeError because the job is already at the smallest size.
How
* .restart() async jobs that are already the smallest size when they fail.
* async_job_manager now raises an exception if any nested jobs within a ParentAsyncJob have hit the MAX_ATTEMPTS number (see the sketch after this section).

My hope is that the failures the customer is seeing are transient, and that by doing this we solve the problem. Even if not, I think this is a positive change to allow all jobs the same number of retries.
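A simplified sketch of the updated behavior; JobExhaustedError, is_split_to_smallest_size(), and nested_jobs are illustrative assumptions, not the connector's real API:

```python
class JobExhaustedError(RuntimeError):
    """Raised once a job (or any nested job) has used up all of its attempts."""


def on_job_failure(job, jobs_queue: list) -> None:
    """Hypothetical manager-side handler with the updated decision."""
    if job.attempt_number >= job.max_attempts:
        raise JobExhaustedError(f"{job} failed {job.max_attempts} times; giving up.")
    if job.is_split_to_smallest_size():  # hypothetical predicate
        job.restart()  # retry in place instead of attempting an impossible split
    else:
        jobs_queue.extend(job.split_job())


def check_parent(parent) -> None:
    # The manager raises if any nested job within a ParentAsyncJob
    # has hit the MAX_ATTEMPTS cap.
    for child in parent.nested_jobs:
        if child.attempt_number >= child.max_attempts:
            raise JobExhaustedError(f"Nested job {child} hit max attempts.")
```

The effect is that every job, nested or not, gets the same retry budget before the sync is failed.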
TODO: Dockerfile and Changelog, waiting on reviews.