[RFR for API Sources] New Python interfaces to support resumable full refresh #37429
can you add a comment explaining that `None` means we stop reading?
I think you meant this for `checkpointReader.next()`, since `None` in that case stops parsing, but I'll also comment here that we don't emit state messages if the return value is `None` either.
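To make the two `None` cases concrete, here is a minimal sketch rather than the CDK's actual implementation. `SketchCheckpointReader`, `read`, `fetch`, and the `emit_checkpoints` flag are hypothetical names invented for illustration; only `next()` and `get_checkpoint()` come from this thread, and their real signatures may differ.

```python
from typing import Any, Callable, Iterable, List, Mapping, Optional, Tuple

Slice = Mapping[str, Any]
Message = Tuple[str, Any]


class SketchCheckpointReader:
    """Hypothetical reader that hands out one slice/page of state at a time."""

    def __init__(self, slices: Iterable[Slice], emit_checkpoints: bool = True) -> None:
        self._slices = iter(slices)
        self._current: Optional[Slice] = None
        self._emit_checkpoints = emit_checkpoints

    def next(self) -> Optional[Slice]:
        # None means there is nothing left to read, so the caller stops.
        self._current = next(self._slices, None)
        return self._current

    def get_checkpoint(self) -> Optional[Slice]:
        # None means "no state to report", so the caller emits no state message.
        return self._current if self._emit_checkpoints else None


def read(reader: SketchCheckpointReader, fetch: Callable[[Slice], Iterable[Any]]) -> List[Message]:
    """Assumed read loop: records per slice, then a state message if one exists."""
    messages: List[Message] = []
    current = reader.next()
    while current is not None:  # a None slice ends the loop
        messages.extend(("RECORD", record) for record in fetch(current))
        checkpoint = reader.get_checkpoint()
        if checkpoint is not None:  # a None checkpoint is silently skipped
            messages.append(("STATE", checkpoint))
        current = reader.next()
    return messages
```

With `emit_checkpoints=False` the loop still reads every slice but produces only records, which is the "no state message when the return is `None`" behavior described above.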
I've discussed this at length with a few other engineers, and I don't see a better way forward in the immediate term. Making it non-obvious that the connector developer needs to be state-aware is not good DX.
The longer-term solution would be either a structured state class that handles reading and writing state through clearly defined methods instead of a generic map, or a cursor class. The problem is that this doesn't currently exist in the legacy Python CDK.
However, we can potentially rationalize that this is okay in the immediate term for two reasons:
Basically, even if this interface is not a good paradigm shift in the short term, if we're focusing on concurrent + low-code long term, we can accept this. Happy to be challenged on this as well.
can you document the expectations on the shape of the state object?
Yes, good point. I'll find a place to document this, either in code or in the Airbyte docs.
Because we call `get_checkpoint()` at the end of the last slice/page and again at the end of the sync, we end up emitting the same final state twice. We could potentially add more fields to track state internally within the reader, but I don't think it's worth the hassle.
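For reference, a rough sketch of why the duplicate appears and what the extra tracking might look like. `emit_states` and its `dedupe` flag are hypothetical and not part of the PR; the input list stands in for the sequence of values `get_checkpoint()` would return over a sync, under the assumption that the last per-slice call and the end-of-sync call return equal state.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional

State = Mapping[str, Any]


def emit_states(checkpoints: Iterable[Optional[State]], dedupe: bool = False) -> Iterator[State]:
    """Yield a state for each non-None checkpoint.

    With dedupe=True, remember the last emitted state and skip exact repeats.
    This is the kind of extra internal bookkeeping mentioned above.
    """
    last_emitted: Optional[State] = None
    for checkpoint in checkpoints:
        if checkpoint is None:
            continue  # nothing to emit
        if dedupe and checkpoint == last_emitted:
            continue  # same state as the previous message, skip it
        last_emitted = checkpoint
        yield checkpoint


# Checkpoint after page 1, after page 2 (the last page), then once more at end of sync.
observed = [{"page": 1}, {"page": 2}, {"page": 2}]
print(len(list(emit_states(observed))))               # final state appears twice
print(len(list(emit_states(observed, dedupe=True))))  # repeat is dropped
```

Since the repeated message carries identical state, leaving it in place costs one redundant message per stream, which is the trade-off being accepted here.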
do you have an example of a stream that cannot support RFR?
Let's document why and when a connector developer should use `FullRefreshCheckpointReader` instead of `ResumableFullRefreshCheckpointReader`.
Since these checkpoint readers aren't intended to be configured by the developer, I think it would make more sense to document this where I am going to document the state object shape mentioned above. It'll be in a separate airbyte docs page