-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ [source-github] Bump cdk version and enable RFR for all non-incremental streams #42966
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Skipped Deployment
|
@@ -57,6 +58,9 @@ def __init__(self, api_url: str = "https://api.github.com", access_token_type: s | |||
self.api_url = api_url | |||
self.state = {} | |||
|
|||
if self.cursor_field or len(self.cursor_field) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused by this check - we use the substream RFR cursor if there is a cursor_field or if its length is 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sorry i should've commented, but it's supposed to be not self.cursor_field
, I reversed it to test something on regression tests, but i'll change it back for final testing when i move it out of draft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the self.supports_incremental
property to be checked?
It could be simply as:
if not self.supports_incremental:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep fair point, will change thanks for the suggestion!
/approve-regression-tests passing tests here: https://github.com/airbytehq/airbyte/actions/runs/10275055124/job/28432993552
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CATs where already failing before this change (see this). Should we fix them on master and introduce the fix here to make sure that the CATs will not be failing with this change?
Also, would it be worth adding a mock server test to validate that the state we expect is the right one?
Fair, but yeah I don't think it's related since nightly builds have been failing due to a missing record for some time now. But I'm okay waiting or fixing. To me it's relatively low risk
Good suggestion I can add that in |
It looks good to me; we still need to fix the |
What is the
|
@brianjlai Nope, this is the same one, I mean we need to fix it first, then merge and continue with this change. |
got it thanks for clarifying. i dug into it and i'm not 100% sure why, but the old |
/approve-regression-tests passing tests here: https://github.com/airbytehq/airbyte/actions/runs/10275055124/job/28432993552 . subsequent changes were just to fix or add tests
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After updating github source on our instance, to 1.8.2 (with this PR) we observed big regression in sync time. Line 26 in 21098fe
|
What
Enables RFR for all streams that are not already supporting incremental syncs.
How
All github streams are effectively substreams because their parent is either a repository or an organization which is based on the stream config. And for these streams we instantiate a
SubstreamResumableFullRefresh
cursor. We skip incrementals by checking for the presence ofcursor_fields
which indicates incremental sync mode.Sample stream state for substreams:
User Impact
They should get RFR w/o needing to update anything
Can this PR be safely reverted and rolled back?