Skip to content

DBZ iterator migration to use SourceStateIterator #36333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 26, 2024

Conversation

xiaohansong
Copy link
Contributor

@xiaohansong xiaohansong commented Mar 20, 2024

Move DBZ iterator to use SourceStateIterator

Apply this on postgres

Copy link

vercel bot commented Mar 20, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Mar 26, 2024 4:29am

@octavia-squidington-iii octavia-squidington-iii added the CDK Connector Development Kit label Mar 20, 2024
@xiaohansong xiaohansong requested a review from akashkulk March 20, 2024 22:36
@xiaohansong xiaohansong marked this pull request as ready for review March 20, 2024 23:55
@xiaohansong xiaohansong requested a review from a team as a code owner March 20, 2024 23:55
@@ -14,7 +14,7 @@ java {
airbyteJavaConnector {
cdkVersionRequired = '0.23.19'
features = ['db-sources', 'datastore-postgres']
useLocalCdk = false
useLocalCdk = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to properly validate this change, it needs to be applied to at least one connector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep - this change exactly serves this before releasing cdk - so all tests on postgres would pass with this change. once approved i'll need to release cdk and bump up the required version here too.

}
if (offsetManager == null) {
throw new RuntimeException("Offset can not be null");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this check to the constructor.

// include all necessary data such as snapshot completion.
// This is the case for MS SQL Server, at least.
return createStateMessage(initialOffset);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to get rid of this check entirely at this point. Try it out and see if source-mssql tests pass

cc @akashkulk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will break some test - it's minor though, turns out state in global state will have some minor changes:

before:

    "mssql_cdc_offset": {
      "[\"db_ziwfjgytty\",{\"server\":\"db_ziwfjgytty\",\"database\":\"db_ziwfjgytty\"}]": "{\"commit_lsn\":\"00000029:00000298:0003\",\"snapshot\":true,\"snapshot_completed\":true}"
    },

after:

    "mssql_cdc_offset": {
      "[\"db_ziwfjgytty\",{\"server\":\"db_ziwfjgytty\",\"database\":\"db_ziwfjgytty\"}]": "{\"transaction_id\":null,\"event_serial_no\":0,\"commit_lsn\":\"00000029:00000298:0003\",\"change_lsn\":\"NULL\"}"
    },

I'll revert this change for now

previousCheckpointOffset.clear();
previousCheckpointOffset.putAll(checkpointOffsetToSend);
shouldEmitStateMessage = false;
return stateMessage;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we call resetCheckpointValues() i.e. call checkpointOffsetToSend.clear(); before returning a state message?

Also we can probably replace resetCheckpointValues() with checkpointOffsetToSend.clear()`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a great catch, thanks! that actually solved test issue I've had since I turned up the mssql. Surprisingly tests on postgres did not capture that

this.schemaHistoryManager = schemaHistoryManager;
this.previousCheckpointOffset = (HashMap<String, String>) offsetManager.read();
this.initialOffset = new HashMap<>(this.previousCheckpointOffset);
resetCheckpointValues();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit : not sure if there is a need for this? I understand you're porting over code from before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed. yeah initial value has been empty. Previously it also reset other stuff but I've deleted most of them so it's not applicable here.

Copy link
Contributor

@akashkulk akashkulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small nits

@xiaohansong xiaohansong requested a review from a team as a code owner March 25, 2024 23:59
@xiaohansong
Copy link
Contributor Author

xiaohansong commented Mar 26, 2024

/publish-java-cdk

🕑 https://github.com/airbytehq/airbyte/actions/runs/8430583675
✅ Successfully published Java CDK version=0.27.4!

@xiaohansong
Copy link
Contributor Author

also manually tested on postgres and mssql on cdc incremental sync.

@xiaohansong xiaohansong merged commit 1b99f1a into master Mar 26, 2024
30 checks passed
@xiaohansong xiaohansong deleted the xiaohan/dbz-iterator branch March 26, 2024 04:50
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
stephane-airbyte added a commit that referenced this pull request Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation CDK Connector Development Kit connectors/source/mssql connectors/source/postgres
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants