Skip to content

feat(postgres-cdc): support replicating Postgres schema change #18760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 17, 2024

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented Sep 29, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Copy and patched the Debezium postgres conenctor (v2.6.2) to our project (patched code enclosing with /* patched */)

    • PgOutputMessageDecoder: dispatch schema change event when handling R message
    • Make PostgresSchema act as HistorizedDatabaseSchema which is a prerequisite to emit schema change event
    • Extends the ReplicationStream.ReplicationMessageProcessor to provide interfaces to emit schema change event
  • Modify the jar loading order to ensure risingwave-source-cdc to be the first jar in the classpath so that the jvm can load those patched .class instead of the original classes

  • The schema change event will not be emitted until new data events come in, so if DDL has been executed on upstream but there is no IUD on upstream, the schema change event will not be emitted

  • Add auto.schema.change option to postgres source

Test Coverage:

  • Added a new end-to-end test script auto_schema_change_pg.slt to verify the handling of automatic schema changes in PostgreSQL CDC sources. (e2e_test/source/cdc_inline/auto_schema_change_pg.slt)

Logging Enhancements:

  • Updated logging statements in DbzChangeEventConsumer to include schema and data context in log messages. (java/connector-node/risingwave-connector-service/src/main/java/com/risingwave/connector/source/core/DbzChangeEventConsumer.java) [1] [2]

Configuration Updates:

  • Added a new configuration property include.schema.changes to postgres.properties to control whether schema changes are included. (java/connector-node/risingwave-connector-service/src/main/resources/postgres.properties)

Miscellaneous:

  • Updated .typos.toml to exclude certain Java files from typo checks. (.typos.toml)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

  • Add auto.schema.change option to allow replicating Postgres table schema change

@github-actions github-actions bot added the type/feature Type: New feature. label Sep 29, 2024
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 5420 files.

Valid Invalid Ignored Fixed
2306 5 3109 0
Click to see the invalid file list
  • java/connector-node/risingwave-source-cdc/src/main/java/io/debezium/connector/postgresql/PostgresConnectorConfig.java
  • java/connector-node/risingwave-source-cdc/src/main/java/io/debezium/connector/postgresql/PostgresSchema.java
  • java/connector-node/risingwave-source-cdc/src/main/java/io/debezium/connector/postgresql/PostgresStreamingChangeEventSource.java
  • java/connector-node/risingwave-source-cdc/src/main/java/io/debezium/connector/postgresql/connection/ReplicationStream.java
  • java/connector-node/risingwave-source-cdc/src/main/java/io/debezium/connector/postgresql/connection/pgoutput/PgOutputMessageDecoder.java

@@ -88,6 +88,18 @@ impl JavaVmWrapper {
}
}

// move risingwave-source-cdc to the head of classpath, because we have some patched Debezium classes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somehow hacky to do it in this way, because this depends on the behavior of the default class loader of specific jvm implementation. Not sure whether or not all class loaders always take the first seen class in the jar list.

Maybe we can try using the maven shade plugin to include all classes in the original jar and exclude the specific classes patched in this PR, and then we can explicitly exclude the some dbz artifacts in the pom file. The way to use shade plugin can be found in this example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's somehow hacky to do it in this way, because this depends on the behavior of the default class loader of specific jvm implementation. Not sure whether or not all class loaders always take the first seen class in the jar list.

This concern can be eliminated by testing. Since we use same docker image in our ci pipeline and the cloud deployment, the behavior of jvm should be deterministic across our ci and cloud.

Maybe we can try using the maven shade plugin to include all classes in the original jar and exclude the specific classes patched in this PR, and then we can explicitly exclude the some dbz artifacts in the pom file. The way to use shade plugin can be found in this example

I think it is possible shadow original classes with the shade plugin, but we can do it in future pr if it is really needed.

@StrikeW StrikeW marked this pull request as ready for review October 10, 2024 02:53
@graphite-app graphite-app bot requested review from a team October 10, 2024 03:11
@github-actions github-actions bot added the type/feature Type: New feature. label Oct 10, 2024
@StrikeW StrikeW changed the title feat(postgres-cdc): support replicating Postgres schema change (WIP) feat(postgres-cdc): support replicating Postgres schema change Oct 10, 2024
@StrikeW StrikeW requested review from hzxa21 and BugenZhao October 10, 2024 05:28
Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM

Copy link

gitguardian bot commented Oct 17, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
9425213 Triggered Generic Password e3e14f3 ci/scripts/e2e-source-test.sh View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

@StrikeW StrikeW enabled auto-merge October 17, 2024 06:03
@StrikeW StrikeW added the need-cherry-pick-release-2.1 [⚠️DEPRECATED] Use `..-since-release-..` instead label Oct 17, 2024
@StrikeW StrikeW added this pull request to the merge queue Oct 17, 2024
Merged via the queue into main with commit 340dba6 Oct 17, 2024
34 of 36 checks passed
@StrikeW StrikeW deleted the siyuan/pg-schema-event branch October 17, 2024 07:10
StrikeW added a commit that referenced this pull request Oct 17, 2024
@fuyufjh fuyufjh mentioned this pull request Oct 30, 2024
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-cherry-pick-release-2.1 [⚠️DEPRECATED] Use `..-since-release-..` instead type/feature Type: New feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants