🎉 Postgres Source : Allow streams not in CDC publication to be synced in Full-refresh mode #24622
/test connector=connectors/source-postgres
Build Passed
Affected Connector Report
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb | 2.0.17 | ✅ | ✅ |
source-alloydb-strict-encrypt | 2.0.17 | 🔵 (ignored) | 🔵 (ignored) |
source-postgres-strict-encrypt | 2.0.17 | 🔵 (ignored) | 🔵 (ignored) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
Connector | Version | Changelog | Publish |
---|---|---|---|
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Other Modules (0)
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version. |
@@ -113,6 +112,8 @@ public class PostgresSource extends AbstractJdbcSource<PostgresType> implements
  public static final String MODE = "mode";

  private List<String> schemas;

  private Set<AirbyteStreamNameNamespacePair> publicizedTablesInCdc;
I don't like the idea of having a global variable and hoping/assuming it will get initialised because a chain of methods will be executed before we use it. Any change in the execution steps could introduce a bug.
Unfortunately, we already have that with schemas().
I think the real issue is that we have a chain of methods where you can't predict what order they'd run in. Ideally, the ordering should be clear, so eventually we should gut a lot of the abstract methods deeper in the call stack.
For now, I've moved the initialization of this global variable into createDatabase(). In general, this is part of the initialization phase and is the first thing that runs in every protocol method. So any code that refers to this variable in the check/discover/read methods should find it initialized.
Honestly, I still don't like the way we are doing this, but I understand that there is no better way! I think the dependency on the database object for all these operations is a big pain. We need to sort it out.
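The ordering concern raised in this thread can be illustrated with a minimal sketch. It is not the code in this PR: `InitOrderSketch`, the `String` table names, and the `publicizedTables()` accessor are hypothetical stand-ins, and the real `createDatabase()` has a different signature. The sketch only shows one way to make the "initialized before use" assumption fail loudly instead of silently.

```java
import java.util.Set;

// Hypothetical stand-in for the connector class; not the real PostgresSource.
class InitOrderSketch {

  // Stays null until createDatabase() has run.
  private Set<String> publicizedTablesInCdc;

  // Stand-in for the initialization step that every protocol method runs first.
  void createDatabase(Set<String> tablesInPublication) {
    this.publicizedTablesInCdc = Set.copyOf(tablesInPublication);
  }

  // Accessor that turns a wrong call order into an immediate, explicit error.
  Set<String> publicizedTables() {
    if (publicizedTablesInCdc == null) {
      throw new IllegalStateException(
          "createDatabase() must run before the publication tables are read");
    }
    return publicizedTablesInCdc;
  }

  public static void main(String[] args) {
    InitOrderSketch source = new InitOrderSketch();
    source.createDatabase(Set.of("public.users"));
    System.out.println(source.publicizedTables());
  }
}
```

With a guard like this, a later change to the execution order would surface as an exception at the first read rather than as a null set deeper in the call stack.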
/test connector=connectors/source-postgres
Build Passed
/test connector=connectors/source-postgres-strict-encrypt
Build Passed
/publish connector=connectors/source-postgres run-tests=false
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-postgres-strict-encrypt run-tests=false
/publish connector=connectors/source-alloydb run-tests=false
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-alloydb-strict-encrypt run-tests=false
if you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-postgres-strict-encrypt run-tests=false
if you have connectors that successfully published but failed definition generation, follow step 4 here
Closes #24611
Changes source-postgres to discover streams that are not in the CDC publication and allows those streams to be synced in full-refresh mode.
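The behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the connector's implementation: `StreamId`, `SyncMode`, and `supportedSyncModes` are hypothetical names standing in for the Airbyte protocol types, and the rule shown (streams outside the publication are limited to full refresh) is inferred from the PR description.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; these are not the Airbyte protocol classes.
class CdcSyncModeSketch {

  enum SyncMode { FULL_REFRESH, INCREMENTAL }

  // Simplified replacement for a stream name/namespace pair.
  record StreamId(String namespace, String name) {}

  /**
   * For each discovered stream, returns the sync modes it may use under CDC:
   * streams in the publication keep both modes, all others get full refresh only.
   */
  static Map<StreamId, List<SyncMode>> supportedSyncModes(
      List<StreamId> discovered, Set<StreamId> inPublication) {
    Map<StreamId, List<SyncMode>> result = new LinkedHashMap<>();
    for (StreamId stream : discovered) {
      if (inPublication.contains(stream)) {
        result.put(stream, List.of(SyncMode.FULL_REFRESH, SyncMode.INCREMENTAL));
      } else {
        // Not in the publication: still discovered, but limited to full refresh.
        result.put(stream, List.of(SyncMode.FULL_REFRESH));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    StreamId users = new StreamId("public", "users");
    StreamId auditLogs = new StreamId("public", "audit_logs");
    // Only "users" is part of the (assumed) CDC publication.
    System.out.println(supportedSyncModes(List.of(users, auditLogs), Set.of(users)));
  }
}
```

The point of the sketch is that streams outside the publication are no longer dropped at discovery time; they remain visible with a reduced set of sync modes.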