
🎉 Postgres Source : Allow streams not in CDC publication to be synced in Full-refresh mode #24622


Merged
akashkulk merged 15 commits into master from postgres-discover-publication on Apr 5, 2023


akashkulk
Contributor

Closes #24611

Changes source-postgres to discover streams that are not in the CDC publication and allows those streams to be synced in full-refresh mode.
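A minimal sketch of the discovery rule described above, assuming the connector already holds the full list of discovered tables plus the set of tables in the CDC publication (for example, as read from Postgres's `pg_publication_tables` catalog view). `StreamKey`, `SyncMode`, `CdcDiscoverySketch`, and `supportedSyncModes` are hypothetical simplifications, not the connector's actual classes; the real field discussed in the review, `publicizedTablesInCdc`, is a `Set<AirbyteStreamNameNamespacePair>`.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for AirbyteStreamNameNamespacePair: a (schema, table) key.
record StreamKey(String namespace, String name) {}

// Simplified stand-in for the Airbyte protocol's sync modes.
enum SyncMode { FULL_REFRESH, INCREMENTAL }

final class CdcDiscoverySketch {
  private CdcDiscoverySketch() {}

  /**
   * Returns the sync modes each discovered table advertises in this sketch.
   * Tables in the CDC publication keep both modes; every other table is still
   * discovered (not filtered out) but only offers FULL_REFRESH.
   */
  static Map<StreamKey, Set<SyncMode>> supportedSyncModes(
      List<StreamKey> discoveredTables, Set<StreamKey> publicizedTablesInCdc) {
    Map<StreamKey, Set<SyncMode>> result = new LinkedHashMap<>();
    for (StreamKey table : discoveredTables) {
      Set<SyncMode> modes = publicizedTablesInCdc.contains(table)
          ? EnumSet.of(SyncMode.FULL_REFRESH, SyncMode.INCREMENTAL)
          : EnumSet.of(SyncMode.FULL_REFRESH);
      result.put(table, modes);
    }
    return result;
  }
}
```

In this sketch a table missing from the publication still appears in the catalog, which is the behavior requested in #24611; it simply cannot be chosen for incremental (CDC) sync.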

@akashkulk
Contributor Author

akashkulk commented Mar 27, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4537531825
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4537531825
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:578: The previous and actual discovered catalogs are identical.
=================== 68 passed, 5 skipped in 80.01s (0:01:20) ===================

@akashkulk akashkulk requested a review from subodh1810 March 27, 2023 23:21
@github-actions
Contributor

github-actions bot commented Mar 27, 2023

Affected Connector Report

NOTE ⚠️ Changes in this PR affect the following connectors. Make sure to do the following as needed:

  • Run integration tests
  • Bump connector or module version
  • Add changelog
  • Publish the new version

✅ Sources (3)

Connector | Version | Changelog | Publish
source-alloydb | 2.0.17 | |
source-alloydb-strict-encrypt | 2.0.17 | 🔵 (ignored) | 🔵 (ignored)
source-postgres-strict-encrypt | 2.0.17 | 🔵 (ignored) | 🔵 (ignored)
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Destinations (0)

Connector Version Changelog Publish
  • See "Actionable Items" below for how to resolve warnings and errors.

✅ Other Modules (0)

Actionable Items


Category Status Actionable Item
Version
mismatch
The version of the connector is different from its normal variant. Please bump the version of the connector.

doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connectors like source-jdbc are not published or documented). Please double-check to make sure that it is not a bug.
Changelog
doc not found
The connector does not seem to have a documentation file. This can be normal (e.g. basic connectors like source-jdbc are not published or documented). Please double-check to make sure that it is not a bug.

changelog missing
There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog.
Publish
not in seed
The connector is not in the seed file (e.g. source_definitions.yaml), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug.

diff seed version
The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the /publish command to publish the latest version.

@akashkulk akashkulk marked this pull request as ready for review March 28, 2023 00:05
@akashkulk akashkulk requested a review from a team as a code owner March 28, 2023 00:05
@@ -113,6 +112,8 @@ public class PostgresSource extends AbstractJdbcSource<PostgresType> implements
public static final String MODE = "mode";

private List<String> schemas;

private Set<AirbyteStreamNameNamespacePair> publicizedTablesInCdc;
Contributor


I don't like the idea of having a global variable and hoping it will be initialized just because a chain of methods happens to run before we use it. Any change in the execution order could introduce a bug.

Contributor Author


Unfortunately, we already have that with schemas().

I think the real issue is that we have a chain of methods whose execution order you can't predict. Ideally, the ordering should be clear, so eventually we should gut a lot of the abstract methods deeper in the call stack.

For now, I've moved the initialization of this global variable into createDatabase(). This is part of the initialization phase and is the first thing that runs in every protocol method, so any code in the check/discover/read methods that refers to it should see it already initialized.
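The approach settled on above, populating the field in `createDatabase()` because that step runs first in every protocol method, can be illustrated with a heavily simplified, hypothetical class. `SourceSketch` and its methods are stand-ins, not the real `PostgresSource` code, and the publication tables are passed in rather than queried. The null check in the accessor is an added assumption, not something this PR is known to contain: it does not remove the ordering dependency the reviewer objected to, but it turns a silent misuse into an immediate, descriptive error.

```java
import java.util.Set;

// Hypothetical, simplified model of the pattern discussed in this thread:
// a field populated during the initialization step and read later by
// check/discover/read.
final class SourceSketch {
  // Stays null until createDatabase() runs.
  private Set<String> publicizedTablesInCdc;

  // Stand-in for createDatabase(): assumed to be the first step of every
  // protocol method. A real connector would query the publication here.
  void createDatabase(Set<String> tablesInPublication) {
    this.publicizedTablesInCdc = Set.copyOf(tablesInPublication);
  }

  // Assumed guard: fail fast with a clear message instead of a
  // NullPointerException deep inside discover/read if the call order changes.
  Set<String> publicizedTablesInCdc() {
    if (publicizedTablesInCdc == null) {
      throw new IllegalStateException(
          "createDatabase() must run before the CDC publication tables are read");
    }
    return publicizedTablesInCdc;
  }
}
```

Passing the set explicitly into the methods that need it would remove the hidden ordering entirely, at the cost of changing the abstract method signatures that the thread suggests reworking later.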

@akashkulk akashkulk requested a review from subodh1810 March 30, 2023 05:43
Contributor

@subodh1810 subodh1810 left a comment


Honestly, I still don't like the way we are doing this, but I understand that there is no better way! I think the dependency on the database object for all these operations is a big pain. We need to sort it out.

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4622949399
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/4622949399
No Python unittests run

Build Passed

Test summary info:

=========================== short test summary info ============================
SKIPPED [1] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/plugin.py:63: Skipping TestIncremental.test_two_sequential_reads: not found in the config.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:100: The previous and actual specifications are identical.
SKIPPED [2] ../usr/local/lib/python3.9/site-packages/connector_acceptance_test/tests/test_core.py:578: The previous and actual discovered catalogs are identical.
=================== 68 passed, 5 skipped in 82.69s (0:01:22) ===================

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/4622950646
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/4622950646
No Python unittests run

Build Passed

Test summary info:

All Passed

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/publish connector=connectors/source-postgres run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/4623151204


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk
Contributor Author

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/publish connector=connectors/source-alloydb run-tests=false

🕑 Publishing the following connectors:
connectors/source-alloydb
https://github.com/airbytehq/airbyte/actions/runs/4623155230


Connector Did it publish? Were definitions generated?
connectors/source-alloydb

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/publish connector=connectors/source-alloydb-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-alloydb-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4623156543


Connector Did it publish? Were definitions generated?
connectors/source-alloydb-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk
Contributor Author

akashkulk commented Apr 5, 2023

/publish connector=connectors/source-postgres-strict-encrypt run-tests=false

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/4623399072


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@akashkulk akashkulk enabled auto-merge (squash) April 5, 2023 22:46
@akashkulk akashkulk merged commit e96f9e0 into master Apr 5, 2023
@akashkulk akashkulk deleted the postgres-discover-publication branch April 5, 2023 23:05

Successfully merging this pull request may close these issues.

Postgres should not restrict discovered tables to those present in publication
3 participants