Skip to content

🎉 Postgres source: make initial cdc waiting time configurable #14451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jul 22, 2022

Conversation

tuliren
Copy link
Contributor

@tuliren tuliren commented Jul 6, 2022

What

  • This is one of the PRs to accelerate Postgres CDC initial waiting time.
  • This one simply makes the initial waiting time configurable.
  • It does not change the default 5 minutes.
  • It updates tests to use a much shorter time (10 seconds).
  • Resolve Make CDC debezium first record wait time configurable #12776.

@github-actions github-actions bot added area/connectors Connector related issues area/documentation Improvements or additions to documentation labels Jul 6, 2022
@tuliren
Copy link
Contributor Author

tuliren commented Jul 6, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2623445816
❌ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2623445816
🐛 https://gradle.com/s/xxsswwrecix7s

Build Failed

Test summary info:

Could not find result summary

@tuliren
Copy link
Contributor Author

tuliren commented Jul 6, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2626215952
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2626215952
No Python unittests run

Build Passed

Test summary info:

All Passed

@alexandr-shegeda
Copy link
Contributor

@tuliren Since Postgres Source connector is already in beta in addition to integration testing, I need you to perform the following manual test:

  1. Create a connection with the previous version of source-postgres-strict-encrypt connector and run a full refresh (destination does not matter as long as the connection is configured for 'incremental dedupe' mode)
  2. Add some data to the source DB and run an incremental sync
  3. Upgrade source-postgres-strict-encrypt to a version with these changes. Don't change any configuration.
  4. Run an incremental sync
  5. Add some data to the source DB
  6. Downgrade the connector to the previous version
  7. Run an incremental sync
  8. Upgrade source-postgres-strict-encrypt to a version with these changes.
  9. Change initial_waiting_seconds to require
  10. Add some data to the source DB
  11. Run an incremental sync
  12. Downgrade the connector to the previous version
  13. Add some data to the source DB
  14. Run an incremental sync

The expectation for the test is that all syncs succeed

@tuliren
Copy link
Contributor Author

tuliren commented Jul 9, 2022

@alexandr-shegeda, thanks for the instructions. However, I have some questions about these steps:

  1. Change initial_waiting_seconds to require

This step seems unnecessary. This field will not be required. Actually in my opinion, all new fields should always be optional.

  1. Downgrade the connector to the previous version
  2. Downgrade the connector to the previous version

Currently the spec has additionalProperties: false. So downgrading from a version with a new field will always result in schema validation failure. How about I do the following:

  1. Publish a new version with additionalProperties: true (V1).
  2. Publish a new version for changes in this PR (V2).

In this way, if there is anything wrong with this PR, users can still downgrade to a V1 without breaking their connection.

@tuliren tuliren force-pushed the liren/make-postgres-timeout-configurable branch from d217807 to d81a0c3 Compare July 10, 2022 03:19
@tuliren tuliren requested a review from a team as a code owner July 10, 2022 03:19
Copy link
Contributor

@subodh1810 subodh1810 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should build a safety check that the max wait time can be 5 minutes so that if a user mistakenly sets it to more than 5 minutes, we default to 5.
5 minutes has worked in 100% of cases so as of now we dont need it to be more than 5 minutes.

@grishick
Copy link
Contributor

I think we should build a safety check that the max wait time can be 5 minutes so that if a user mistakenly sets it to more than 5 minutes, we default to 5. 5 minutes has worked in 100% of cases so as of now we dont need it to be more than 5 minutes.

Given that we haven't done sufficient testing of this timeout value "in the wild" I would actually argue on the side of allowing users to set it to whatever they want. If we see people shooting themselves in the foot with values that are too large - we'll introduce a safety check based on what those "too large" values end up being.

@subodh1810
Copy link
Contributor

@grishick Am open to increasing the max allowed value to be more than 5 minutes but we should definitely have an upper cap even if its 30 minutes.

@tuliren
Copy link
Contributor Author

tuliren commented Jul 18, 2022

we should definitely have an upper cap even if its 30 minutes.

Currently the max waiting time is 20 minutes in the spec.json. I will enforce the limit in Java as well.

@grishick
Copy link
Contributor

this PR needs a rebase

@tuliren
Copy link
Contributor Author

tuliren commented Jul 20, 2022

/test connector=connectors/source-postgres

🕑 connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2701825901
✅ connectors/source-postgres https://github.com/airbytehq/airbyte/actions/runs/2701825901
No Python unittests run

Build Passed

Test summary info:

All Passed

@tuliren
Copy link
Contributor Author

tuliren commented Jul 20, 2022

/test connector=connectors/source-postgres-strict-encrypt

🕑 connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/2701826335
✅ connectors/source-postgres-strict-encrypt https://github.com/airbytehq/airbyte/actions/runs/2701826335
No Python unittests run

Build Passed

Test summary info:

All Passed

@tuliren
Copy link
Contributor Author

tuliren commented Jul 20, 2022

Will publish and merge after #14574 is published and merged for safety. In that way, if there is anything wrong, users can downgrade without breaking their existing connections.

@lmossman
Copy link
Contributor

@tuliren any update on this change? Once this is out we should be able to speed up the platform build a bit which would be nice

@tuliren
Copy link
Contributor Author

tuliren commented Jul 21, 2022

Oh, the dependent PR has been merged. I will work on merging this PR now.

@tuliren
Copy link
Contributor Author

tuliren commented Jul 21, 2022

/publish connector=connectors/source-postgres

🕑 Publishing the following connectors:
connectors/source-postgres
https://github.com/airbytehq/airbyte/actions/runs/2714954321


Connector Did it publish? Were definitions generated?
connectors/source-postgres

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@tuliren
Copy link
Contributor Author

tuliren commented Jul 21, 2022

/publish connector=connectors/source-postgres-strict-encrypt

🕑 Publishing the following connectors:
connectors/source-postgres-strict-encrypt
https://github.com/airbytehq/airbyte/actions/runs/2714954413


Connector Did it publish? Were definitions generated?
connectors/source-postgres-strict-encrypt

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make CDC debezium first record wait time configurable
6 participants