Closed
Description
What area the feature impact?
Connectors
Relevant Information
Goal: Reduce first incremental sync duration
Once the Concurrent CDK supports incremental syncs with state conversion, stripe can now leverage this.
Why only the first incremental sync?
There are a couple of issues with stripe at the moment. For examplehttps://airbytehq-team.slack.com/archives/C02U9R3AF37/p1697571185299049https://airbytehq-team.slack.com/archives/C02U9R3AF37/p1697836660843669Having parent stream but state are not per partition have risk in terms of data loss. We don't expect Concurrent CDK to aggravate this issue but we still fear it might add a layer of complexity on top of the connector
full_refresh and first incremental syncs leverage mostly the same code path (i.e. stripe has this logic of consuming different endpoint if there is a state is empty example)There is not much value to running things concurrent once the first incremental sync is done as it should be up-to-date and hence should only have one slice
There are other issues that were documented as part of this spreadsheet. So instead of doing all streams first incremental sync concurrent, we will work on a stream by stream basis as follow:
- Events
- CreatedCursorIncrementalStripeStream
UpdatedCursorIncrementalStripeStreamPersonsSetupAttempts
For now, we won't tackle:
- Stream using
IncrementalStripeStream
as they are affected by https://github.com/airbytehq/airbyte-internal-issues/issues/2368 which will change the cursor and hence incremental syncs application_fees_refunds
,bank_accounts
andcheckout_sessions_line_items
as their cursor field is not working as is (see https://github.com/airbytehq/airbyte-internal-issues/issues/2435 )refunds
as the implementation will probably change toIncrementalStripeStream
(see https://github.com/airbytehq/airbyte-internal-issues/issues/2450)
[DONE] Missing CDK feature:
The cursor needs to know what is the lowest boundary expected. Let's assume the following partitions are generated:
- start: 0, end: 1
- start: 2: end: 3
If the second is successful but not the first, no state should be emitted. This is different than the logic we implemented as we only take the upper boundary of the first continuous range which in that case would be 3.
Acceptance criteria
- Implement the CDK missing feature described above
- Have incremental syncs be concurrent for streams
- Events
- CreatedCursorIncrementalStripeStream
UpdatedCursorIncrementalStripeStreamPersonsSetupAttempts- We sync the same records as a full_refresh. As we don't have much visibility, manual test a couple of connections for that
- Notify GL of this logic