Commit 8913b22

Remove schema evolution from CDC docs and other tweaks (#37731)
1 parent 6e6cc5c commit 8913b22

1 file changed: +7 -6 lines changed

docs/understanding-airbyte/cdc.md

+7 -6
@@ -6,25 +6,24 @@ Many common databases support writing all record changes to log files for the pu
 
 ## Syncing
 
-The orchestration for syncing is similar to non-CDC database sources. After selecting a sync interval, syncs are launched regularly. We read data from the log up to the time that the sync was started. We do not treat CDC sources as infinite streaming sources. You should ensure that your schedule for running these syncs is frequent enough to consume the logs that are generated. The first time the sync is run, a snapshot of the current state of the data will be taken. This is done using `SELECT` statements and is effectively a Full Refresh. Subsequent syncs will use the logs to determine which changes took place since the last sync and update those. Airbyte keeps track of the current log position between syncs.
+The orchestration for syncing is similar to non-CDC database sources. After selecting a sync interval, syncs are launched regularly. We read data from the previously synced position in the logs up to the start time of the sync. We do not treat CDC sources as infinite streaming sources. You should ensure that your schedule for running these syncs is frequent enough to consume the logs that are generated. The first time the sync is run, a snapshot of the current state of the data will be taken. This snapshot is created with a `SELECT` statement and is effectively a Full Refresh (meaning changes won't be logged). Subsequent syncs will use the logs to determine which changes took place since the last sync and update those. Airbyte keeps track of the current log position between syncs.
 
-A single sync might have some tables configured for Full Refresh replication and others for Incremental. If CDC is configured at the source level, all tables with Incremental selected will use CDC. All Full Refresh tables will replicate using the same process as non-CDC sources. However, these tables will still include CDC metadata columns by default.
+A single sync might have some tables configured for Full Refresh replication and others for Incremental. If CDC is configured at the source level, all tables with Incremental selected will use CDC. All Full Refresh tables will replicate using the same process as non-CDC sources.
 
 The Airbyte Protocol outputs records from sources. Records from `UPDATE` statements appear the same way as records from `INSERT` statements. We support different options for how to sync this data into destinations using primary keys, so you can choose to append this data, delete in place, etc.
 
-We add some metadata columns for CDC sources:
+We add some metadata columns for CDC sources which all begin with the `_ab_cdc_` prefix. The actual columns synced will vary per source, but might look like:
 
-* `_ab_cdc_lsn` \(postgres and sql server sources\) is the point in the log where the record was retrieved
+* `_ab_cdc_lsn` or `_ab_cdc_cursor` is the point in the log where the record was retrieved
 * `_ab_cdc_log_file` & `_ab_cdc_log_pos` \(specific to mysql source\) is the file name and position in the file where the record was retrieved
 * `_ab_cdc_updated_at` is the timestamp for the database transaction that resulted in this record change and is present for records from `DELETE`/`INSERT`/`UPDATE` statements
 * `_ab_cdc_deleted_at` is the timestamp for the database transaction that resulted in this record change and is only present for records from `DELETE` statements
 
 ## Limitations
 
-* CDC incremental is only supported for tables with primary keys. A CDC source can still choose to replicate tables without primary keys as Full Refresh or a non-CDC source can be configured for the same database to replicate the tables without primary keys using standard incremental replication.
+* CDC incremental is only supported for tables with primary keys for most sources. A CDC source can still choose to replicate tables without primary keys as Full Refresh or a non-CDC source can be configured for the same database to replicate the tables without primary keys using standard incremental replication.
 * Data must be in tables, not views.
 * The modifications you are trying to capture must be made using `DELETE`/`INSERT`/`UPDATE`. For example, changes made from `TRUNCATE`/`ALTER` won't appear in logs and therefore in your destination.
-* We do not support schema changes automatically for CDC sources. We recommend resetting and resyncing data if you make a schema change.
 * There are database-specific limitations. See the documentation pages for individual connectors for more information.
 * The records produced by `DELETE` statements only contain primary keys. All other data fields are unset.
 
@@ -34,6 +33,8 @@ We add some metadata columns for CDC sources:
 * [MySQL](../integrations/sources/mysql.md)
 * [Microsoft SQL Server / MSSQL](../integrations/sources/mssql.md)
 * [MongoDB](../integrations/sources/mongodb-v2.md)
+
+
 ## Coming Soon
 
 * Oracle DB
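
The changed documentation says that `UPDATE` records look like `INSERT` records, that `DELETE` records carry only primary keys, and that `_ab_cdc_deleted_at` is present only for deletes. A minimal Python sketch of how a consumer might fold such records into a table keyed by primary key could look like the following. This is an illustration under stated assumptions, not Airbyte's implementation: the function name `apply_cdc_records`, the `id` primary key, and the sample records and timestamps are all hypothetical.

```python
from typing import Any, Dict, Iterable, Tuple

Record = Dict[str, Any]
Key = Tuple[Any, ...]


def apply_cdc_records(
    table: Dict[Key, Record],
    records: Iterable[Record],
    primary_key: Tuple[str, ...] = ("id",),  # hypothetical primary key column
) -> Dict[Key, Record]:
    """Apply CDC records, in log order, to an in-memory table keyed by primary key."""
    for record in records:
        key = tuple(record[column] for column in primary_key)
        if record.get("_ab_cdc_deleted_at") is not None:
            # DELETE records only contain primary keys, so remove the row.
            table.pop(key, None)
        else:
            # INSERT and UPDATE records look the same: keep the latest version.
            table[key] = record
    return table


# Hypothetical sample records, ordered as they would appear in the log.
records = [
    {"id": 1, "name": "a", "_ab_cdc_updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "name": "b", "_ab_cdc_updated_at": "2024-01-01T00:00:01Z"},
    {"id": 1, "name": "a2", "_ab_cdc_updated_at": "2024-01-01T00:00:02Z"},
    {
        "id": 2,
        "_ab_cdc_updated_at": "2024-01-01T00:00:03Z",
        "_ab_cdc_deleted_at": "2024-01-01T00:00:03Z",
    },
]

final_table = apply_cdc_records({}, records)
print(sorted(final_table))
```

Keying on the primary key is what lets a later record for the same row replace or remove the earlier one, which is consistent with the limitation that CDC incremental generally requires tables with primary keys.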
