Fail to sync between Postgres DB in increamental-deduped-history mode #7132

Closed
lorriexingfang86 opened this issue Oct 18, 2021 · 6 comments · Fixed by #14903

Comments

@lorriexingfang86

lorriexingfang86 commented Oct 18, 2021

Environment

  • Airbyte version: 0.29.19-alpha
  • OS Version / Instance: macOS
  • Deployment: Docker Desktop
  • Source Connector and version: Postgres DB on AWS RDS
  • Destination Connector and version: Postgres DB on AWS RDS
  • Severity: High
  • Step where error happened: Sync job

Current Behavior

I want to sync about 70 GB of data from the source (Postgres DB A) to the destination (Postgres DB B).

I configured the sync mode as

incremental-deduped-history

because I want the destination Postgres DB to be updated for every add/modify/delete on the source whenever a sync happens.

However, it failed after syncing about 300k records when a Postgres IO error occurred. It retried 2 more times, but still failed to sync all the data. Here are the logs:

1st-sync-logs-attempt-1.txt
1st-sync-logs-attempt-2.txt
1st-sync-logs-attempt-3.txt

Then I manually triggered the sync again, but still got an error:

2nd-sync-10-17-airbyte-logs-attempt-1.txt

It seems the sync will always fail now.

Expected Behavior

It should finish syncing, and I should be able to see all source records in the destination.

Logs

Please see the complete logs in the attachments: logs-19-0.txt
logs-19-0 (1).txt

Follow-up questions

  1. What does table_name_scd mean? I see 3 tables in the destination for a table I want to map: one with the source table name, a temp one, and one named table_name_scd.

Thank you very much!

@lorriexingfang86 lorriexingfang86 added the type/bug Something isn't working label Oct 18, 2021
@harshithmullapudi harshithmullapudi added the area/connectors Connector related issues label Oct 19, 2021
@bleonard bleonard added autoteam team/tse Technical Support Engineers labels Apr 26, 2022
@grishick grishick added team/databases and removed team/tse Technical Support Engineers labels May 10, 2022
@grishick
Contributor

grishick commented Jul 5, 2022

Looks like the source database is dropping the connection after 1.5-2 hours. A solution could be what we discussed earlier today, where the source connector checkpoints after every batch and gracefully handles a connection drop, allowing the job to finish.

@subodh1810
Contributor

We could do this by building our select * query so that it sorts the table by the primary key or the cursor field, then dividing it into batches and emitting a state message after each batch, so that if the sync fails, the next sync starts from where the previous one left off.
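
A minimal sketch of that approach, assuming a hypothetical users table with an id primary key and hypothetical emitRecord()/emitState() helpers; this is not the actual Airbyte connector code:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class BatchedSyncSketch {

      private static final int BATCH_SIZE = 10_000;

      public static void sync(Connection conn) throws SQLException {
        long lastSeenId = 0L;   // resume point; in practice this would come from the saved state
        boolean moreRows = true;

        while (moreRows) {
          // Sort by the primary key (or cursor field) so each batch starts
          // exactly where the previous one ended.
          String sql = "SELECT * FROM users WHERE id > ? ORDER BY id LIMIT ?";
          int rowsInBatch = 0;
          try (PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setLong(1, lastSeenId);
            stmt.setInt(2, BATCH_SIZE);
            try (ResultSet rs = stmt.executeQuery()) {
              while (rs.next()) {
                emitRecord(rs);                  // hand the row to the destination
                lastSeenId = rs.getLong("id");
                rowsInBatch++;
              }
            }
          }
          // Emit a state message after every batch so a failed sync can resume
          // from lastSeenId instead of starting over.
          emitState(lastSeenId);
          moreRows = rowsInBatch == BATCH_SIZE;
        }
      }

      private static void emitRecord(ResultSet rs) { /* hypothetical: write the record */ }

      private static void emitState(long cursorValue) { /* hypothetical: emit Airbyte state */ }
    }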

@grishick
Contributor

grishick commented Jul 5, 2022

Our JDBC connector already fetches records in batches according to AdaptiveStreamingQueryConfig.currentFetchSize. Would it make sense to checkpoint after the connector fetches currentFetchSize records? I'm not sure where we would do that, but it seems like it could be in tryAdvance.

@grishick grishick changed the title from "Fail to sync between Postgres DB in increamental-deduced-history mode" to "Fail to sync between Postgres DB in increamental-deduped-history mode" on Jul 19, 2022
@tuliren
Contributor

tuliren commented Jul 19, 2022

The fetchSize in AdaptiveStreamingQueryConfig is a JDBC implementation detail. It does not change the total number of records to read; it is only the number of records JDBC retrieves per read while retrieving all the records. So it is irrelevant to this issue.
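
For illustration, a small plain-JDBC sketch of that point, assuming a hypothetical users table and an already-open connection: setFetchSize() is only a per-round-trip buffering hint, and iterating the ResultSet still returns every matching row.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class FetchSizeIllustration {

      public static long countAllRows(Connection conn) throws SQLException {
        long total = 0;
        try (PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users")) {
          stmt.setFetchSize(1_000);            // driver-level buffering hint only
          try (ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
              total++;                          // all rows are still read, ~1,000 per round trip
            }
          }
        }
        return total;                           // same value regardless of fetch size
      }
    }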

@grishick
Contributor

Hey team! Please add your planning poker estimate with ZenHub @subodh1810 @tuliren @edgao @ryankfu

@grishick
Contributor

The proposed fix is to emit a state message frequently, possibly as part of this code in AbstractDbSource:
    if (count % 10000 == 0) {
      LOGGER.info("Reading stream {}. Records read: {}", streamName, count);
    }
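
A rough, self-contained sketch of that frequent-checkpoint idea; the record iterator, cursor column, and emitRecord()/emitStateMessage() helpers below are hypothetical stand-ins, not the actual AbstractDbSource code:

    import java.util.Iterator;
    import java.util.Map;

    public class CheckpointingReadSketch {

      private static final int CHECKPOINT_INTERVAL = 10_000;

      public static void readStream(String streamName, Iterator<Map<String, Object>> records) {
        long count = 0;
        Object lastCursorValue = null;

        while (records.hasNext()) {
          Map<String, Object> record = records.next();
          emitRecord(record);                           // pass the record downstream
          lastCursorValue = record.get("updated_at");   // hypothetical cursor column
          count++;

          if (count % CHECKPOINT_INTERVAL == 0) {
            // Same spot where the existing code logs progress; emitting state here
            // means a dropped connection loses at most the last 10,000 records.
            emitStateMessage(streamName, lastCursorValue);
          }
        }
      }

      private static void emitRecord(Map<String, Object> record) { /* hypothetical */ }

      private static void emitStateMessage(String streamName, Object cursorValue) { /* hypothetical */ }
    }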
