Adding SyncMethod "Full Refresh - Deduped + history" #3090
Comments
Yes, I had this problem on a project before. I'll take a look to measure the effort and whether it is possible. @Fredehagelund92 are you willing to contribute? (no pressure 😬!) I can help you set up the env and check what needs to change if you like |
Hi @marcosmarxm sure, I’d like to get more familiar with the codebase, so if you can point me in the right direction then I can start contributing 👍 |
@Fredehagelund92 you need to start from
|
Thumbs up on this. Let me give a use case I have come across for further motivation/context.
The deleted step is a difficult question for me: should it be the source's job to send through information that a row is deleted? Or the destination's job to compare its current state to the state the source sends? The latter probably makes the most sense, but it requires that the destination see all primary keys via a full refresh source sync mode, so that it can do an ID comparison after all records have been seen and know which rows to delete. That's quite different from the other destination sync modes, which can do what they need to do without depending on or knowing about the source sync mode. One way to do this may be to write all rows to a temporary table (to avoid storing all the IDs in memory), do the comparison, and make all the necessary updates. On the other hand, if you don't need or want delete capability, you can still do record-by-record consumption. Another note is that there is no need for another field |
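To make the temporary-table idea above concrete, here is a minimal sketch, assuming a SQLite destination. The table names `final` and `staging`, the `deleted` flag column, and the function names are all hypothetical and are not Airbyte's actual schema or API. The full-refresh batch is streamed into a staging table, merged into the final table, and any final row whose id is missing from the batch is flagged as deleted.

```python
import sqlite3


def create_final_table(conn):
    # Hypothetical destination table; `deleted` is an assumed soft-delete flag.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS final "
        "(id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER NOT NULL DEFAULT 0)"
    )


def mark_hard_deletes(conn, incoming_rows):
    """Stage a full-refresh batch of (id, name) rows, merge it into `final`,
    and flag rows that no longer appear in the source.

    Returns the number of rows newly flagged as deleted.
    """
    cur = conn.cursor()
    # Stage rows in a temp table so the ids never have to be held in memory.
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", incoming_rows)

    # Insert new rows and overwrite existing ones; anything present is not deleted.
    cur.execute(
        "INSERT OR REPLACE INTO final (id, name, deleted) "
        "SELECT id, name, 0 FROM staging"
    )

    # The ID comparison: rows in `final` that the source did not send.
    cur.execute(
        "UPDATE final SET deleted = 1 "
        "WHERE deleted = 0 AND id NOT IN (SELECT id FROM staging)"
    )
    newly_deleted = cur.rowcount

    cur.execute("DROP TABLE staging")
    conn.commit()
    return newly_deleted
```

Because `incoming_rows` can be any iterable, records can still be consumed one at a time while they are staged; only the final comparison needs the complete batch.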
@evantahler is this still relevant with the Destination v2 plans? |
Closing this issue as I don't think that it really fits with the changes we are making to normalization. The SCD (history) tables are going away. |
Tell us about the problem you're trying to solve
Currently it is only possible to do the following sync methods:
This might be sufficient for most cases, but in some cases you might not be able to do an incremental sync. Let's say the source does not have an updated_at column or an auto-increment id. Then it will be hard to do an incremental sync. Of course you could just do a Full Refresh, but then you won't be able to get history. History can be important since it provides a timeline of when changes happened in source systems that do not record this themselves.
Describe the solution you’d like
I think we need to create a sync method called Full Refresh - Deduped + history. This can easily be done by comparing a hash of each row between syncs. Since it's a full refresh, it also enables us to track hard deletes, for instance with a new column called airbyte_deleted_row.
I might update and be more specific after I'm more familiar with the codebase.
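As a rough illustration of the hash comparison proposed here, the sketch below hashes each record and diffs a new full-refresh snapshot against the hashes stored from the previous sync. The function names `row_hash` and `diff_full_refresh`, the `id` primary key, and the choice of SHA-256 over canonical JSON are assumptions made for the example, not existing Airbyte code.

```python
import hashlib
import json


def row_hash(record):
    """Hash a record independently of key order (assumes JSON-serializable values)."""
    payload = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def diff_full_refresh(previous_hashes, current_records, primary_key="id"):
    """Compare a full-refresh snapshot against the previous sync.

    previous_hashes: dict mapping primary key -> row hash from the last sync.
    current_records: iterable of dicts from the current full refresh.
    Returns the primary keys that were inserted, updated, or deleted.
    """
    changes = {"inserted": [], "updated": [], "deleted": []}
    seen = set()
    for record in current_records:
        key = record[primary_key]
        seen.add(key)
        old_hash = previous_hashes.get(key)
        if old_hash is None:
            changes["inserted"].append(key)
        elif old_hash != row_hash(record):
            changes["updated"].append(key)
    # Keys known from the last sync but absent now are hard deletes; these are
    # the rows that would get a marker such as airbyte_deleted_row.
    changes["deleted"] = [key for key in previous_hashes if key not in seen]
    return changes
```

Inserted and updated keys would each produce a new history row, while deleted keys would only be marked, so the timeline of changes is preserved without the source providing a cursor column.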
Describe the alternative you’ve considered or used
The alternative is to do a Full Refresh - Overwrite sync and handle the history and deduplication downstream using dbt.
Additional context