Commit 06128ab

marcosmarxm and Marcos Marx authored
Doc explains normalization full-refresh implications (#6097)
* update docs
* add info in quickstart connection page
* update abhi comments

Co-authored-by: Marcos Marx <[email protected]>
1 parent 8b40e13 commit 06128ab

2 files changed: +18 -0 lines changed
docs/faq/data-loading.md

Lines changed: 11 additions & 0 deletions
@@ -6,6 +6,17 @@ It can take a while for Airbyte to load data into your destination. Some sources
 data we can sync in a given time. Large amounts of data in your source can also make the initial sync take longer. You can check your
 sync status in your connection detail page that you can access through the destination detail page or the source one.
 
+## **Why my final tables are being recreated everytime?**
+
+Airbyte ingests data into raw tables and, if you selected it on the connection page, applies normalization.
+Normalization runs a full refresh on each sync, and for some destinations such as Snowflake, Redshift, and BigQuery this may increase
+resource consumption and costs. Pay attention to how frequently you retrieve your data to avoid issues.
+For example, if you create a connection that syncs every 5 minutes with incremental sync on, it will retrieve only new records into the raw tables but will apply normalization
+to *all* the data in every sync! If you have a large amount of data, this may not be the right sync frequency for you.
+
+There is a [GitHub issue](https://github.com/airbytehq/airbyte/issues/4286) to implement incremental normalization, which will reduce
+costs and resource usage in your destination.
+
 ## **What happens if a sync fails?**
 
 You won't lose data when a sync fails, however, no data will be added or updated in your destination.
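
Reviewer note: the cost argument in the added FAQ entry can be made concrete with a rough back-of-the-envelope sketch. The function below is purely illustrative and not part of Airbyte; `rows_processed` and its parameters are hypothetical names, and it assumes that each sync appends a fixed number of new rows to the raw tables and that normalization then reprocesses every row accumulated so far.

```python
from __future__ import annotations


def rows_processed(initial_rows: int, new_rows_per_sync: int, syncs: int) -> tuple[int, int]:
    """Return (rows loaded into raw tables, rows touched by normalization).

    Hypothetical model: incremental sync loads only the new rows,
    while full-refresh normalization re-reads the whole raw table each sync.
    """
    raw_loaded = 0
    normalized = 0
    total_rows = initial_rows
    for _ in range(syncs):
        total_rows += new_rows_per_sync   # incremental sync appends new records only
        raw_loaded += new_rows_per_sync
        normalized += total_rows          # normalization rebuilds from all raw data
    return raw_loaded, normalized


# One day of syncs every 5 minutes (24 * 12 = 288 syncs),
# starting from 1,000,000 existing rows with 100 new rows per sync.
raw, norm = rows_processed(1_000_000, 100, 288)
print(raw, norm)  # 28800 292161600
```

Under these assumed numbers the raw load moves about 29 thousand rows in a day while normalization touches roughly 292 million, which is why a lower sync frequency may be cheaper on warehouses that bill by compute.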

docs/quickstart/set-up-a-connection.md

Lines changed: 7 additions & 0 deletions
@@ -42,3 +42,10 @@ This is just the beginning of using Airbyte. We support a large collection of so
 If you have any questions at all, please reach out to us on [Slack](https://slack.airbyte.io/). We’re still in alpha, so if you see any rough edges or want to request a connector you need, please create an issue on our [Github](https://github.com/airbytehq/airbyte) or leave a thumbs up on an existing issue.
 
 Thank you and we hope you enjoy using Airbyte.
+
+
+{% hint style="warning" %}
+At the moment, Airbyte runs a full refresh to recreate the final tables. This can increase costs in some destinations such as Snowflake, Redshift, and BigQuery.
+To better understand which sync mode and frequency to select, read [this doc](../understanding-airbyte/connections/README.md).
+A FAQ section explains the cost issue in more detail [here](../faq/data-loading.md#why-my-final-tables-are-being-recreated-everytime).
+{% endhint %}
