Commit 04b0c1d

docs(source-bigquery): Add comprehensive incremental sync documentation (#62476)
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com> Co-authored-by: [email protected] <[email protected]>
1 parent 16e7780 commit 04b0c1d

1 file changed

+86
-0
lines changed

docs/integrations/sources/bigquery.md

Lines changed: 86 additions & 0 deletions
@@ -44,6 +44,92 @@ The BigQuery data types mapping:
| Change Data Capture | No | |
| SSL Support | Yes | |

## Supported sync modes

The BigQuery source connector supports the following [sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-sync-modes):

- **Full Refresh Sync**: Replaces all data in the destination with data from the source
- **Incremental Sync**: Appends new records based on a cursor field

### Incremental sync behavior

Incremental sync uses a **cursor field** (typically a timestamp or incrementing ID) to track which records have been synced. The connector maintains state between syncs to resume from where the last sync left off.

#### How incremental sync works

The BigQuery source connector implements incremental sync by:

1. **Querying with cursor filter**: Uses `WHERE cursor_field > last_cursor_value` to fetch only new records
2. **State management**: Tracks the maximum cursor value from each sync for resumability
3. **Parameterized queries**: Uses BigQuery's parameterized query API for efficient execution

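The three steps above can be sketched in plain Python. This is an illustrative model, not the connector's actual code: `build_incremental_query`, `next_cursor_value`, and the table name are hypothetical, and the sketch only assembles the query text and named parameters rather than calling BigQuery.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class IncrementalQuery:
    """Query text plus named parameters (hypothetical helper type)."""
    sql: str
    params: Dict[str, Any] = field(default_factory=dict)


def build_incremental_query(table: str, cursor_field: str,
                            last_cursor_value: Optional[Any]) -> IncrementalQuery:
    """Build one incremental read; the cursor value travels as a parameter."""
    if last_cursor_value is None:
        # Assumption: with no saved state, the first sync reads the whole table.
        return IncrementalQuery(f"SELECT * FROM `{table}`")
    sql = f"SELECT * FROM `{table}` WHERE {cursor_field} > @cursor_value"
    return IncrementalQuery(sql, {"cursor_value": last_cursor_value})


def next_cursor_value(rows: List[Dict[str, Any]], cursor_field: str,
                      last_cursor_value: Optional[Any]) -> Optional[Any]:
    """Return the maximum cursor value seen so far, to be saved as state."""
    values = [r[cursor_field] for r in rows if r[cursor_field] is not None]
    if last_cursor_value is not None:
        values.append(last_cursor_value)
    return max(values) if values else None


query = build_incremental_query("project.dataset.events", "updated_at", 5)
new_state = next_cursor_value([{"updated_at": 7}, {"updated_at": 6}], "updated_at", 5)
```

Keeping the cursor value in `params` instead of formatting it into the SQL string mirrors the parameterized style named in step 3; the `@cursor_value` name matches the placeholder used in this document's SQL examples.
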
#### Cursor field requirements

- **Monotonically increasing**: Must be a timestamp, auto-incrementing ID, or other always-increasing field; because each sync reads only rows where the cursor exceeds the saved value, a row written later with a smaller cursor value is never picked up
- **Non-null values**: Records with null cursor values will be skipped, since a comparison against `NULL` never evaluates to true
- **Clustering/partitioning recommended**: For optimal query performance, choose a cursor field that aligns with your table's clustering or partitioning strategy
- **Any data type supported**: The connector accepts any BigQuery data type as a cursor field

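A small in-memory simulation shows why the first two requirements matter. It is a sketch only: `incremental_read` is a hypothetical stand-in for the `cursor_field > last_cursor_value` filter, and the starting cursor value of `0` is an assumption made to keep the example short.

```python
def incremental_read(rows, cursor_field, last_cursor_value):
    """Mimic `WHERE cursor_field > last_cursor_value` on in-memory rows."""
    selected = []
    for row in rows:
        value = row[cursor_field]
        # In SQL, NULL > x is NULL rather than TRUE, so the row is filtered out.
        if value is None:
            continue
        if value > last_cursor_value:
            selected.append(row)
    return selected


table = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": 20},
    {"id": 3, "updated_at": None},  # null cursor: never selected
]

first_sync = incremental_read(table, "updated_at", 0)
state = max(row["updated_at"] for row in first_sync)

# Between syncs, one row arrives with a cursor value below the saved state.
table.append({"id": 4, "updated_at": 15})  # not monotonic: missed
table.append({"id": 5, "updated_at": 30})

second_sync = incremental_read(table, "updated_at", state)
first_ids = [row["id"] for row in first_sync]
second_ids = [row["id"] for row in second_sync]
```

Rows 3 and 4 are in the table but appear in neither sync, which is the failure mode the monotonic and non-null requirements guard against.
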
#### Recommended cursor field types

Based on BigQuery's query performance characteristics:

1. **`TIMESTAMP`** - Best for time-based incremental sync, works well with BigQuery's time-based partitioning
2. **`DATETIME`** - Good alternative to TIMESTAMP for timezone-agnostic scenarios
3. **`DATE`** - Suitable for daily batch incremental sync
4. **`INT64`** - Excellent performance for auto-incrementing IDs
5. **`STRING`** - Supported but slower than numeric/date types for large datasets

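Apart from speed, one practical point about `STRING` cursors is ordering: strings compare character by character, so numeric IDs stored as text do not sort in numeric order. The sketch below uses Python's string comparison as a stand-in for that behaviour; `zero_pad` is a hypothetical helper, and whether padding suits a given table is an assumption to check against your own data.

```python
numeric_ids = [9, 10, 11]
string_ids = [str(n) for n in numeric_ids]

max_numeric = max(numeric_ids)  # numeric order
max_string = max(string_ids)    # character-by-character order


def zero_pad(n: int, width: int = 10) -> str:
    """Left-pad with zeros so text order agrees with numeric order."""
    return f"{n:0{width}d}"


padded_ids = [zero_pad(n) for n in numeric_ids]
max_padded = max(padded_ids)
```

With unpadded strings the "maximum" cursor is `"9"`, so a saved state of `"9"` would filter out `"10"` and `"11"` on the next sync. An `INT64` cursor avoids the problem entirely.
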
#### BigQuery-specific performance considerations

**Partitioned tables**: If your source table is partitioned by date/timestamp, choose a cursor field that aligns with the partition column for optimal performance.

*Note: The SQL examples below illustrate the underlying query patterns that the connector generates. While you select cursor fields through Airbyte's UI, understanding these patterns helps you make informed choices that optimize BigQuery performance and reduce costs.*

```sql
-- Good: cursor field matches partition column
WHERE _PARTITIONTIME > @cursor_value

-- Less optimal: cursor field differs from partition column
WHERE updated_at > @cursor_value AND _PARTITIONTIME >= '2023-01-01'
```

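The difference between the two patterns can be pictured with a toy model of partition pruning. This is a simplified sketch, not a description of BigQuery internals: the `partitions` dictionary and the `scan` function are hypothetical, and the model assumes each row's `updated_at` equals its partition date.

```python
from datetime import date

# Hypothetical table: partition date -> rows stored in that partition.
partitions = {
    date(2023, 1, 1): [{"id": 1, "updated_at": date(2023, 1, 1)}],
    date(2023, 1, 2): [{"id": 2, "updated_at": date(2023, 1, 2)}],
    date(2023, 1, 3): [{"id": 3, "updated_at": date(2023, 1, 3)}],
}


def scan(partitions, cursor_value, cursor_is_partition_column):
    """Return (partitions read, matching rows) for one incremental query."""
    partitions_read = 0
    matched = []
    for partition_date, rows in partitions.items():
        # A filter on the partition column lets whole partitions be skipped.
        if cursor_is_partition_column and partition_date <= cursor_value:
            continue
        partitions_read += 1
        matched.extend(r for r in rows if r["updated_at"] > cursor_value)
    return partitions_read, matched


aligned_reads, aligned_rows = scan(partitions, date(2023, 1, 2), True)
unaligned_reads, unaligned_rows = scan(partitions, date(2023, 1, 2), False)
```

Both calls return the same row, but the aligned cursor reads one partition while the unaligned cursor reads all three. In this toy model, that gap is the cost difference the two SQL patterns above describe.
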
**Clustered tables**: If your table is clustered, using a clustering column as the cursor field can significantly improve query performance.

**Query slots**: Incremental queries consume BigQuery slots. For large tables:
- Use more selective cursor fields when possible
- Consider the frequency of incremental syncs vs. slot usage
- Monitor query performance in BigQuery's query history

#### State management and resumability

- **Automatic state tracking**: The connector automatically saves the maximum cursor value after each successful sync
- **Resume capability**: If a sync fails partway through, the next sync resumes from the last successfully processed cursor value
- **Manual state reset**: You can reset the sync state in Airbyte to re-sync historical data
- **Per-stream state**: Each table/stream maintains independent cursor state

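The following sketch models these four behaviours with a per-stream dictionary of cursor values. It is an assumption-laden illustration: `SyncState` and `run_sync` are hypothetical names, the JSON layout is not Airbyte's actual state format, and state is committed only at the end of a successful sync for simplicity.

```python
import json


class SyncState:
    """Per-stream cursor values, serializable between syncs."""

    def __init__(self, serialized=None):
        self._cursors = json.loads(serialized) if serialized else {}

    def get(self, stream):
        return self._cursors.get(stream)

    def commit(self, stream, max_cursor_value):
        self._cursors[stream] = max_cursor_value

    def reset(self, stream):
        # Dropping the cursor makes the next sync re-read historical data.
        self._cursors.pop(stream, None)

    def serialize(self):
        return json.dumps(self._cursors, sort_keys=True)


def run_sync(state, stream, rows, cursor_field, fail=False):
    """Read rows past the saved cursor; commit state only on success."""
    last = state.get(stream)
    new_rows = [r for r in rows if last is None or r[cursor_field] > last]
    if fail:
        raise RuntimeError("simulated mid-sync failure")
    if new_rows:
        state.commit(stream, max(r[cursor_field] for r in new_rows))
    return new_rows


state = SyncState()
orders = [{"updated_at": 1}, {"updated_at": 2}]
run_sync(state, "orders", orders, "updated_at")

orders.append({"updated_at": 3})
try:
    run_sync(state, "orders", orders, "updated_at", fail=True)
except RuntimeError:
    pass  # the saved cursor is untouched, so the next sync retries row 3

retried = run_sync(state, "orders", orders, "updated_at")
restored = SyncState(state.serialize())
```

Each stream name keys its own cursor, so resetting `orders` leaves any other stream's state intact.
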
#### Best practices for incremental sync

1. **Choose the right cursor field**:
   - Use `updated_at` or `modified_time` for frequently changing data
   - Use `created_at` or `insert_time` for append-only data
   - Choose fields that align with your table's clustering or partitioning for optimal performance

2. **Optimize for BigQuery performance**:
   - Align cursor fields with table partitioning when possible
   - Use clustering columns as cursor fields for better performance
   - Monitor BigQuery slot usage and query costs

3. **Handle data quality**:
   - Ensure cursor field values are always increasing
   - Monitor for gaps in cursor field values that might indicate data quality issues
   - Consider using `TIMESTAMP` fields over `DATETIME` for better timezone handling: `TIMESTAMP` represents an absolute point in time, while `DATETIME` carries no time zone

4. **Sync frequency considerations**:
   - More frequent syncs reduce data latency but increase BigQuery slot usage
   - Balance sync frequency with BigQuery costs and slot availability
   - Consider BigQuery's streaming buffer behavior for very recent data

## Getting started

### Requirements
