docs/integrations/destinations/postgres.md (+12, -13)
@@ -2,21 +2,11 @@

This page guides you through the process of setting up the Postgres destination connector.

## Warning

:::warning

Postgres, while an excellent relational database, is not a data warehouse. Please only consider using Postgres as a destination for small data volumes (e.g. less than 10GB) or for testing purposes. For larger data volumes, we recommend using a data warehouse like BigQuery, Snowflake, or Redshift. Learn more [here](/integrations/destinations/postgres/postgres-troubleshooting#postgres-is-not-a-data-warehouse).

:::
@@ -261,6 +251,15 @@ Now that you have set up the Postgres destination connector, check out the following

- [Migrate from MySQL to Postgres](https://airbyte.com/tutorials/migrate-from-mysql-to-postgresql)

:::warning

Postgres, while an excellent relational database, is not a data warehouse. Please only consider using Postgres as a destination for small data volumes (e.g. less than 10GB) or for testing purposes. For larger data volumes, we recommend using a data warehouse like BigQuery, Snowflake, or Redshift.

:::

1. Postgres is likely to perform poorly with large data volumes. Even Postgres-compatible destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or updates over ~100GB. Especially when using [typing and deduplication](/using-airbyte/core-concepts/typing-deduping) with `destination-postgres`, be sure to monitor your database's memory and CPU usage during your syncs. It is possible for your destination to 'lock up' and incur high usage costs with large sync volumes.
2. When attempting to scale a Postgres database to handle larger data volumes, scaling IOPS (disk throughput) is as important as increasing memory and compute capacity.
3. Postgres truncates identifiers longer than 63 bytes, so long column names are likely to cause collisions when used as a destination receiving data from highly-nested and flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to `{63 byte name}`, which causes Postgres to throw an error that a duplicate column name was specified. This limit is applicable to table names too. A short sketch of this collision follows this list.
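
As an illustration of the identifier-length pitfall above, here is a minimal Python sketch (not part of the connector; the column names are hypothetical) showing how flattened field names can collide once truncated to 63 bytes:

```python
# Simulate Postgres's 63-byte identifier truncation to spot column-name
# collisions before a sync fails with a "duplicate column" error.
from collections import defaultdict

PG_MAX_IDENTIFIER_BYTES = 63


def truncate_identifier(name: str) -> str:
    # Postgres silently truncates identifiers to 63 bytes (NAMEDATALEN - 1).
    return name.encode("utf-8")[:PG_MAX_IDENTIFIER_BYTES].decode("utf-8", errors="ignore")


# Hypothetical flattened column names produced from a deeply nested source record.
columns = [
    "customer_profile_shipping_address_geolocation_coordinates_latitude_a",
    "customer_profile_shipping_address_geolocation_coordinates_latitude_b",
]

truncated = defaultdict(list)
for column in columns:
    truncated[truncate_identifier(column)].append(column)

for short_name, originals in truncated.items():
    if len(originals) > 1:
        print(f"Collision after truncation to {short_name!r}: {originals}")
```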

### Vendor-Specific Connector Limitations

:::warning

Not all implementations or deployments of a database will be the same. This section lists specific limitations and known issues with the connector based on _how_ or _where_ it is deployed.

:::

#### Disk Access

The Airbyte Postgres destination relies on sending files to the database's temporary storage to then load in bulk. If your Postgres database does not have access to the `/tmp` file system, data loading will not succeed.

1. Open the [IAM console](https://console.aws.amazon.com/iam/home#home).
2. In the IAM dashboard, select **Policies**, then click **Create Policy**.
3. Select the **JSON** tab, then paste the following JSON into the Policy editor (be sure to substitute in your bucket name):

```json
{
  "Version": "2012-10-17",
@@ -83,14 +85,17 @@ At this time, object-level permissions alone are not sufficient to successfully

#### Authentication Option 1: Using an IAM Role (Most secure)

<!-- env:cloud -->

:::note

This authentication method is currently in the testing phase. To enable it for your workspace, please contact our Support Team.

:::

<!-- /env:cloud -->

1. In the IAM dashboard, click **Roles**, then **Create role**. <!-- env:oss -->
2. Choose the appropriate trust entity and attach the policy you created.
3. Set up a trust relationship for the role. For example, for the **AWS account** trusted entity, use the default AWS account on your instance (it will be used to assume the role). To use an **External ID**, set it as an environment variable with `export AWS_ASSUME_ROLE_EXTERNAL_ID="{your-external-id}"`. Edit the trust relationship policy to reflect this:

```
{
  "Version": "2012-10-17",
@@ -109,11 +114,14 @@ This authentication method is currently in the testing phase. To enable it for y
}
]
}
```
<!-- /env:oss -->

<!-- env:cloud -->

2. Choose the **AWS account** trusted entity type.
3. Set up a trust relationship for the role. This allows the Airbyte instance's AWS account to assume this role. You will also need to specify an external ID, which is a secret key that the trusting service (Airbyte) and the trusted role (the role you're creating) both know. This ID is used to prevent the "confused deputy" problem. The External ID should be your Airbyte workspace ID, which can be found in the URL of your workspace page. Edit the trust relationship policy to include the external ID (a sketch of how a client assumes the role with an external ID follows these steps):

```
{
  "Version": "2012-10-17",
@@ -133,11 +141,12 @@ This authentication method is currently in the testing phase. To enable it for y
]
}
```

<!-- /env:cloud -->

4. Complete the role creation and note the Role ARN.
5. Select **Attach policies directly**, then find and check the box for your new policy. Click **Next**, then **Add permissions**.
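
Once this trust relationship is in place, a caller in the trusted account assumes the role by presenting the same external ID. Below is a minimal boto3 sketch of that flow; the role ARN and external ID are placeholders, not values from this guide:

```python
# Minimal illustration of assuming an IAM role with an external ID via STS.
# The role ARN and external ID below are placeholders.
import boto3

sts = boto3.client("sts")

response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/airbyte-s3-staging-role",  # placeholder ARN
    RoleSessionName="airbyte-staging-session",
    ExternalId="00000000-0000-0000-0000-000000000000",  # e.g. your Airbyte workspace ID
)

credentials = response["Credentials"]

# Temporary credentials that can now be used to reach the staging bucket.
s3 = boto3.client(
    "s3",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
```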
##### Authentication Option 2: Using an IAM User
Use an existing or create new
@@ -212,7 +221,7 @@ Use an existing or create new
on how to create an instance profile. We recommend creating an Airbyte-specific user. This user will require [read and write permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_examples_s3_rw-bucket.html) to objects in the staging bucket. If the Access Key and Secret Access Key are not provided, the authentication will rely either on the Role ARN using STS Assume Role or on the instance profile.
5. **Secret Access Key**: Corresponding key to the above key ID. Make sure your S3 bucket is accessible from the machine running Airbyte.
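
A quick way to sanity-check that last point is a short boto3 call from the machine running Airbyte. This sketch is illustrative only; the bucket name is a placeholder and credentials are taken from the environment or instance profile:

```python
# Verify that the staging bucket is reachable from the machine running Airbyte.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

try:
    s3.head_bucket(Bucket="my-airbyte-staging-bucket")  # placeholder bucket name
    print("Bucket is reachable with the current credentials.")
except ClientError as error:
    # 403 means the bucket exists but these credentials lack access; 404 means no such bucket.
    print(f"Bucket check failed: {error.response['Error']['Code']}")
```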
@@ -237,7 +246,7 @@ Use an existing or create new
placeholders, as they won't be recognized.

<!-- /env:oss -->

6. Click `Set up destination`.

The full path of the output data with the default S3 Path Format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_` is:
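
Purely as an illustration (the namespace, stream, and epoch values below are made up, not the example from this guide), a path prefix under that default format could be rendered like this:

```python
# Illustrative rendering of the default S3 path format with made-up values.
from string import Template

DEFAULT_PATH_FORMAT = "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_"

path_prefix = Template(DEFAULT_PATH_FORMAT).substitute(
    NAMESPACE="public",
    STREAM_NAME="users",
    YEAR="2024",
    MONTH="01",
    DAY="15",
    EPOCH="1705312800000",
)

print(path_prefix)  # public/users/2024_01_15_1705312800000_
```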
@@ -494,6 +503,10 @@ In order for everything to work correctly, it is also necessary that the user who
}
```

## Limitations & Troubleshooting

To see connector limitations, or troubleshoot your S3 connector, see more [in our S3 troubleshooting guide](/integrations/destinations/s3/s3-troubleshooting).

:::warning

Not all implementations or deployments of S3-compatible destinations will be the same. This section lists specific limitations and known issues with the connector based on _how_ or _where_ it is deployed.

:::

#### Linode Object Storage

Linode Object Storage does not properly return ETags after setting them, which Airbyte relies on to verify the integrity of the data. This makes this destination currently incompatible with Airbyte.
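
For context, the kind of integrity check this refers to can be sketched as follows: for a single-part upload without SSE-KMS, an S3-compatible store normally returns an ETag equal to the MD5 of the uploaded bytes. The endpoint, bucket, and key below are placeholders, and this is an illustration rather than the connector's actual logic:

```python
# Illustrative ETag integrity check for a single-part upload (no SSE-KMS).
import hashlib

import boto3

s3 = boto3.client("s3", endpoint_url="https://your-object-storage-endpoint.example.com")

body = b"example payload"
response = s3.put_object(Bucket="my-bucket", Key="airbyte/etag-check", Body=body)

expected_md5 = hashlib.md5(body).hexdigest()
returned_etag = response["ETag"].strip('"')

print("ETag matches local MD5:", returned_etag == expected_md5)
```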
docs/integrations/sources/mongodb-v2.md (+3, -39)
@@ -172,43 +172,7 @@ When Schema is not enforced there is no way to deselect fields as all fields are

## Limitations & Troubleshooting

### MongoDB Oplog and Change Streams

[MongoDB's Change Streams](https://www.mongodb.com/docs/manual/changeStreams/) are based on the [Replica Set Oplog](https://www.mongodb.com/docs/manual/core/replica-set-oplog/). This has retention limitations. Syncs that run less frequently than the retention period of the Oplog may encounter issues with missing data.

We recommend adjusting the Oplog size for your MongoDB cluster to ensure it holds at least 24 hours of changes. For optimal results, we suggest expanding it to maintain a week's worth of data. To adjust your Oplog size, see the corresponding tutorials for [MongoDB Atlas](https://www.mongodb.com/docs/atlas/cluster-additional-settings/#set-oplog-size) (fully-managed) and [MongoDB shell](https://www.mongodb.com/docs/manual/tutorial/change-oplog-size/) (self-hosted).

If you are running into an issue similar to "invalid resume token", it may mean you need to:

1. Increase the Oplog retention period.
2. Increase the Oplog size.
3. Increase the Airbyte sync frequency.

You can run the commands outlined [in this tutorial](https://www.mongodb.com/docs/manual/tutorial/troubleshoot-replica-sets/#check-the-size-of-the-oplog) to verify the current size of your Oplog. The expected output is:

```yaml
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
```
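
If you prefer to check the oplog window programmatically, a rough pymongo sketch along these lines works against the `local` database of a replica set member (the connection string is a placeholder, and this is not part of the connector):

```python
# Rough check of how many hours of changes the oplog currently holds.
from pymongo import MongoClient

client = MongoClient("mongodb://user:password@your-replica-set-host:27017/?replicaSet=rs0")
oplog = client.local.oplog.rs

# The oplog is a capped collection kept in natural (insertion) order.
first_entry = oplog.find_one(sort=[("$natural", 1)])
last_entry = oplog.find_one(sort=[("$natural", -1)])

window_seconds = last_entry["ts"].time - first_entry["ts"].time
print(f"Oplog window: {window_seconds / 3600:.2f} hours")
```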

When importing a large MongoDB collection for the first time, the import duration might exceed the Oplog retention period. The Oplog is crucial for incremental updates, and an invalid resume token will require the MongoDB collection to be re-imported to ensure no source updates were missed.

### Supported MongoDB Clusters

- Only supports [replica set](https://www.mongodb.com/docs/manual/replication/) cluster type.
- TLS/SSL is required by this connector. TLS/SSL is enabled by default for MongoDB Atlas clusters. To enable a TLS/SSL connection for a self-hosted MongoDB instance, please refer to the [MongoDB Documentation](https://docs.mongodb.com/manual/tutorial/configure-ssl/).
- Views, capped collections and clustered collections are not supported.
- Empty collections are excluded from schema discovery.
- Collections with different data types for the values in the `_id` field among the documents in a collection are not supported. All `_id` values within the collection must be the same data type.
- Atlas DB clusters are only supported in a dedicated M10 tier and above. Lower tiers may fail during connection setup.

### Schema Discovery & Enforcement

- Schema discovery uses [sampling](https://www.mongodb.com/docs/manual/reference/operator/aggregation/sample/) of the documents to collect all distinct top-level fields. This value is universally applied to all collections discovered in the target database. The approach is modelled after [MongoDB Compass sampling](https://www.mongodb.com/docs/compass/current/sampling/) and is used for efficiency. By default, 10,000 documents are sampled. This value can be increased up to 100,000 documents to increase the likelihood that all fields will be discovered. However, the trade-off is time, as a higher value will take the process longer to sample the collection. A sketch of this sampling approach follows this list.
- When running with Schema Enforced set to `false` there is no attempt to discover any schema. See more in [Schema Enforcement](#Schema-Enforcement).
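
As a companion to the sampling bullet above, here is a small pymongo sketch that approximates sampling-based discovery of distinct top-level fields. The connection string, database, and collection names are placeholders, and this is an approximation of the idea rather than the connector's actual implementation:

```python
# Approximate sampling-based discovery of distinct top-level fields.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://user:password@cluster.example.mongodb.net")
collection = client["my_database"]["my_collection"]

SAMPLE_SIZE = 10_000  # the default described above; can be raised toward 100,000

discovered_fields = set()
for document in collection.aggregate([{"$sample": {"size": SAMPLE_SIZE}}]):
    discovered_fields.update(document.keys())

print(sorted(discovered_fields))
```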

To see connector limitations, or troubleshoot your MongoDB connector, see more [in our MongoDB troubleshooting guide](/integrations/sources/mongodb-v2/mongodb-v2-troubleshooting).

## Configuration Parameters
@@ -231,8 +195,8 @@ For more information regarding configuration parameters, please see [MongoDb Doc

This version introduces a general availability version of the MongoDB V2 source connector, which leverages [Change Data Capture (CDC)](/understanding-airbyte/cdc) to improve the performance and reliability of syncs. This version provides better error handling, incremental delivery of data and improved reliability of large syncs via frequent checkpointing.

**THIS VERSION INCLUDES BREAKING CHANGES FROM PREVIOUS VERSIONS OF THE CONNECTOR!**

The changes will require you to reconfigure your existing MongoDB V2 configured source connectors. To review the breaking changes and to learn how to upgrade the connector, refer to the [MongoDB V2 source connector documentation](/integrations/sources/mongodb-v2#upgrade-from-previous-version). Additionally, you can manually update existing connections prior to the next scheduled sync to perform the upgrade or re-create the source using the new configuration.

@@ -22,4 +22,4 @@ Worthy of specific mention, this version includes:

- Sampling of fields for schema discovery
- Required SSL/TLS connections

To learn more about what's new in the connection, view the updated documentation [here](/integrations/sources/mongodb-v2/).