Commit 4b1ebb7

[docs] update pg destination warnings (#36454)
1 parent daf62e1 commit 4b1ebb7

1 file changed, +84 -47 lines changed

docs/integrations/destinations/postgres.md

@@ -4,10 +4,18 @@ This page guides you through the process of setting up the Postgres destination
 
 :::caution
 
-Postgres, while an excellent relational database, is not a data warehouse.
-
-1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or updates over ~500GB. Especially when using normalization with `destination-postgres`, be sure to monitor your database's memory and CPU usage during your syncs. It is possible for your destination to 'lock up', and incur high usage costs with large sync volumes.
-2. Postgres column size limitations are likley to cause colisions when used as a destination reciving data from highly-nested and flattened sources.
+Postgres, while an excellent relational database, is not a data warehouse.
+
+1. Postgres is likely to perform poorly with large data volumes. Even postgres-compatible
+destinations (e.g. AWS Aurora) are not immune to slowdowns when dealing with large writes or
+updates over ~500GB. Especially when using normalization with `destination-postgres`, be sure to
+monitor your database's memory and CPU usage during your syncs. It is possible for your
+destination to 'lock up', and incur high usage costs with large sync volumes.
+2. Postgres column [name length limitations](https://www.postgresql.org/docs/current/limits.html)
+are likely to cause collisions when used as a destination receiving data from highly-nested and
+flattened sources, e.g. `{63 byte name}_a` and `{63 byte name}_b` will both be truncated to
+`{63 byte name}` which causes postgres to throw an error that a duplicate column name was
+specified.
 
 :::
 
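The new column-name warning can be checked before a sync. Below is a minimal Python sketch that is not taken from Airbyte. It rests on three assumptions:

- the default `NAMEDATALEN` of 64, so at most 63 bytes of an identifier are kept;
- UTF-8 identifiers;
- truncation never splits a multibyte character.

`truncate_identifier` and `find_truncation_collisions` are hypothetical names used only for illustration.

```python
# Hypothetical sketch, not Airbyte code: find column names that collide once
# PostgreSQL truncates identifiers. Assumes default NAMEDATALEN = 64 and UTF-8.
from collections import defaultdict

MAX_IDENTIFIER_BYTES = 63  # NAMEDATALEN - 1 with the default NAMEDATALEN of 64


def truncate_identifier(name: str, limit: int = MAX_IDENTIFIER_BYTES) -> str:
    """Keep at most `limit` UTF-8 bytes without cutting a character in half."""
    encoded = name.encode("utf-8")
    if len(encoded) <= limit:
        return name
    # errors="ignore" drops a trailing partial multibyte sequence left by the cut.
    return encoded[:limit].decode("utf-8", errors="ignore")


def find_truncation_collisions(names: list[str]) -> dict[str, list[str]]:
    """Group original names by truncated form; keep only groups with 2+ names."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[truncate_identifier(name)].append(name)
    return {short: originals for short, originals in groups.items() if len(originals) > 1}


if __name__ == "__main__":
    prefix = "x" * 63  # stands in for the "{63 byte name}" in the warning
    columns = [prefix + "_a", prefix + "_b", "id"]
    for short, originals in find_truncation_collisions(columns).items():
        print(f"{len(originals)} columns truncate to the same {len(short.encode())}-byte name")
```

Run with the `{63 byte name}_a` / `{63 byte name}_b` pair from the warning, the sketch reports both columns mapping to one 63-byte name. That is the case that triggers the duplicate-column error.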
@@ -23,11 +31,15 @@ used by default. Other than that, you can proceed with the open-source instructi
 You'll need the following information to configure the Postgres destination:
 
 - **Host** - The host name of the server.
-- **Port** - The port number the server is listening on. Defaults to the PostgreSQL™ standard port number (5432).
+- **Port** - The port number the server is listening on. Defaults to the PostgreSQL™ standard port
+number (5432).
 - **Username**
 - **Password**
-- **Default Schema Name** - Specify the schema (or several schemas separated by commas) to be set in the search-path. These schemas will be used to resolve unqualified object names used in statements executed over this connection.
-- **Database** - The database name. The default is to connect to a database with the same name as the user name.
+- **Default Schema Name** - Specify the schema (or several schemas separated by commas) to be set in
+the search-path. These schemas will be used to resolve unqualified object names used in statements
+executed over this connection.
+- **Database** - The database name. The default is to connect to a database with the same name as
+the user name.
 - **JDBC URL Params** (optional)
 
 [Refer to this guide for more details](https://jdbc.postgresql.org/documentation/use/#connecting-to-the-database)
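The optional **JDBC URL Params** field can be illustrated with a short Python sketch. It makes three assumptions:

- connections use the standard pgJDBC URL form `jdbc:postgresql://host:port/database`;
- the field takes the `key1=value1&key2=value2` string described in this page's setup steps;
- the keys the page says Airbyte overwrites (`currentSchema`, `user`, `password`, `ssl`, `sslmode`) are rejected.

`parse_jdbc_params` and `build_jdbc_url` are hypothetical helpers, not Airbyte's implementation.

```python
# Hypothetical sketch, not Airbyte's implementation: validate a "JDBC URL Params"
# string and append it to a pgJDBC connection URL.

# Keys the docs say Airbyte overwrites, so users should not supply them.
RESERVED_KEYS = {"currentSchema", "user", "password", "ssl", "sslmode"}


def parse_jdbc_params(raw: str) -> dict[str, str]:
    """Split 'key1=value1&key2=value2' into an ordered dict of key -> value."""
    params: dict[str, str] = {}
    for pair in raw.split("&"):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if key in RESERVED_KEYS:
            raise ValueError(f"{key!r} is set by Airbyte and would be overwritten")
        params[key] = value
    return params


def build_jdbc_url(host: str, port: int, database: str, raw_params: str = "") -> str:
    """Return jdbc:postgresql://host:port/database, with extra params after '?'."""
    base = f"jdbc:postgresql://{host}:{port}/{database}"
    params = parse_jdbc_params(raw_params)
    if not params:
        return base
    # Pairs are re-joined as given; values are not percent-encoded here.
    return base + "?" + "&".join(f"{key}={value}" for key, value in params.items())


if __name__ == "__main__":
    # connectTimeout is a pgJDBC parameter expressed in seconds.
    print(build_jdbc_url("db.example.com", 5432, "analytics", "connectTimeout=120"))
```

For example, `build_jdbc_url("db.example.com", 5432, "analytics", "connectTimeout=120")` returns `jdbc:postgresql://db.example.com:5432/analytics?connectTimeout=120`. The extra pairs end up at the end of the URL, as the docs describe.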
@@ -64,17 +76,18 @@ synced data from Airbyte.
 
 ## Naming Conventions
 
-From [Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS):
+From
+[Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS):
 
 - SQL identifiers and key words must begin with a letter \(a-z, but also letters with diacritical
 marks and non-Latin letters\) or an underscore \(\_\).
 - Subsequent characters in an identifier or key word can be letters, underscores, digits \(0-9\), or
 dollar signs \($\).
 
-Note that dollar signs are not allowed in identifiers according to the SQL standard,
-so their use might render applications less portable. The SQL standard will not define a key word
-that contains digits or starts or ends with an underscore, so identifiers of this form are safe
-against possible conflict with future extensions of the standard.
+Note that dollar signs are not allowed in identifiers according to the SQL standard, so their use
+might render applications less portable. The SQL standard will not define a key word that contains
+digits or starts or ends with an underscore, so identifiers of this form are safe against possible
+conflict with future extensions of the standard.
 
 - The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in
 commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier
@@ -85,61 +98,84 @@ From [Postgres SQL Identifiers syntax](https://www.postgresql.org/docs/9.0/sql-s
 still applies.
 - Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to
 lower case.
-- In order to make your applications portable and less error-prone, use consistent quoting with each name (either always quote it or never quote it).
+- In order to make your applications portable and less error-prone, use consistent quoting with each
+name (either always quote it or never quote it).
 
 :::info
 
-Airbyte Postgres destination will create raw tables and schemas using the Unquoted
-identifiers by replacing any special characters with an underscore. All final tables and their corresponding
+Airbyte Postgres destination will create raw tables and schemas using the Unquoted identifiers by
+replacing any special characters with an underscore. All final tables and their corresponding
 columns are created using Quoted identifiers preserving the case sensitivity.
 
 :::
 
 **For Airbyte Cloud:**
 
 1. [Log into your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account.
-2. In the left navigation bar, click **Destinations**. In the top-right corner, click **new destination**.
-3. On the Set up the destination page, enter the name for the Postgres connector
-and select **Postgres** from the Destination type dropdown.
+2. In the left navigation bar, click **Destinations**. In the top-right corner, click **new
+destination**.
+3. On the Set up the destination page, enter the name for the Postgres connector and select
+**Postgres** from the Destination type dropdown.
 4. Enter a name for your source.
-5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your Postgres database.
+5. For the **Host**, **Port**, and **DB Name**, enter the hostname, port number, and name for your
+Postgres database.
 6. List the **Default Schemas**.
-:::note
-The schema names are case sensitive. The 'public' schema is set by default. Multiple schemas may be used at one time. No schemas set explicitly - will sync all of existing.
-:::
-7. For **User** and **Password**, enter the username and password you created in [Step 1](#step-1-optional-create-a-dedicated-read-only-user).
-8. For Airbyte Open Source, toggle the switch to connect using SSL. For Airbyte Cloud uses SSL by default.
+
+:::note
+
+The schema names are case sensitive. The 'public' schema is set by default. Multiple schemas may be
+used at one time. No schemas set explicitly - will sync all of existing.
+
+:::
+
+7. For **User** and **Password**, enter the username and password you created in
+[Step 1](#step-1-optional-create-a-dedicated-read-only-user).
+8. For Airbyte Open Source, toggle the switch to connect using SSL. For Airbyte Cloud uses SSL by
+default.
 9. For SSL Modes, select:
 - **disable** to disable encrypted communication between Airbyte and the source
 - **allow** to enable encrypted communication only when required by the source
 - **prefer** to allow unencrypted communication only when the source doesn't support encryption
-- **require** to always require encryption. Note: The connection will fail if the source doesn't support encryption.
-- **verify-ca** to always require encryption and verify that the source has a valid SSL certificate
+- **require** to always require encryption. Note: The connection will fail if the source doesn't
+support encryption.
+- **verify-ca** to always require encryption and verify that the source has a valid SSL
+certificate
 - **verify-full** to always require encryption and verify the identity of the source
-10. To customize the JDBC connection beyond common options, specify additional supported [JDBC URL parameters](https://jdbc.postgresql.org/documentation/head/connect.html) as key-value pairs separated by the symbol & in the **JDBC URL Parameters (Advanced)** field.
+10. To customize the JDBC connection beyond common options, specify additional supported
+[JDBC URL parameters](https://jdbc.postgresql.org/documentation/head/connect.html) as key-value
+pairs separated by the symbol & in the **JDBC URL Parameters (Advanced)** field.
 
 Example: key1=value1&key2=value2&key3=value3
 
-These parameters will be added at the end of the JDBC URL that the AirByte will use to connect to your Postgres database.
+These parameters will be added at the end of the JDBC URL that the AirByte will use to connect
+to your Postgres database.
+
+The connector now supports `connectTimeout` and defaults to 60 seconds. Setting connectTimeout
+to 0 seconds will set the timeout to the longest time available.
 
-The connector now supports `connectTimeout` and defaults to 60 seconds. Setting connectTimeout to 0 seconds will set the timeout to the longest time available.
+**Note:** Do not use the following keys in JDBC URL Params field as they will be overwritten by
+Airbyte: `currentSchema`, `user`, `password`, `ssl`, and `sslmode`.
 
-**Note:** Do not use the following keys in JDBC URL Params field as they will be overwritten by Airbyte:
-`currentSchema`, `user`, `password`, `ssl`, and `sslmode`.
+:::warning
 
-:::warning
-This is an advanced configuration option. Users are advised to use it with caution.
-:::
+This is an advanced configuration option. Users are advised to use it with caution.
+
+:::
 
 11. For SSH Tunnel Method, select:
 
 - **No Tunnel** for a direct connection to the database
-- **SSH Key Authentication** to use an RSA Private as your secret for establishing the SSH tunnel
+- **SSH Key Authentication** to use an RSA Private as your secret for establishing the SSH
+tunnel
 - **Password Authentication** to use a password as your secret for establishing the SSH tunnel
 
-:::warning
-Since Airbyte Cloud requires encrypted communication, select **SSH Key Authentication** or **Password Authentication** if you selected **disable**, **allow**, or **prefer** as the **SSL Mode**; otherwise, the connection will fail.
-:::
+:::warning
+
+Since Airbyte Cloud requires encrypted communication, select **SSH Key Authentication** or
+**Password Authentication** if you selected **disable**, **allow**, or **prefer** as the **SSL
+Mode**; otherwise, the connection will fail.
+
+:::
 
 12. Click **Set up destination**.
 
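The Airbyte Cloud warning in the SSH tunnel step reduces to one rule. An SSL mode that can fall back to plaintext (**disable**, **allow**, **prefer**) needs an SSH tunnel. A minimal Python sketch of that check follows. The function name and the string values are hypothetical stand-ins for the UI options listed above, not an Airbyte API.

```python
# Hypothetical pre-flight check mirroring the Airbyte Cloud warning; not Airbyte code.
SSL_MODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}
# Modes that may end up with an unencrypted connection.
PLAINTEXT_CAPABLE_MODES = {"disable", "allow", "prefer"}
TUNNEL_METHODS = {"No Tunnel", "SSH Key Authentication", "Password Authentication"}


def cloud_connection_allowed(ssl_mode: str, tunnel_method: str) -> bool:
    """Return True if the SSL mode / tunnel combination satisfies the Cloud rule."""
    if ssl_mode not in SSL_MODES:
        raise ValueError(f"unknown SSL mode: {ssl_mode!r}")
    if tunnel_method not in TUNNEL_METHODS:
        raise ValueError(f"unknown SSH tunnel method: {tunnel_method!r}")
    # Modes that always require encryption pass on their own; the others need a tunnel.
    return ssl_mode not in PLAINTEXT_CAPABLE_MODES or tunnel_method != "No Tunnel"


if __name__ == "__main__":
    combos = [
        ("prefer", "No Tunnel"),
        ("prefer", "SSH Key Authentication"),
        ("verify-full", "No Tunnel"),
    ]
    for mode, tunnel in combos:
        print(f"{mode} + {tunnel} -> {cloud_connection_allowed(mode, tunnel)}")
```

Passing this check does not guarantee a working connection. As the **require** bullet notes, the strict modes still fail if the server does not support encryption.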
@@ -159,22 +195,23 @@ following[ sync modes](https://docs.airbyte.com/cloud/core-concepts#connection-s
 
 ### Output Schema (Raw Tables)
 
-Each stream will be mapped to a separate raw table in Postgres. The default schema in which the raw tables are
-created is `airbyte_internal`. This can be overridden in the configuration.
-Each table will contain 3 columns:
+Each stream will be mapped to a separate raw table in Postgres. The default schema in which the raw
+tables are created is `airbyte_internal`. This can be overridden in the configuration. Each table
+will contain 3 columns:
 
 - `_airbyte_raw_id`: a uuid assigned by Airbyte to each event that is processed. The column type in
 Postgres is `VARCHAR`.
 - `_airbyte_extracted_at`: a timestamp representing when the event was pulled from the data source.
 The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
-- `_airbyte_loaded_at`: a timestamp representing when the row was processed into final table.
-The column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
-- `_airbyte_data`: a json blob representing with the event data. The column type in Postgres
-is `JSONB`.
+- `_airbyte_loaded_at`: a timestamp representing when the row was processed into final table. The
+column type in Postgres is `TIMESTAMP WITH TIME ZONE`.
+- `_airbyte_data`: a json blob representing with the event data. The column type in Postgres is
+`JSONB`.
 
 ### Final Tables Data type mapping
+
 | Airbyte Type | Postgres Type |
-|:---------------------------|:-------------------------|
+| :------------------------- | :----------------------- |
 | string | VARCHAR |
 | number | DECIMAL |
 | integer | BIGINT |
@@ -197,7 +234,7 @@ Now that you have set up the Postgres destination connector, check out the follo
 ## Changelog
 
 | Version | Date | Pull Request | Subject |
-|:--------|:-----------|:-----------------------------------------------------------|:----------------------------------------------------------------------------------------------------|
+| :------ | :--------- | :--------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- |
 | 2.0.4 | 2024-03-07 | [\#35899](https://github.com/airbytehq/airbyte/pull/35899) | Adopt CDK 0.23.18; Null safety check in state parsing |
 | 2.0.3 | 2024-03-01 | [\#35528](https://github.com/airbytehq/airbyte/pull/35528) | Adopt CDK 0.23.11; Use Migration framework |
 | 2.0.2 | 2024-03-01 | [\#35760](https://github.com/airbytehq/airbyte/pull/35760) | Mark as certified, add PSQL exception to deinterpolator |
