update Snowflake destination docs with more info #10213

Changes from 2 commits

@@ -27,9 +27,15 @@ Note that Airbyte will create **permanent** tables. If you prefer to create tran

## Getting started

### Requirements

1. Active Snowflake warehouse
2. A staging S3 or GCS bucket with credentials \(for the Cloud Storage Staging strategy\).

We recommend creating an Airbyte-specific warehouse, database, schema, user, and role for writing data into Snowflake so it is possible to track costs specifically related to Airbyte \(including the cost of running this warehouse\) and control permissions at a granular level. Since the Airbyte user creates, drops, and alters tables, `OWNERSHIP` permissions are required in Snowflake. If you are not following the recommended script below, please limit the `OWNERSHIP` permissions to only the necessary database and schema for the Airbyte user.
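
For example, a minimal sketch of scoping `OWNERSHIP` to just the Airbyte objects (the database, schema, and role names below are illustrative, not taken from this PR):

```text
-- illustrative names; adjust to the database/schema/role you actually created
grant ownership on database AIRBYTE_DATABASE to role AIRBYTE_ROLE;
grant ownership on schema AIRBYTE_DATABASE.AIRBYTE_SCHEMA to role AIRBYTE_ROLE;
```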

We provide the following script to create these resources. Before running, you must change the password to something secure. You may change the names of the other resources if you desire.
Log in to your Snowflake warehouse, then copy and paste the following script into a new [worksheet](https://docs.snowflake.com/en/user-guide/ui-worksheet.html). Select the `All Queries` checkbox and then press the `Run` button.

Reviewer comment: Yes! Love that we're hyperlinking out to Snowflake docs here!
```text
-- set variables (these need to be uppercase)
```

@@ -106,14 +112,14 @@ commit;

You should now have all the requirements needed to configure Snowflake as a destination in the UI. You'll need the following information to configure the Snowflake destination:

* **Host**
* **Role**
* **Warehouse**
* **Database**
* **Schema**
* **Username**
* **Password**
* **JDBC URL Params** (Optional)
* **Host** : The host domain of the Snowflake instance (must include the account, region, cloud environment, and end with `snowflakecomputing.com`). Example - `accountname.us-east-2.aws.snowflakecomputing.com`

Reviewer comment: For this section of descriptions, are there external docs from Snowflake that can/should be referenced? For example, with "Role", this seems appropriate: https://docs.snowflake.com/en/user-guide/security-access-control-overview.html#roles.

* **Role** : The role you created for Airbyte to access Snowflake. Example - `AIRBYTE_ROLE`
* **Warehouse** : The warehouse you created for Airbyte to sync data into. Example - `AIRBYTE_WAREHOUSE`
* **Database** : The database you created for Airbyte to sync data into. Example - `AIRBYTE_DATABASE`
* **Schema** : The default schema is used as the target schema for all statements issued from the connection that do not explicitly specify a schema name. The schema name will be transformed to one allowed by Snowflake if it does not follow the [Snowflake Naming Conventions](https://docs.airbyte.io/integrations/destinations/snowflake#notes-about-snowflake-naming-conventions).
* **Username** : The username you created to allow Airbyte to access the database. Example - `AIRBYTE_USER`
* **Password** : The password associated with the username.
* **JDBC URL Params** (Optional) : Additional properties to pass to the JDBC URL string when connecting to the database, formatted as 'key=value' pairs separated by the symbol '&' (example: key1=value1&key2=value2&key3=value3). More info on how this works can be found [here](https://docs.snowflake.com/en/user-guide/jdbc-parameters.html). See the illustration below.
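
As an illustration (the parameter names below are examples, not connector defaults), a value for this field and the kind of JDBC URL it would be appended to might look like:

```text
-- value entered in the "JDBC URL Params" field:
networkTimeout=300&queryTimeout=300

-- resulting connection string (sketch):
jdbc:snowflake://accountname.us-east-2.aws.snowflakecomputing.com/?networkTimeout=300&queryTimeout=300
```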

## Notes about Snowflake Naming Conventions

@@ -147,17 +153,17 @@ When an identifier is double-quoted, it is stored and resolved exactly as entere

Therefore, the Airbyte Snowflake destination will create tables and schemas using unquoted identifiers when possible, or fall back to quoted identifiers if the names contain special characters.
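
For example, a short illustration of Snowflake's identifier rules (the table names here are made up):

```text
-- unquoted identifiers are stored in uppercase and matched case-insensitively
create table users_raw (id integer);
select * from USERS_RAW;   -- resolves to the same table

-- quoted identifiers are stored exactly as entered, preserving case and special characters
create table "users raw-2022!" (id integer);
select * from "users raw-2022!";
```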

## Cloud Storage Staging
## Loading Method

By default, Airbyte uses batches of `INSERT` commands to add data to a temporary table before copying it over to the final table in Snowflake. This is too slow for larger/multi-GB replications. For those larger replications we recommend configuring cloud storage to allow batch writes and loading.
By default, Airbyte uses `INTERNAL STAGING`.

Reviewer comment: Can we hyperlink this to the correct section in our docs in case folks want to understand more about this?

### Internal Staging

Internal named stages are storage location objects within a Snowflake database/schema. Because they are database objects, the same security permissions apply as with any other database objects. There is no need to provide additional properties for internal staging.
Internal named stages are storage location objects within a Snowflake database/schema. Because they are database objects, the same security permissions apply as with any other database objects. There is no need to provide additional properties for internal staging. This is also the recommended way of using the connector.

Reviewer comment: Is there a reason why we recommend this and why this is default?

**Operating on a stage also requires the USAGE privilege on the parent database and schema.**
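
For background, a minimal sketch of the general Snowflake internal-stage pattern (not the connector's exact commands; the stage, file, and table names are illustrative):

```text
-- create a named internal stage inside the Airbyte schema
create stage if not exists AIRBYTE_DATABASE.AIRBYTE_SCHEMA.airbyte_stage;

-- PUT runs from a client such as SnowSQL or a JDBC driver, not from the web worksheet
put file:///tmp/users_batch.csv @AIRBYTE_DATABASE.AIRBYTE_SCHEMA.airbyte_stage;

-- bulk-load the staged file into a raw table
copy into AIRBYTE_DATABASE.AIRBYTE_SCHEMA._airbyte_raw_users
  from @AIRBYTE_DATABASE.AIRBYTE_SCHEMA.airbyte_stage
  file_format = (type = csv);
```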

### AWS S3
### AWS S3 Staging

For AWS S3, you will need to create a bucket and provide credentials to access the bucket. We recommend creating a bucket that is only used for Airbyte to stage data to Snowflake. Airbyte needs read/write access to interact with this bucket.
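
Conceptually (a sketch only; the bucket name, path, and key values are placeholders, and the connector's actual commands may differ), loading from an S3 staging location looks like:

```text
copy into AIRBYTE_DATABASE.AIRBYTE_SCHEMA._airbyte_raw_users
  from 's3://airbyte-staging-bucket/airbyte_schema/users/'
  credentials = (aws_key_id = '<access-key-id>' aws_secret_key = '<secret-access-key>')
  file_format = (type = csv);
```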

@@ -180,7 +186,7 @@ Optional parameters:

* Whether to delete the staging files from S3 after completing the sync. Specifically, the connector will create CSV files named `bucketPath/namespace/streamName/syncDate_epochMillis_randomUuid.csv` containing three columns (`ab_id`, `data`, `emitted_at`). Normally these files are deleted after the `COPY` command completes; if you want to keep them for other purposes, set `purge_staging_data` to `false`.

### Google Cloud Storage \(GCS\)
### Google Cloud Storage \(GCS\) Staging

First you will need to create a GCS bucket.

@@ -215,6 +221,8 @@ The final query should show a `STORAGE_GCP_SERVICE_ACCOUNT` property with an ema

Finally, you need to add read/write permissions to your bucket with that email.
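
For reference, the kind of Snowflake commands this section describes (a sketch only; the integration and bucket names are placeholders) look like:

```text
-- create a storage integration pointing at the GCS staging bucket
create storage integration gcs_airbyte_integration
  type = external_stage
  storage_provider = gcs
  enabled = true
  storage_allowed_locations = ('gcs://airbyte-staging-bucket/');

-- the "final query": its output includes the STORAGE_GCP_SERVICE_ACCOUNT email
desc storage integration gcs_airbyte_integration;
```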

## Changelog

| Version | Date | Pull Request | Subject |
|:--------|:-----------|:-------------|:--------|
| 0.4.8 | 2022-02-01 | [\#9959](https://github.com/airbytehq/airbyte/pull/9959) | Fix null pointer exception from buffered stream consumer. |

Reviewer comment: I think we should remove this since everyone should use internal staging.

Reviewer comment: To clarify, if a user doesn't have an S3 or GCS staging set up, they should be able to move forward simply with an active Snowflake warehouse, without changing the loading method. Is that correct? If so, agree with Sherif's comment here.

Reviewer comment: Yes, that's correct @misteryeo