🎉 GCS destination: use serialized buffer; compress csv & jsonl #11686
/test connector=connectors/destination-s3
/test connector=connectors/destination-gcs
/test connector=connectors/destination-gcs
/test connector=connectors/destination-s3
@ChristopheDuong, this is the prerequisite for the BigQuery changes. It updates many of the GCS classes to be compatible with S3 so that the S3 code can be reused directly. It also completes the migration for the GCS destination.
...rs/destination-gcs/src/main/java/io/airbyte/integrations/destination/gcs/GcsDestination.java
/test connector=connectors/destination-s3
/test connector=connectors/destination-gcs
Will publish the connector in a follow-up PR.
Who decided it was a good idea for an ETL tool to output gz? How are the tools that consume this data afterwards meant to process it?
@wallies, thank you for raising this question and creating this issue. We decided to compress the CSV and JSONL formats based on the common use case for these blob storages: people usually use S3 and GCS just to archive their data, and compression reduces the storage cost. I admit that this is not friendly for other use cases. I have created an issue here. We should have a new version that provides an option to not compress these formats by the end of this week or early next week.
Much appreciated, @tuliren. I also raised #11872, as we were seeing no file extension at all on any new files.
@wallies, the new version with an option to not compress CSV and JSONL files has been published. Please give it a try. Thanks!
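For downstream tools, a `.gz` file is standard gzip, so any gzip-aware reader can consume it (for example `gunzip abc.csv.gz` or `zcat abc.csv.gz` on the command line). The minimal, JDK-only sketch below writes a few CSV rows to a temporary `*.csv.gz` file and streams them back with `GZIPInputStream`; the class name, method names, and row contents are hypothetical illustrations, not the connector's actual writer code.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCsvRoundTrip {

  // Hypothetical stand-in for compressed connector output: writes each line through a gzip stream.
  static void writeGzipCsv(Path file, List<String> lines) throws IOException {
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
      for (String line : lines) {
        writer.write(line);
        writer.newLine();
      }
    }
  }

  // Decompresses on the fly and returns the CSV lines; no intermediate .csv file is written.
  static List<String> readGzipCsv(Path file) throws IOException {
    List<String> lines = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("abc", ".csv.gz");
    writeGzipCsv(file, List.of("id,name", "1,alice", "2,bob"));
    System.out.println(readGzipCsv(file));
    Files.delete(file);
  }
}
```

Streaming through `GZIPInputStream` keeps memory use flat, which matters for large exports; the same wrapping works for JSONL, since only the line parsing differs.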
What
Config class
- `BlobStorageCredentialConfig<ConfigType>`
  - `S3CredentialConfig`, whose `ConfigType` is `S3CredentialType`.
  - `GcsCredentialConfig`, whose `ConfigType` is `GcsCredentialType`.
- `S3AccessKeyCredentialConfig`
- `S3InstanceProfileCredentialConfig`
- `GcsHmacKeyCredentialConfig`
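A simplified Java sketch of how these config types might fit together. Only the type names and the `ConfigType` bindings come from the list above. Everything else is a hypothetical illustration, not the PR's code: the assumption that each concrete class implements the S3 or GCS interface its name suggests, the enum constants, the fields, and the `getCredentialType()` method.

```java
import java.util.List;

public class CredentialConfigSketch {

  // Shared parent: every credential config reports its own credential type (hypothetical method).
  interface BlobStorageCredentialConfig<ConfigType> {
    ConfigType getCredentialType();
  }

  // Hypothetical constants; the PR description only names the enum types.
  enum S3CredentialType { ACCESS_KEY, DEFAULT_PROFILE }

  enum GcsCredentialType { HMAC_KEY }

  // S3 branch: ConfigType is bound to S3CredentialType.
  interface S3CredentialConfig extends BlobStorageCredentialConfig<S3CredentialType> {}

  // GCS branch: ConfigType is bound to GcsCredentialType.
  interface GcsCredentialConfig extends BlobStorageCredentialConfig<GcsCredentialType> {}

  // Assumed to belong to the S3 branch, based on its name.
  static final class S3AccessKeyCredentialConfig implements S3CredentialConfig {
    final String accessKeyId; // hypothetical field
    final String secretAccessKey; // hypothetical field

    S3AccessKeyCredentialConfig(String accessKeyId, String secretAccessKey) {
      this.accessKeyId = accessKeyId;
      this.secretAccessKey = secretAccessKey;
    }

    @Override
    public S3CredentialType getCredentialType() {
      return S3CredentialType.ACCESS_KEY;
    }
  }

  // Assumed to belong to the S3 branch; needs no stored secrets in this sketch.
  static final class S3InstanceProfileCredentialConfig implements S3CredentialConfig {
    @Override
    public S3CredentialType getCredentialType() {
      return S3CredentialType.DEFAULT_PROFILE;
    }
  }

  // Assumed to belong to the GCS branch, based on its name.
  static final class GcsHmacKeyCredentialConfig implements GcsCredentialConfig {
    final String hmacKeyAccessId; // hypothetical field
    final String hmacKeySecret; // hypothetical field

    GcsHmacKeyCredentialConfig(String hmacKeyAccessId, String hmacKeySecret) {
      this.hmacKeyAccessId = hmacKeyAccessId;
      this.hmacKeySecret = hmacKeySecret;
    }

    @Override
    public GcsCredentialType getCredentialType() {
      return GcsCredentialType.HMAC_KEY;
    }
  }

  public static void main(String[] args) {
    // Code that only needs "some blob storage credential" can hold any of the three.
    List<BlobStorageCredentialConfig<?>> configs = List.of(
        new S3AccessKeyCredentialConfig("id", "secret"),
        new S3InstanceProfileCredentialConfig(),
        new GcsHmacKeyCredentialConfig("id", "secret"));
    for (BlobStorageCredentialConfig<?> config : configs) {
      System.out.println(config.getClass().getSimpleName() + " -> " + config.getCredentialType());
    }
  }
}
```

With a shared generic parent, helpers written against `BlobStorageCredentialConfig<?>` can serve both destinations, while each branch keeps its own strongly typed `ConfigType`.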
Recommended reading order
1. `GcsDestination.java`
2. `GcsDestinationConfig.java`
3. `S3DestinationConfig.java`
🚨 User Impact 🚨
CSV and JSONL output files are now gzip-compressed. If a file was previously named `abc.csv`, it will become `abc.csv.gz` after this PR.

Pre-merge Checklist
Updating a connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with `airbyte_secret`
- Integration tests pass: `./gradlew :airbyte-integrations:connectors:<name>:integrationTest`
- Connector's `README.md` is updated
- Connector's `bootstrap.md` is updated. See description and examples
- `docs/integrations/<source or destination>/<name>.md` is updated, including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- `/test connector=connectors/<name>` command is passing
- Connector is published with the `/publish` command described here