Commit e2a8000: Update docs
1 parent 665e6db

3 files changed (+24, -8)


airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/S3ConsumerFactory.java

+5 -5

```diff
@@ -100,13 +100,13 @@ private OnStartFunction onStartFunction(final BlobStorageOperations storageOpera
       if (writeConfig.getSyncMode().equals(DestinationSyncMode.OVERWRITE)) {
         final String namespace = writeConfig.getNamespace();
         final String stream = writeConfig.getStreamName();
-        final String bucketPath = writeConfig.getOutputBucketPath();
-        LOGGER.info("Clearing storage area in destination started for namespace {} stream {} bucketObject {}", namespace, stream, bucketPath);
+        final String outputBucketPath = writeConfig.getOutputBucketPath();
+        LOGGER.info("Clearing storage area in destination started for namespace {} stream {} bucketObject {}", namespace, stream, outputBucketPath);
         AirbyteSentry.executeWithTracing("PrepareStreamStorage",
-            () -> storageOperations.dropBucketObject(bucketPath),
-            Map.of("namespace", Objects.requireNonNullElse(namespace, "null"), "stream", stream, "storage", bucketPath));
+            () -> storageOperations.dropBucketObject(outputBucketPath),
+            Map.of("namespace", Objects.requireNonNullElse(namespace, "null"), "stream", stream, "storage", outputBucketPath));
         LOGGER.info("Clearing storage area in destination completed for namespace {} stream {} bucketObject {}", namespace, stream,
-            bucketPath);
+            outputBucketPath);
       }
     }
     LOGGER.info("Preparing storage area in destination completed.");
```

airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json

+1 -1

```diff
@@ -41,7 +41,7 @@
       "description": "Format string on how data will be organized inside the S3 bucket directory",
       "type": "string",
       "examples": [
-        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}"
+        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_"
       ],
       "order": 3
     },
```

docs/integrations/destinations/s3.md

+18 -2

````diff
@@ -22,15 +22,15 @@ Check out common troubleshooting issues for the S3 destination connector on our
 | S3 Endpoint | string | URL to S3. If using AWS S3, just leave blank. |
 | S3 Bucket Name | string | Name of the bucket to sync data into. |
 | S3 Bucket Path | string | Subdirectory under the above bucket to sync the data into. |
-| S3 Bucket Format | string | Additional string format under S3 Bucket Path. Default value is `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}`. |
+| S3 Bucket Format | string | Additional string format on how to store data under S3 Bucket Path. Default value is `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_`. |
 | S3 Region | string | See [here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions) for all region codes. |
 | Access Key ID | string | AWS/Minio credential. |
 | Secret Access Key | string | AWS/Minio credential. |
 | Format | object | Format specific configuration. See the [spec](/airbyte-integrations/connectors/destination-s3/src/main/resources/spec.json) for details. |
 
 ⚠️ Please note that under "Full Refresh Sync" mode, data in the configured bucket and path will be wiped out before each sync. We recommend provisioning a dedicated S3 resource for this sync to prevent unexpected data deletion from misconfiguration. ⚠️
 
-The full path of the output data with the S3 path format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_${PART_ID}` is:
+The full path of the output data with the S3 path format `${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_` is:
 
 ```text
 <bucket-name>/<source-namespace-if-exists>/<stream-name>/<upload-date>_<epoch>_<partition-id>.<format-extension>
@@ -50,6 +50,22 @@ testing_bucket/data_output_path/public/users/2021_01_01_1234567890_0.csv.gz
 |          bucket path
 bucket name
 ```
+Available variables for the custom S3 path format are:
+- `${NAMESPACE}`: Namespace the stream comes from, or the one configured via the connection namespace fields.
+- `${STREAM_NAME}`: Name of the stream.
+- `${YEAR}`: Year in which the sync wrote the output data.
+- `${MONTH}`: Month in which the sync wrote the output data.
+- `${DAY}`: Day on which the sync wrote the output data.
+- `${HOUR}`: Hour in which the sync wrote the output data.
+- `${MINUTE}`: Minute in which the sync wrote the output data.
+- `${SECOND}`: Second in which the sync wrote the output data.
+- `${MILLISECOND}`: Millisecond in which the sync wrote the output data.
+- `${EPOCH}`: Milliseconds since the Unix epoch at which the sync wrote the output data.
+- `${UUID}`: A random UUID string.
+
+Note:
+- Multiple `/` characters in the S3 path are collapsed into a single `/` character.
+- If the output bucket contains too many files, the part ID variable uses a `UUID` instead of a sequential ID.
 
 Please note that the stream name may contain a prefix, if it is configured on the connection.
 A data sync may create multiple files as the output files can be partitioned by size (targeting a size of 200MB compressed or lower).
````
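The variable expansion described in the updated docs can be sketched in a few lines. The class and method below (`S3PathFormatSketch`, `formatS3Path`) are hypothetical names for illustration only, not the connector's actual implementation; the sketch substitutes each documented `${...}` variable and collapses repeated `/` characters as the docs note:

```java
import java.time.ZonedDateTime;
import java.time.ZoneOffset;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class S3PathFormatSketch {

  // Illustrative sketch: expand the documented path-format variables
  // into a concrete S3 object path. Not the connector's real code.
  static String formatS3Path(String fmt, String namespace, String streamName) {
    ZonedDateTime now = ZonedDateTime.now(ZoneOffset.UTC);
    Map<String, String> values = new LinkedHashMap<>();
    values.put("NAMESPACE", namespace);
    values.put("STREAM_NAME", streamName);
    values.put("YEAR", String.format("%04d", now.getYear()));
    values.put("MONTH", String.format("%02d", now.getMonthValue()));
    values.put("DAY", String.format("%02d", now.getDayOfMonth()));
    values.put("HOUR", String.format("%02d", now.getHour()));
    values.put("MINUTE", String.format("%02d", now.getMinute()));
    values.put("SECOND", String.format("%02d", now.getSecond()));
    values.put("MILLISECOND", String.format("%03d", now.getNano() / 1_000_000));
    values.put("EPOCH", String.valueOf(now.toInstant().toEpochMilli()));
    values.put("UUID", UUID.randomUUID().toString());

    String path = fmt;
    for (Map.Entry<String, String> e : values.entrySet()) {
      path = path.replace("${" + e.getKey() + "}", e.getValue());
    }
    // Collapse repeated "/" characters into one, as the docs describe.
    return path.replaceAll("/+", "/");
  }

  public static void main(String[] args) {
    // With the default format, produces e.g. "public/users/2022_03_01_1646092800000_"
    System.out.println(formatS3Path(
        "${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_",
        "public", "users"));
  }
}
```

Under this sketch, a format such as `${NAMESPACE}//${STREAM_NAME}` would still yield `public/users`, since the duplicate slash is collapsed.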
