Commit aa8c125

docs: Update SFTP-bulk (#42989)
Co-authored-by: Alexandre Girard <[email protected]>
1 parent ed12124 commit aa8c125

1 file changed, +25 -28 lines changed

docs/integrations/sources/sftp-bulk.md

@@ -2,7 +2,7 @@

 This page contains the setup guide and reference information for the SFTP Bulk source connector.

-This connector provides the following features not found in the standard SFTP source connector:
+The SFTP Bulk connector offers several features that are not available in the standard SFTP source connector:

 - **Bulk ingestion of files**: This connector can consolidate and process multiple files as a single data stream in your destination system.
 - **Incremental loading**: This connector supports incremental loading, allowing you to sync files from the SFTP server to your destination based on their creation or last modification time.
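The incremental behaviour described in this hunk can be illustrated with a small cursor filter. The Python below is a minimal sketch under stated assumptions and is not the connector's implementation. It uses only each file's last-modification time, although the docs also mention creation time. It parses the **Start Date** as UTC, using the `YYYY-MM-DDTHH:mm:ss.SSSSSSZ` format given in the setup steps. The `RemoteFile` type and the `parse_start_date` and `files_to_sync` helpers are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


# Hypothetical record for a file listed on the SFTP server.
@dataclass
class RemoteFile:
    path: str
    modified_at: datetime  # timezone-aware last-modification time


# Start Date format from the setup steps: YYYY-MM-DDTHH:mm:ss.SSSSSSZ
START_DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_start_date(value: str) -> datetime:
    """Parse a Start Date string, treating the trailing 'Z' as UTC."""
    return datetime.strptime(value, START_DATE_FORMAT).replace(tzinfo=timezone.utc)


def files_to_sync(
    files: Iterable[RemoteFile],
    start_date: Optional[datetime],
    cursor: Optional[datetime],
) -> List[RemoteFile]:
    """Keep files modified after both the Start Date and the saved cursor."""
    bounds = [t for t in (start_date, cursor) if t is not None]
    threshold = max(bounds) if bounds else None
    selected = [f for f in files if threshold is None or f.modified_at > threshold]
    # Oldest first, so a cursor saved after each file only moves forward.
    return sorted(selected, key=lambda f: f.modified_at)


start = parse_start_date("2024-01-01T00:00:00.000000Z")
files = [
    RemoteFile("/logs/log-20231231.csv", datetime(2023, 12, 31, tzinfo=timezone.utc)),
    RemoteFile("/logs/log-20240102.csv", datetime(2024, 1, 2, tzinfo=timezone.utc)),
    RemoteFile("/logs/log-20240101.csv", datetime(2024, 1, 1, 12, tzinfo=timezone.utc)),
]

first = files_to_sync(files, start, cursor=None)
print([f.path for f in first])  # ['/logs/log-20240101.csv', '/logs/log-20240102.csv']

# A later sync passes the newest synced timestamp as the cursor.
second = files_to_sync(files, start, cursor=first[-1].modified_at)
print([f.path for f in second])  # []
```

Sorting oldest first means a cursor saved after each file only ever moves forward. This is one common way to make a sync resumable; the docs do not say whether the connector works this way.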
@@ -23,7 +23,7 @@ To set up the SFTP connector, you will need to select at least _one_ of the foll
 - Your username and password credentials associated with the server.
 - A private/public key pair.

-To set up key pair authentication, you may use the following steps as a guide:
+To set up key pair authentication, follow these steps:

 1. Open your terminal or command prompt and use the `ssh-keygen` command to generate a new key pair.
 :::note
@@ -56,25 +56,16 @@ For more information on SSH key pair authentication, please refer to the

 1. [Log in to your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account, or navigate to your Airbyte Open Source dashboard.
 2. In the left navigation bar, click **Sources**. In the top-right corner, click **+ New source**.
-3. Find and select **SFTP** from the list of available sources.
-<!-- env:cloud -->
-**For Airbyte Cloud users**: If you do not see the **SFTP Bulk** source listed, please make sure the **Alpha** checkbox at the top of the page is checked.
-<!-- /env:cloud -->
+3. Find and select **SFTP Bulk** from the list of available sources.
 4. Enter a **Source name** of your choosing.
-5. Enter your **Username**, as well as the **Host Address** and **Port**. The default port for SFTP is 22. If your remote server is using a different port, please enter it here.
-6. Enter your authentication credentials for the SFTP server (**Password** or **Private Key**). If you are authenticating with a private key, you can upload the file containing the private key (usually named `rsa_id`) using the Upload file button.
-7. Enter a **Stream Name**. This will be the name of the stream that will be outputted to your destination.
-8. Use the dropdown menu to select the **File Type** you wish to sync. Currently, only CSV and JSON formats are supported.
-9. Provide a **Start Date** using the provided datepicker, or by programmatically entering the date in the format `YYYY-MM-DDT00:00:00Z`. Incremental syncs will only sync files modified/added after this date.
-10. If you wish to configure additional optional settings, please refer to the next section. Otherwise, click **Set up source** and wait for the tests to complete.
-
-## Optional fields
-
-The **Optional fields** can be used to further configure the SFTP source connector. If you do not wish to set additional configurations, these fields can be left at their default settings.
-
-1. **CSV Separator**: If you selected `csv` as the file type, you can use this field to specify a custom separator. The default value is `,`.
-
-2. **Folder Path**: Enter a folder path to specify the directory on the remote server to be synced. For example, given the file structure:
+5. Enter the **Host Address**.
+6. Enter your **Username**.
+7. Enter your authentication credentials for the SFTP server (**Password** or **Private Key**). If you are authenticating with a private key, you can upload the file containing the private key (usually named `id_rsa`) using the **Upload file** button.
+8. In the section titled "The list of streams to sync", enter a **Stream Name**. This is the name of the stream that will be created in your destination. To add more streams, click **Add**.
+9. For each stream, use the dropdown menu to select the **File Type** you wish to sync. Depending on the format chosen, you will see a set of options specific to that file type. You can read more about the specifics of each file type below.
+10. (Optional) Provide a **Start Date** using the datepicker, or enter the date in the format `YYYY-MM-DDTHH:mm:ss.SSSSSSZ`. Incremental syncs will only sync files modified or added after this date.
+11. (Optional) Specify the **Port**. The default port for SFTP is 22. If your remote server uses a different port, enter it here.
+12. (Optional) Specify the **Folder Path**: the directory on the remote server to search for files, which defaults to `/`. To sync a specific directory, enter its path here. For example, given the file structure:

 ```
 Root
@@ -87,17 +78,25 @@ Root
 | | - 2022
 ```

-An input of `/logs/2022` will only replicate data contained within the specified folder, ignoring the `/files` and `/logs/2021` folders. Leaving this field blank will replicate all applicable files in the remote server's designated entry point.
-
-3. **File Pattern**: Enter a [regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) to specify a naming pattern for the files to be replicated. Consider the following example:
+An input of `/logs/2022` will only replicate data contained within the specified folder, ignoring the `/files` and `/logs/2021` folders. Leaving this field blank will replicate all applicable files in the remote server's designated entry point. You can also enter a [regular expression](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html) to specify a naming pattern for the files to be replicated. Consider the following example:

 ```
 log-([0-9]{4})([0-9]{2})([0-9]{2})
 ```

 This pattern will filter for files that match the format `log-YYYYMMDD`, where `YYYY`, `MM`, and `DD` represent four-digit, two-digit, and two-digit numbers, respectively. For example, `log-20230713`. Leaving this field blank will replicate all files not filtered by the previous two fields.

-4. **Most Recent File**: Toggle this option if you only want to sync the most recent file located in the folder path. This may be useful when dealing with data sources that generate frequent updates, such as log files or real-time data feeds. Set to False by default.
+13. Click **Set up source** to complete setup. A test will run to verify the configuration.
+
+#### File-specific Configuration
+
+Depending on your **File Type** selection, you will be presented with a few configuration options specific to that file type.
+
+For JSONL, Parquet, and Document File Type formats, you can specify a **Glob** pattern that selects which files are read from the file system. If your Folder Path already ends in a slash, include the resulting double slash (`//`) in the glob where appropriate.
+
+For example, if your folder path is not set in the connector configuration and your files are located in the root folder, use a glob pattern like `//my_prefix_*.csv`. If your files are in a folder, include the folder in your glob pattern, like `//my_folder/my_prefix_*.csv`.
+
+If your files are in a folder, include the folder in your glob pattern, like `my_folder/my_prefix_*.csv`.

 ## Supported sync modes

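You can test the two filename filters in this hunk locally before entering them in the connector. The Python below is a hedged approximation, not the connector's matching logic. It assumes the regular expression must match the whole file name. Python's `re` treats this particular pattern the same way as the Java `Pattern` syntax linked above. The glob check uses `fnmatch.fnmatchcase`, whose `*` also matches `/`, so it is looser than a path-aware glob matcher. The helper names and the sample paths are hypothetical, including the leading `//`, which assumes the default Folder Path `/`.

```python
import fnmatch
import re
from typing import List

# Regular expression from the docs, matching names like log-YYYYMMDD.
LOG_PATTERN = re.compile(r"log-([0-9]{4})([0-9]{2})([0-9]{2})")


def matches_regex(file_name: str) -> bool:
    """Assumption: the pattern has to match the entire file name."""
    return LOG_PATTERN.fullmatch(file_name) is not None


def matches_glob(path: str, pattern: str) -> bool:
    """Rough glob check; unlike a path-aware matcher, '*' here also matches '/'."""
    return fnmatch.fnmatchcase(path, pattern)


# Hypothetical listing; the leading '//' assumes the default Folder Path '/'.
paths: List[str] = [
    "//my_prefix_2024.csv",
    "//other_2024.csv",
    "//my_folder/my_prefix_2024.csv",
]

print(matches_regex("log-20230713"))    # True
print(matches_regex("log-2023-07-13"))  # False
print([p for p in paths if matches_glob(p, "//my_prefix_*.csv")])
# ['//my_prefix_2024.csv']
print([p for p in paths if matches_glob(p, "//my_folder/my_prefix_*.csv")])
# ['//my_folder/my_prefix_2024.csv']
```

With `fnmatch`, a pattern such as `//*.csv` would also match `//my_folder/my_prefix_2024.csv`. Results for patterns that rely on `*` stopping at a slash may therefore differ from the connector's.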
@@ -109,12 +108,10 @@ The SFTP Bulk source connector supports the following [sync modes](https://docs.
 | Full Refresh - Append Sync || |
 | Incremental - Append || |
 | Incremental - Append + Deduped || |
-| Namespaces || |

 ## Supported streams

-This source provides a single stream per file with a dynamic schema. The current supported type files are CSV and JSON.
-More formats \(e.g. Apache Avro\) will be supported in the future.
+This source provides a single stream per file with a dynamic schema. Supported file types are Avro, CSV, JSONL, Parquet, and Document File Type Format.

 ## Changelog
 <details>
@@ -128,4 +125,4 @@ More formats \(e.g. Apache Avro\) will be supported in the future.
 | 0.1.1 | 2023-03-17 | [24180](https://github.com/airbytehq/airbyte/pull/24180) | Fix field order |
 | 0.1.0 | 2021-24-05 | [17691](https://github.com/airbytehq/airbyte/pull/17691) | Initial version |

-</details>
+</details>
