Commit b88ed51

🚨🚨 Source SFTP Bulk: migrate to file-based CDK (#36256)
Signed-off-by: Artem Inzhyyants <[email protected]>
1 parent 2ccd8c1 commit b88ed51

39 files changed: +3580 -1006 lines
@@ -0,0 +1,3 @@
+[run]
+omit =
+    source_sftp_bulk/run.py
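The new file above (its path is truncated in this view) is a coverage.py configuration: the `[run] omit` entry excludes the CLI entrypoint `source_sftp_bulk/run.py` from coverage measurement. A minimal sketch of how the rule takes effect, assuming coverage.py's documented `Coverage` API; this snippet is illustrative and not part of the commit:

```python
# Illustrative: coverage.py loads a .coveragerc-style [run] section from the
# working directory, so source_sftp_bulk/run.py is excluded from measurement.
import coverage

cov = coverage.Coverage()  # picks up the [run] omit rule automatically
cov.start()
# ... exercise the code under test, e.g. run the unit test suite ...
cov.stop()
cov.report()  # source_sftp_bulk/run.py will not appear in this report
```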

airbyte-integrations/connectors/source-sftp-bulk/Dockerfile

This file was deleted (-17 lines).
@@ -1,68 +1,55 @@
-# SFTP Bulk Source
+# Sftp-Bulk source connector
 
-This is the repository for the FTP source connector, written in Python, that helps you bulk ingest files with the same data format from an FTP server into a single stream.
-For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/sftp-bulk).
+
+This is the repository for the Sftp-Bulk source connector, written in Python.
+For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/sftp-bulk).
 
 ## Local development
 
 ### Prerequisites
-**To iterate on this connector, make sure to complete this prerequisites section.**
-
-#### Minimum Python version required `= 3.9.0`
+* Python (~=3.9)
+* Poetry (~=1.7) - installation instructions [here](https://python-poetry.org/docs/#installation)
 
-#### Build & Activate Virtual Environment and install dependencies
-From this connector directory, create a virtual environment:
-```
-python -m venv .venv
-```
 
-This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
-development environment of choice. To activate it from the terminal, run:
-```
-source .venv/bin/activate
-pip install -r requirements.txt
+### Installing the connector
+From this connector directory, run:
+```bash
+poetry install --with dev
 ```
-If you are in an IDE, follow your IDE's instructions to activate the virtualenv.
 
-Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
-used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
-If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
-should work as you expect.
 
-#### Create credentials
-**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/sftp-bulk)
-to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_sftp_bulk/spec.json` file.
-Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
-See `integration_tests/sample_config.json` for a sample config file.
+### Create credentials
+**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/sftp-bulk)
+to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_sftp_bulk/spec.yaml` file.
+Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
+See `sample_files/sample_config.json` for a sample config file.
 
-**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source ftp test creds`
-and place them into `secrets/config.json`.
 
 ### Locally running the connector
 ```
-python main.py spec
-python main.py check --config secrets/config.json
-python main.py discover --config secrets/config.json
-python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
+poetry run source-sftp-bulk spec
+poetry run source-sftp-bulk check --config secrets/config.json
+poetry run source-sftp-bulk discover --config secrets/config.json
+poetry run source-sftp-bulk read --config secrets/config.json --catalog sample_files/configured_catalog.json
 ```
 
-### Locally running the connector docker image
-
+### Running unit tests
+To run unit tests locally, from the connector directory run:
+```
+poetry run pytest unit_tests
+```
 
-#### Build
-**Via [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) (recommended):**
+### Building the docker image
+1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
+2. Run the following command to build the docker image:
 ```bash
 airbyte-ci connectors --name=source-sftp-bulk build
 ```
 
-An image will be built with the tag `airbyte/source-sftp-bulk:dev`.
+An image will be available on your host with the tag `airbyte/source-sftp-bulk:dev`.
 
-**Via `docker build`:**
-```bash
-docker build -t airbyte/source-sftp-bulk:dev .
-```
 
-#### Run
+### Running as a docker container
 Then run any of the connector commands as follows:
 ```
 docker run --rm airbyte/source-sftp-bulk:dev spec
@@ -71,29 +58,34 @@ docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-sftp-bulk:dev discover
 docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-sftp-bulk:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
 ```
 
-## Testing
+### Running our CI test suite
 You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
 ```bash
 airbyte-ci connectors --name=source-sftp-bulk test
 ```
 
 ### Customizing acceptance tests
-Customize `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
+Customize the `acceptance-test-config.yml` file to configure acceptance tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
 If your connector requires creating or destroying resources for use during acceptance tests, create fixtures for them and place them inside integration_tests/acceptance.py.
 
-## Dependency Management
-All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
-We split dependencies between two groups, dependencies that are:
-* required for your connector to work need to go to `MAIN_REQUIREMENTS` list.
-* required for the testing need to go to `TEST_REQUIREMENTS` list
+### Dependency Management
+All of your dependencies should be managed via Poetry.
+To add a new dependency, run:
+```bash
+poetry add <package-name>
+```
+
+Please commit the changes to the `pyproject.toml` and `poetry.lock` files.
 
-### Publishing a new version of the connector
+## Publishing a new version of the connector
 You've checked out the repo, implemented a million-dollar feature, and you're ready to share your changes with the world. Now what?
 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-sftp-bulk test`
-2. Bump the connector version in `metadata.yaml`: increment the `dockerImageTag` value. Please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors).
+2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
+   - bump the `dockerImageTag` value in `metadata.yaml`
+   - bump the `version` value in `pyproject.toml`
 3. Make sure the `metadata.yaml` content is up to date.
-4. Make the connector documentation and its changelog is up to date (`docs/integrations/sources/sftp-bulk.md`).
+4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/sftp-bulk.md`).
 5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
 6. Pat yourself on the back for being an awesome contributor.
 7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
-
+8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
@@ -1,27 +1,20 @@
 # See [Connector Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/connector-acceptance-tests-reference)
 # for more information about how to configure these tests
 connector_image: airbyte/source-sftp-bulk:dev
-tests:
+acceptance_tests:
   spec:
-    - spec_path: "source_sftp_bulk/spec.json"
-      timeout_seconds: 60
+    tests:
+      - spec_path: "integration_tests/spec.json"
+        timeout_seconds: 60
+        backward_compatibility_tests_config:
+          disable_for_version: 0.3.2 # `start_date` format changed to format: date-time
   connection:
-    - config_path: "integration_tests/valid_config.json"
-      status: "succeed"
-      timeout_seconds: 60
-    - config_path: "integration_tests/invalid_config.json"
-      status: "failed"
-      timeout_seconds: 60
+    bypass_reason: "This connector uses integration tests"
   discovery:
-    - config_path: "integration_tests/valid_config.json"
+    bypass_reason: "This connector uses integration tests"
   basic_read:
-    - config_path: "integration_tests/valid_config.json"
-      configured_catalog_path: "integration_tests/configured_catalog.json"
-      empty_streams: []
-  incremental:
-    - config_path: "integration_tests/valid_config.json"
-      configured_catalog_path: "integration_tests/configured_catalog.json"
-      future_state_path: "integration_tests/abnormal_state.json"
+    bypass_reason: "This connector uses integration tests"
  full_refresh:
-    - config_path: "integration_tests/valid_config.json"
-      configured_catalog_path: "integration_tests/configured_catalog.json"
+    bypass_reason: "This connector uses integration tests"
+  incremental:
+    bypass_reason: "This connector uses integration tests"
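The `disable_for_version: 0.3.2` note above records that `start_date` moved to a date-time format, which is the shape used by the sample configs later in this diff. A quick standard-library check; the format string here is an assumption inferred from the sample value, not taken from the connector code:

```python
# Hypothetical check: the sample start_date value below parses as an ISO-8601
# date-time with microseconds and a literal "Z" suffix.
from datetime import datetime

parsed = datetime.strptime("2021-01-01T00:00:00.000000Z", "%Y-%m-%dT%H:%M:%S.%fZ")
print(parsed)  # 2021-01-01 00:00:00
```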

airbyte-integrations/connectors/source-sftp-bulk/integration_tests/acceptance.py

+3 -9
@@ -1,7 +1,6 @@
 #
 # Copyright (c) 2023 Airbyte, Inc., all rights reserved.
 #
-
 import os
 import shutil
 import time
@@ -17,25 +16,20 @@
 
 @pytest.fixture(scope="session", autouse=True)
 def connector_setup():
-    dir_path = os.getcwd() + "/integration_tests/files"
-
     if os.path.exists(TMP_FOLDER):
         shutil.rmtree(TMP_FOLDER)
-    shutil.copytree(dir_path, TMP_FOLDER)
-
+    shutil.copytree(f"{os.path.dirname(__file__)}/files", TMP_FOLDER)
     docker_client = docker.from_env()
-
     container = docker_client.containers.run(
         "atmoz/sftp",
         "foo:pass",
-        name=f"mysftpacceptance_{uuid.uuid4().hex}",
-        ports={22: 1122},
+        name=f"mysftp_acceptance_{uuid.uuid4().hex}",
+        ports={22: ("0.0.0.0", 2222)},
         volumes={
             f"{TMP_FOLDER}": {"bind": "/home/foo/files", "mode": "rw"},
         },
         detach=True,
     )
-
     time.sleep(5)
     yield
 
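The rewritten fixture still relies on a fixed `time.sleep(5)` for the `atmoz/sftp` container to come up. A more robust alternative would poll the mapped port until a login succeeds; a minimal sketch, assuming `paramiko` is available in the test environment, using the fixture's `foo`/`pass` credentials and host port 2222:

```python
# Sketch: poll the test SFTP container until it accepts a login, instead of
# sleeping a fixed five seconds. Assumes paramiko is installed.
import time

import paramiko


def wait_for_sftp(host="localhost", port=2222, username="foo", password="pass", timeout=30.0):
    """Block until the test SFTP server accepts a connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            transport = paramiko.Transport((host, port))
            transport.connect(username=username, password=password)
            transport.close()
            return
        except Exception:
            time.sleep(0.5)
    raise TimeoutError(f"SFTP server at {host}:{port} did not become ready within {timeout}s")
```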

@@ -0,0 +1,54 @@
+{
+  "host": "localhost",
+  "port": 2222,
+  "username": "foo",
+  "credentials": {
+    "auth_type": "password",
+    "password": "pass"
+  },
+  "file_type": "json",
+  "start_date": "2021-01-01T00:00:00.000000Z",
+  "folder_path": "/files",
+  "streams": [
+    {
+      "name": "test_stream",
+      "file_type": "csv",
+      "globs": ["**/test_1.csv"],
+      "legacy_prefix": "",
+      "validation_policy": "Emit Record",
+      "format": {
+        "filetype": "csv",
+        "delimiter": ",",
+        "quote_char": "\"",
+        "double_quote": true,
+        "null_values": [
+          "",
+          "#N/A",
+          "#N/A N/A",
+          "#NA",
+          "-1.#IND",
+          "-1.#QNAN",
+          "-NaN",
+          "-nan",
+          "1.#IND",
+          "1.#QNAN",
+          "N/A",
+          "NA",
+          "NULL",
+          "NaN",
+          "n/a",
+          "nan",
+          "null"
+        ],
+        "true_values": ["1", "True", "TRUE", "true"],
+        "false_values": ["0", "False", "FALSE", "false"],
+        "inference_type": "Primitive Types Only",
+        "strings_can_be_null": false,
+        "encoding": "utf8",
+        "header_definition": {
+          "header_definition_type": "From CSV"
+        }
+      }
+    }
+  ]
+}
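The `format` block in the new config above mirrors the file-based CDK's CSV options (`delimiter`, `quote_char`, `null_values`, `true_values`, `false_values`). Purely as an illustration of what these options control (the CDK performs the real parsing internally; this is not connector code):

```python
# Illustration: how the CSV "format" options shape parsing. Uses only the
# standard library and a subset of the null_values list from the config.
import csv
import io

fmt = {
    "delimiter": ",",
    "quote_char": '"',
    "null_values": {"", "N/A", "NULL", "NaN"},
    "true_values": {"1", "True", "TRUE", "true"},
    "false_values": {"0", "False", "FALSE", "false"},
}

sample = 'id,name,active\n1,"Widget",true\n2,N/A,false\n'

reader = csv.DictReader(io.StringIO(sample), delimiter=fmt["delimiter"], quotechar=fmt["quote_char"])
for row in reader:
    typed = {}
    for key, value in row.items():
        if value in fmt["null_values"]:
            typed[key] = None
        elif value in fmt["true_values"]:
            typed[key] = True
        elif value in fmt["false_values"]:
            typed[key] = False
        else:
            typed[key] = value
    print(typed)
# {'id': '1', 'name': 'Widget', 'active': True}
# {'id': '2', 'name': None, 'active': False}
```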
@@ -0,0 +1,54 @@
+{
+  "host": "localhost",
+  "port": 2222,
+  "username": "foo",
+  "credentials": {
+    "auth_type": "private_key",
+    "private_key": "key"
+  },
+  "file_type": "json",
+  "start_date": "2021-01-01T00:00:00.000000Z",
+  "folder_path": "/files",
+  "streams": [
+    {
+      "name": "test_stream",
+      "file_type": "csv",
+      "globs": ["**/test_1.csv"],
+      "legacy_prefix": "",
+      "validation_policy": "Emit Record",
+      "format": {
+        "filetype": "csv",
+        "delimiter": ",",
+        "quote_char": "\"",
+        "double_quote": true,
+        "null_values": [
+          "",
+          "#N/A",
+          "#N/A N/A",
+          "#NA",
+          "-1.#IND",
+          "-1.#QNAN",
+          "-NaN",
+          "-nan",
+          "1.#IND",
+          "1.#QNAN",
+          "N/A",
+          "NA",
+          "NULL",
+          "NaN",
+          "n/a",
+          "nan",
+          "null"
+        ],
+        "true_values": ["1", "True", "TRUE", "true"],
+        "false_values": ["0", "False", "FALSE", "false"],
+        "inference_type": "Primitive Types Only",
+        "strings_can_be_null": false,
+        "encoding": "utf8",
+        "header_definition": {
+          "header_definition_type": "From CSV"
+        }
+      }
+    }
+  ]
+}
@@ -0,0 +1,44 @@
+{
+  "streams": [
+    {
+      "name": "test_stream",
+      "file_type": "csv",
+      "globs": ["**/test_1.csv", "**/test_3.csv"],
+      "legacy_prefix": "",
+      "validation_policy": "Emit Record",
+      "format": {
+        "filetype": "csv",
+        "delimiter": ",",
+        "quote_char": "\"",
+        "double_quote": true,
+        "null_values": [
+          "",
+          "#N/A",
+          "#N/A N/A",
+          "#NA",
+          "-1.#IND",
+          "-1.#QNAN",
+          "-NaN",
+          "-nan",
+          "1.#IND",
+          "1.#QNAN",
+          "N/A",
+          "NA",
+          "NULL",
+          "NaN",
+          "n/a",
+          "nan",
+          "null"
+        ],
+        "true_values": ["1", "True", "TRUE", "true"],
+        "false_values": ["0", "False", "FALSE", "false"],
+        "inference_type": "Primitive Types Only",
+        "strings_can_be_null": false,
+        "encoding": "utf8",
+        "header_definition": {
+          "header_definition_type": "From CSV"
+        }
+      }
+    }
+  ]
+}
