
Commit a267225

source-apify-dataset: ensure inline schemas, updated cdk, poetry (where possible) (#37115)
1 parent 0cb3b57 commit a267225

File tree: 12 files changed (+1442, −375 lines)


airbyte-integrations/connectors/source-apify-dataset/Dockerfile (−38 lines)

This file was deleted.
@@ -1,105 +1,91 @@
-# Apify Dataset Source
+# Apify-Dataset source connector
 
-This is the repository for the Apify Dataset configuration based source connector.
+
+This is the repository for the Apify-Dataset source connector, written in Python.
 For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/apify-dataset).
 
 ## Local development
 
-#### Building via Python
-
-Create a Python virtual environment
-
-```
-virtualenv --python $(which python3.10) .venv
-```
+### Prerequisites
+* Python (~=3.9)
+* Poetry (~=1.7) - installation instructions [here](https://python-poetry.org/docs/#installation)
 
-Source it
 
-```
-source .venv/bin/activate
+### Installing the connector
+From this connector directory, run:
+```bash
+poetry install --with dev
 ```
 
-Check connector specifications/definition
 
-```
-python main.py spec
-```
+### Create credentials
+**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/apify-dataset)
+to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_apify_dataset/spec.yaml` file.
+Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
+See `sample_files/sample_config.json` for a sample config file.
 
-Basic check - check connection to the API
 
+### Locally running the connector
 ```
-python main.py check --config secrets/config.json
+poetry run source-apify-dataset spec
+poetry run source-apify-dataset check --config secrets/config.json
+poetry run source-apify-dataset discover --config secrets/config.json
+poetry run source-apify-dataset read --config secrets/config.json --catalog sample_files/configured_catalog.json
```

 
-Integration tests - read operation from the API
-
+### Running unit tests
+To run unit tests locally, from the connector directory run:
 ```
-python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
+poetry run pytest unit_tests
 ```
 
-#### Create credentials
-
-**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/apify-dataset)
-to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_apify_dataset/spec.yaml` file.
-Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
-See `integration_tests/sample_config.json` for a sample config file.
-
-**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source apify-dataset test creds`
-and place them into `secrets/config.json`.
-
-### Locally running the connector docker image
-
-
-#### Build
-**Via [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md) (recommended):**
+### Building the docker image
+1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
+2. Run the following command to build the docker image:
 ```bash
 airbyte-ci connectors --name=source-apify-dataset build
 ```
 
-An image will be built with the tag `airbyte/source-apify-dataset:dev`.
+An image will be available on your host with the tag `airbyte/source-apify-dataset:dev`.
 
-**Via `docker build`:**
-```bash
-docker build -t airbyte/source-apify-dataset:dev .
-```
-
-#### Run
 
+### Running as a docker container
 Then run any of the connector commands as follows:
-
 ```
 docker run --rm airbyte/source-apify-dataset:dev spec
 docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-apify-dataset:dev check --config /secrets/config.json
 docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-apify-dataset:dev discover --config /secrets/config.json
 docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-apify-dataset:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
 ```
 
-
-## Testing
+### Running our CI test suite
 You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
 ```bash
 airbyte-ci connectors --name=source-apify-dataset test
 ```
 
 ### Customizing acceptance Tests
-Customize `acceptance-test-config.yml` file to configure tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
+Customize `acceptance-test-config.yml` file to configure acceptance tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
 If your connector requires to create or destroy resources for use during acceptance tests create fixtures for it and place them inside integration_tests/acceptance.py.
 
-## Dependency Management
-
-All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
-We split dependencies between two groups, dependencies that are:
+### Dependency Management
+All of your dependencies should be managed via Poetry.
+To add a new dependency, run:
+```bash
+poetry add <package-name>
+```
 
-- required for your connector to work need to go to `MAIN_REQUIREMENTS` list.
-- required for the testing need to go to `TEST_REQUIREMENTS` list
+Please commit the changes to `pyproject.toml` and `poetry.lock` files.
 
-### Publishing a new version of the connector
+## Publishing a new version of the connector
 You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
 1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-apify-dataset test`
-2. Bump the connector version in `metadata.yaml`: increment the `dockerImageTag` value. Please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors).
+2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
+   - bump the `dockerImageTag` value in `metadata.yaml`
+   - bump the `version` value in `pyproject.toml`
 3. Make sure the `metadata.yaml` content is up to date.
-4. Make the connector documentation and its changelog is up to date (`docs/integrations/sources/apify-dataset.md`).
+4. Make sure the connector documentation and its changelog is up to date (`docs/integrations/sources/apify-dataset.md`).
 5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
 6. Pat yourself on the back for being an awesome contributor.
 7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
-
+8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
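The publishing steps in the updated README require bumping the `dockerImageTag` in `metadata.yaml` and the `version` in `pyproject.toml` together. A minimal sketch of a consistency check between the two files, assuming naive line-scanning (the helper names are illustrative, not part of the Airbyte repo):

```python
import re

def extract_docker_image_tag(metadata_yaml: str) -> str:
    """Find the dockerImageTag value in a metadata.yaml body (naive line scan)."""
    match = re.search(r"^\s*dockerImageTag:\s*[\"']?([\w.\-]+)", metadata_yaml, re.MULTILINE)
    if match is None:
        raise ValueError("dockerImageTag not found")
    return match.group(1)

def extract_poetry_version(pyproject_toml: str) -> str:
    """Find the version value in a pyproject.toml body (naive line scan)."""
    match = re.search(r"^\s*version\s*=\s*[\"']([\w.\-]+)[\"']", pyproject_toml, re.MULTILINE)
    if match is None:
        raise ValueError("version not found")
    return match.group(1)

# Sample inputs mirroring this commit's versions; a real check would read the files.
metadata = "data:\n  dockerImageTag: 2.1.5\n"
pyproject = '[tool.poetry]\nname = "airbyte-source-apify-dataset"\nversion = "2.1.5"\n'
assert extract_docker_image_tag(metadata) == extract_poetry_version(pyproject) == "2.1.5"
```

A proper implementation would parse the files with a YAML/TOML library; the regex scan is only a sketch of the invariant the release steps describe.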

airbyte-integrations/connectors/source-apify-dataset/metadata.yaml (+18, −13)

@@ -2,36 +2,41 @@ data:
   allowedHosts:
     hosts:
       - api.apify.com
-  remoteRegistries:
-    pypi:
-      enabled: true
-      packageName: airbyte-source-apify-dataset
-  registries:
-    oss:
-      enabled: true
-    cloud:
-      enabled: true
+  connectorBuildOptions:
+    baseImage: docker.io/airbyte/python-connector-base:1.2.0@sha256:c22a9d97464b69d6ef01898edf3f8612dc11614f05a84984451dde195f337db9
   connectorSubtype: api
   connectorType: source
   definitionId: 47f17145-fe20-4ef5-a548-e29b048adf84
-  dockerImageTag: 2.1.1
+  dockerImageTag: 2.1.5
   dockerRepository: airbyte/source-apify-dataset
+  documentationUrl: https://docs.airbyte.com/integrations/sources/apify-dataset
   githubIssueLabel: source-apify-dataset
   icon: apify.svg
   license: MIT
   name: Apify Dataset
+  registries:
+    cloud:
+      enabled: true
+    oss:
+      enabled: true
   releaseDate: 2023-08-25
   releaseStage: alpha
   releases:
     breakingChanges:
       1.0.0:
+        message: Update spec to use token and ingest all 3 streams correctly
         upgradeDeadline: 2023-08-30
-        message: "Update spec to use token and ingest all 3 streams correctly"
       2.0.0:
+        message:
+          This version introduces a new Item Collection (WCC) stream as a substitute
+          of the now-removed Item Collection stream in order to retain data for Web-Content-Crawler
+          datasets.
         upgradeDeadline: 2023-09-18
-        message: "This version introduces a new Item Collection (WCC) stream as a substitute of the now-removed Item Collection stream in order to retain data for Web-Content-Crawler datasets."
+  remoteRegistries:
+    pypi:
+      enabled: true
+      packageName: airbyte-source-apify-dataset
   supportLevel: community
-  documentationUrl: https://docs.airbyte.com/integrations/sources/apify-dataset
   tags:
     - language:python
    - cdk:low-code
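The new `connectorBuildOptions.baseImage` entry pins the build base image by digest rather than by tag alone. A small sketch of splitting such a reference into its parts, assuming the `repo:tag@sha256:…` shape shown in the diff (`parse_image_ref` is an illustrative helper, not part of Airbyte's tooling):

```python
def parse_image_ref(ref: str):
    """Split a digest-pinned image reference ('repo:tag@sha256:...') into
    (repository, tag, digest). Naive: ignores registry-port edge cases."""
    name_and_tag, _, digest = ref.partition("@")
    repo, _, tag = name_and_tag.rpartition(":")
    return repo, tag, digest

repo, tag, digest = parse_image_ref(
    "docker.io/airbyte/python-connector-base:1.2.0"
    "@sha256:c22a9d97464b69d6ef01898edf3f8612dc11614f05a84984451dde195f337db9"
)
assert repo == "docker.io/airbyte/python-connector-base"
assert tag == "1.2.0"
assert digest.startswith("sha256:")
```

Pinning by digest makes connector builds reproducible even if the `1.2.0` tag is later moved to a different image.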
