
Commit 66a3082

📚 CDK: Add python destination tutorial (#4800)
1 parent a4bb304 commit 66a3082


5 files changed (+215, -12 lines)


airbyte-integrations/connectors/destination-kvdb/destination_kvdb/spec.json

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 {
   "documentationUrl": "https://docs.airbyte.io/integrations/destinations/kvdb",
-  "supported_destination_sync_modes": ["overwrite", "append", "append_dedupe"],
+  "supported_destination_sync_modes": ["overwrite", "append"],
+  "supportsIncremental": true,
+  "supportsDBT": false,
+  "supportsNormalization": false,
   "connectionSpecification": {
     "$schema": "http://json-schema.org/draft-07/schema#",
     "title": "Destination Kvdb",

airbyte-integrations/connectors/source-facebook-marketing/source_facebook_marketing/streams.py

Lines changed: 2 additions & 2 deletions
@@ -228,9 +228,9 @@ def read_records(
     @staticmethod
     def clear_urls(record: MutableMapping[str, Any]) -> MutableMapping[str, Any]:
         """Some URLs has random values, these values doesn't affect validity of URLs, but breaks SAT"""
-        thumbnail_url = record.get('thumbnail_url')
+        thumbnail_url = record.get("thumbnail_url")
         if thumbnail_url:
-            record['thumbnail_url'] = remove_params_from_url(thumbnail_url, ['_nc_hash', 'd'])
+            record["thumbnail_url"] = remove_params_from_url(thumbnail_url, ["_nc_hash", "d"])
         return record
 
     @backoff_policy
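The hunk calls a `remove_params_from_url` helper whose body is not part of this diff. As a rough illustration only, a stdlib-based helper with the same call shape could look like the following; it is a hypothetical reimplementation, not the connector's actual code.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def remove_params_from_url(url: str, params: List[str]) -> str:
    """Return `url` with the named query parameters removed (hypothetical sketch)."""
    parts = urlsplit(url)
    # Keep every query parameter whose name is not in the removal list;
    # keep_blank_values preserves parameters such as "a=" instead of dropping them.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k not in params]
    # urlunsplit omits the "?" entirely when the rebuilt query string is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))
```

Because the kept parameters are re-encoded with `urlencode`, the percent-encoding of the remaining values may be normalized (for example, `%20` becomes `+`).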

docs/contributing-to-airbyte/building-new-connector/README.md

Lines changed: 7 additions & 4 deletions
@@ -8,7 +8,7 @@ To build a new connector in Java or Python, we provide templates so you don't ne
 
 ## Connector-Development Kit (CDK)
 
-You can build a source connector very quickly with the [Airbyte CDK](../python/README.md), which generates 75% of the code required for you. The CDK does not currently support creating destinations, but it will soon.
+You can build a connector very quickly with the [Airbyte CDK](../python/README.md), which generates 75% of the code required for you.
 
 
 ## The Airbyte specification
@@ -54,14 +54,17 @@ and choose the relevant template. This will generate a new connector in the `air
 
 Search the generated directory for "TODO"s and follow them to implement your connector. For more detailed walkthroughs and instructions, follow the relevant tutorial:
 
-* [Building a Python source connector tutorial](../tutorials/building-a-python-source.md)
-* [Building a Java destination connector tutorial](../tutorials/building-a-java-destination.md)
+* [Building a Python source](../tutorials/building-a-python-source.md)
+* [Building a Python destination](../tutorials/building-a-python-destination.md)
+* [Building a Java destination](../tutorials/building-a-java-destination.md)
 
 As you implement your connector, make sure to review the [Best Practices for Connector Development](best-practices.md) guide. Following best practices is not a requirement for merging your contribution to Airbyte, but it certainly doesn't hurt ;\)
 
 ### 2. Integration tests
 
-At a minimum, your connector must implement the standard tests described in [Testing Connectors](testing-connectors.md)
+At a minimum, your connector must implement the acceptance tests described in [Testing Connectors](testing-connectors.md)
+
+**Note: Acceptance tests are not yet available for Python destination connectors. Coming [soon](https://github.com/airbytehq/airbyte/issues/4698)!**
 
 ### 3. Document building & testing your connector
 

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
# Building a Python Destination

## Summary

This article provides a checklist for creating a Python destination. Each step in the checklist is explained in more detail below.

## Requirements

You need Docker and Python, with the versions listed in the [tech stack section](../../understanding-airbyte/tech-stack.md). You can use any Python version between 3.7 and 3.9, but this tutorial was tested with 3.7.

## Checklist

### Creating a destination

* Step 1: Create the destination using the template generator
* Step 2: Set up the dev environment
* Step 3: Implement `spec` to define the configuration required to run the connector
* Step 4: Implement `check` to provide a way to validate configurations provided to the connector
* Step 5: Implement `write` to write data to the destination
* Step 6: Set up Acceptance Tests
* Step 7: Write unit tests and/or integration tests
* Step 8: Update the docs \(in `docs/integrations/destinations/<destination-name>.md`\)

{% hint style="info" %}
If you need help with any step of the process, feel free to submit a PR with your progress and any questions you have, or ask us on [Slack](https://slack.airbyte.io). You can also refer to the KvDB Python destination implementation for an example of a working destination.
{% endhint %}

## Explaining Each Step

### Step 1: Create the destination using the template

Airbyte provides a code generator which bootstraps the scaffolding for your connector.

```bash
# assumes you are starting from the root of the Airbyte project
cd airbyte-integrations/connector-templates/generator
./generate.sh
```

Select the `Python Destination` template and then enter the name of your connector. We'll refer to the destination as `destination-<name>` in this tutorial, but you should replace `<name>` with the actual name you used for your connector, e.g. `redis` or `google-sheets`.

### Step 2: Set up the dev environment

Set up your Python virtual environment:

```bash
cd airbyte-integrations/connectors/destination-<name>

# Create a virtual environment in the .venv directory
python -m venv .venv

# Activate the virtual environment
source .venv/bin/activate

# Install with the "tests" extra, which provides test requirements
pip install '.[tests]'
```

This step sets up the initial Python environment. **All** subsequent `python` or `pip` commands assume you have activated your virtual environment.

If you want your IDE to autocomplete and resolve dependencies properly, point it at the Python binary in `airbyte-integrations/connectors/destination-<name>/.venv/bin/python`. Also, whenever you change the dependencies in `setup.py`, re-run `pip install '.[tests]'` so that the updated dependencies are installed into the virtual environment.
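If you are unsure whether the virtual environment is active, one quick check is to compare `sys.prefix` with `sys.base_prefix`: inside an environment created with `python -m venv` they differ. A minimal sketch (the function name is only for illustration):

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the current interpreter runs inside a venv-style virtual environment."""
    # `python -m venv` sets sys.prefix to the environment directory (e.g. .venv),
    # while sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"virtualenv active: {in_virtualenv()} (interpreter: {sys.executable})")
```

Run with `python` after `source .venv/bin/activate`, it should report `True` and an interpreter path under `.venv/bin/`.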

Let's quickly get a few housekeeping items out of the way.

#### Dependencies

Python dependencies for your destination should be declared in `airbyte-integrations/connectors/destination-<name>/setup.py` in the `install_requires` field. You might notice that a couple of Airbyte dependencies are already declared there (mainly the Airbyte CDK and potentially some testing libraries or helpers). Keep those, as they will be useful during development.

You may notice that there is a `requirements.txt` in your destination's directory as well. Do not touch this file. It is autogenerated and used to install local Airbyte dependencies which are not published to PyPI. All your dependencies should be declared in `setup.py`.

#### Iterating on your implementation

Pretty much all it takes to create a destination is to implement the `Destination` interface. Let's briefly recap the three methods implemented by a destination:

1. `spec`: declares the user-provided credentials or configuration needed to run the connector
2. `check`: tests whether the user-provided configuration can be used to connect to the underlying data destination with the correct write permissions
3. `write`: writes data to the underlying destination by reading a configuration, a stream of records from stdin, and a configured catalog describing the schema of the data and how it should be written to the destination

The destination interface is described in detail in the [Airbyte Specification](../../understanding-airbyte/airbyte-specification.md) reference.

The generated files fill in a lot of information for you and have docstrings describing what you need to do to implement each method. The next few steps are just implementing that interface.
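To make the shape of those three operations concrete, here is a dependency-free sketch. `ToyDestination` and `PrintDestination` are hypothetical names and messages are plain dicts; the real `Destination` class comes from the Airbyte CDK and works with typed protocol models, so follow the signatures in your generated `destination.py` rather than these.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, Iterator, Mapping


class ToyDestination(ABC):
    """Hypothetical stand-in for the CDK's Destination interface (illustration only)."""

    @abstractmethod
    def spec(self) -> Dict[str, Any]:
        """Return the connector specification (the contents of spec.json)."""

    @abstractmethod
    def check(self, config: Mapping[str, Any]) -> Dict[str, Any]:
        """Return {"status": "SUCCEEDED"} or {"status": "FAILED", "message": ...}."""

    @abstractmethod
    def write(
        self,
        config: Mapping[str, Any],
        configured_catalog: Mapping[str, Any],
        input_messages: Iterable[Dict[str, Any]],
    ) -> Iterator[Dict[str, Any]]:
        """Persist RECORD messages and yield STATE messages once their records are written."""


class PrintDestination(ToyDestination):
    """Trivial implementation that 'writes' records by printing them."""

    def spec(self):
        return {"connectionSpecification": {"type": "object", "properties": {}}}

    def check(self, config):
        return {"status": "SUCCEEDED"}

    def write(self, config, configured_catalog, input_messages):
        for message in input_messages:
            if message["type"] == "RECORD":
                print(message["record"]["data"])
            elif message["type"] == "STATE":
                # Nothing is buffered here, so the state can be echoed immediately.
                yield message
```

Because `write` is a generator, nothing is written until the caller iterates over it; the CDK is responsible for driving that loop in a real connector.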

{% hint style="info" %}
All logging should be done through the `self.logger` object available in the `Destination` class. Otherwise, logs will not be shown properly in the Airbyte UI.
{% endhint %}

Everyone develops differently, but here are three ways we recommend iterating on a destination. Use whichever one matches your style.

**Run the destination using Python**

You'll notice a Python file called `main.py` in your destination's directory. This file is the entrypoint for the connector:

```bash
# from airbyte-integrations/connectors/destination-<name>
python main.py spec
python main.py check --config secrets/config.json
# messages.jsonl should contain AirbyteMessages (described in the Airbyte spec), one JSON object per line
cat messages.jsonl | python main.py write --config secrets/config.json --catalog sample_files/configured_catalog.json
```

The nice thing about this approach is that you can iterate completely within Python. The downside is that you are not quite running your destination as Airbyte will actually run it. Specifically, you're not running it from within the Docker container that will house it.
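The `write` command above reads AirbyteMessages from `messages.jsonl`, one JSON object per line. One way to produce such a file by hand is a small throwaway script like the following. It assumes a stream called `users`, which is only a placeholder: stream names must match those in your `sample_files/configured_catalog.json`, and the record fields should match that stream's schema.

```python
import json
import time
from typing import Any, Dict, List


def record_message(stream: str, data: Dict[str, Any], emitted_at_ms: int) -> Dict[str, Any]:
    """An AirbyteMessage of type RECORD, as a plain dict."""
    return {"type": "RECORD", "record": {"stream": stream, "data": data, "emitted_at": emitted_at_ms}}


def state_message(data: Dict[str, Any]) -> Dict[str, Any]:
    """An AirbyteMessage of type STATE, as a plain dict."""
    return {"type": "STATE", "state": {"data": data}}


def sample_messages(stream: str = "users") -> List[str]:
    """Two records followed by a state checkpoint, serialized as JSON Lines."""
    now_ms = int(time.time() * 1000)  # emitted_at is an epoch timestamp in milliseconds
    messages = [
        record_message(stream, {"id": 1, "name": "Ada"}, now_ms),
        record_message(stream, {"id": 2, "name": "Grace"}, now_ms),
        state_message({"last_id": 2}),
    ]
    return [json.dumps(message) for message in messages]


if __name__ == "__main__":
    print("\n".join(sample_messages()))
```

Saved under a name of your choosing (say `make_messages.py`), it can be used as `python make_messages.py > messages.jsonl`.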

**Run using Docker**

If you want to run your destination exactly as Airbyte will run it \(i.e. within a Docker container\), you can use the following commands from the connector module directory \(`airbyte-integrations/connectors/destination-<name>`\):

```bash
# First build the container
docker build . -t airbyte/destination-<name>:dev

# Then use the following commands to run it
docker run --rm airbyte/destination-<name>:dev spec
docker run --rm -v $(pwd)/secrets:/secrets airbyte/destination-<name>:dev check --config /secrets/config.json
# -i keeps stdin open so that the piped messages reach the connector
cat messages.jsonl | docker run --rm -i -v $(pwd)/secrets:/secrets -v $(pwd)/sample_files:/sample_files airbyte/destination-<name>:dev write --config /secrets/config.json --catalog /sample_files/configured_catalog.json
```

Note: each time you change your implementation, you need to rebuild the connector image with `docker build . -t airbyte/destination-<name>:dev`. This ensures the new Python code is added to the Docker container.

The nice thing about this approach is that you are running your destination exactly as Airbyte will run it. The tradeoff is that iteration is slightly slower, because you need to rebuild the connector between each change.

**TDD using acceptance tests**

_Note: these tests aren't yet available for Python destinations, but will be very soon. Until then, use custom unit or integration tests for TDD._

Airbyte provides an acceptance test suite that is run against every destination. The objective of these tests is to provide some "free" tests that sanity-check that the basic functionality of the destination works. One approach to developing your connector is to simply run the tests between each change and use their feedback to guide your development.

If you want to try out this approach, check out Step 6, which describes what you need to do to set up the acceptance tests for your destination.

The nice thing about this approach is that you are running your destination exactly as Airbyte will run it in the CI. The downside is that the tests do not run very quickly.

### Step 3: Implement `spec`

Each destination contains a specification written in JSON Schema that describes the inputs it requires and accepts. Defining the specification is a good place to start development. To do this, find the spec file generated in `airbyte-integrations/connectors/destination-<name>/destination_<name>/spec.json`. Edit it and you should be done with this step. The generated connector will take care of reading this file and converting it to the correct output.

Some notes about fields in the output spec:

* `supportsNormalization` is a boolean which indicates whether this connector supports [basic normalization via DBT](https://docs.airbyte.io/understanding-airbyte/basic-normalization). If true, `supportsDBT` must also be true.
* `supportsDBT` is a boolean which indicates whether this destination is compatible with DBT. If set to true, the user can define custom DBT transformations that run on this destination after each successful sync. This must be true if `supportsNormalization` is set to true.
* `supported_destination_sync_modes`: an array of strings declaring the sync modes supported by this connector. The available options are:
  * `overwrite`: the connector can be configured to wipe any existing data in a stream before writing new data
  * `append`: the connector can be configured to append new data to existing data
  * `append_dedupe`: the connector can be configured to deduplicate (i.e. UPSERT) data in the destination based on the new data and primary keys
* `supportsIncremental`: whether the connector supports any `append` sync mode. Must be set to true if `append` or `append_dedupe` is included in `supported_destination_sync_modes`.
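These rules are easy to check mechanically. A minimal sketch, assuming you load `spec.json` into a dict with `json.load`; `spec_flag_problems` is a hypothetical helper, not part of the CDK, and it only covers the rules listed above.

```python
from typing import Any, Dict, List

APPEND_MODES = {"append", "append_dedupe"}
KNOWN_MODES = {"overwrite"} | APPEND_MODES


def spec_flag_problems(spec: Dict[str, Any]) -> List[str]:
    """Return human-readable violations of the spec flag rules (empty list if none)."""
    problems = []
    modes = set(spec.get("supported_destination_sync_modes", []))
    unknown = modes - KNOWN_MODES
    if unknown:
        problems.append(f"unknown sync modes: {sorted(unknown)}")
    # Normalization runs through DBT, so it cannot be enabled without DBT support.
    if spec.get("supportsNormalization") and not spec.get("supportsDBT"):
        problems.append("supportsNormalization is true, so supportsDBT must be true")
    # Any append-style mode requires the incremental flag.
    if modes & APPEND_MODES and not spec.get("supportsIncremental"):
        problems.append("append/append_dedupe declared, so supportsIncremental must be true")
    return problems
```

With the flags from the KvDB `spec.json` change in this commit (`overwrite` and `append`, `supportsIncremental: true`, DBT and normalization off), it returns an empty list.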

Some helpful resources:

* [**JSONSchema website**](https://json-schema.org/)
* [**Definition of Airbyte Protocol data models**](https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml). The output of `spec` is described by the `ConnectorSpecification` model, which is wrapped in an `AirbyteMessage` of type `SPEC`.
* [**Postgres Destination's spec.json file**](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-postgres/src/main/resources/spec.json) as an example `spec.json`.

Once you've edited the file, see the `spec` operation in action:

```bash
python main.py spec
```
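The `spec` command prints the specification as a JSON-serialized `AirbyteMessage` on stdout. If you want to inspect it programmatically, for example in a test, a small helper like the hypothetical `extract_spec` below picks out the first message of type `SPEC` and skips other lines such as log messages:

```python
import json
from typing import Any, Dict, Iterable


def extract_spec(output_lines: Iterable[str]) -> Dict[str, Any]:
    """Return the ConnectorSpecification from the first SPEC message in connector output."""
    for line in output_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON object
        message = json.loads(line)
        if message.get("type") == "SPEC":
            return message["spec"]
    raise ValueError("no SPEC message found in output")
```

For example, it could be fed `subprocess.run(["python", "main.py", "spec"], capture_output=True, text=True).stdout.splitlines()`.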

### Step 4: Implement `check`

The `check` operation accepts a JSON object conforming to the `spec.json`. In other words, if the `spec.json` says that the destination requires a `username` and `password`, the config object might be `{ "username": "airbyte", "password": "password123" }`. Given the credentials in the config, it returns a JSON object reporting whether the connector was able to connect to the destination.

While developing, we recommend storing any credentials in `secrets/config.json`. Any `secrets` directory in the Airbyte repo is gitignored by default.

Implement the `check` method in the generated file `destination_<name>/destination.py`. Here's an [example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/destination_kvdb/destination.py) from the KvDB destination.

Verify that the method is working by placing your config in `secrets/config.json` and then running:

```bash
python main.py check --config secrets/config.json
```
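A minimal sketch of the logic a `check` typically contains, with hypothetical names throughout: `connect` stands for whatever client call your destination needs, and the result is a plain dict shaped like an `AirbyteConnectionStatus`, whereas the real method returns the CDK's model object.

```python
from typing import Any, Callable, Dict, Mapping, Sequence


def check_connection(
    config: Mapping[str, Any],
    required_keys: Sequence[str],
    connect: Callable[[Mapping[str, Any]], None],
) -> Dict[str, str]:
    """Return a dict shaped like an AirbyteConnectionStatus for the given config."""
    # Fail fast with a readable message if the config is missing required fields.
    missing = [key for key in required_keys if key not in config]
    if missing:
        return {"status": "FAILED", "message": f"Missing required config keys: {missing}"}
    try:
        # `connect` should exercise a real operation, ideally a small write,
        # so that missing write permissions are caught here rather than mid-sync.
        connect(config)
    except Exception as error:  # report the failure instead of crashing the connector
        return {"status": "FAILED", "message": f"Could not connect to the destination: {error!r}"}
    return {"status": "SUCCEEDED"}
```

Catching a broad `Exception` is deliberate here: any failure during the check should become a `FAILED` status with a message the user can act on.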

### Step 5: Implement `write`

The `write` operation is the main workhorse of a destination connector: it reads the records emitted by the source \(via stdin\) and writes them to the underlying destination. It takes as input the config file used to run the connector as well as the configured catalog, the file that describes the schema of the incoming data and how it should be written to the destination. Its "output" is two things:

1. Data written to the underlying destination
2. `AirbyteMessage`s of type `STATE` \(each wrapping an `AirbyteStateMessage`\), written to stdout to indicate which records have been written so far during a sync. It's important to output these messages when possible in order to avoid re-extracting messages from the source. See the [write operation protocol reference](https://docs.airbyte.io/understanding-airbyte/airbyte-specification#write) for more information.

To implement the `write` Airbyte operation, implement the `write` method in your generated `destination.py` file. [Here is an example implementation](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/destination_kvdb/destination.py) from the KvDB destination connector.
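A common pattern is to buffer records and flush them to the destination right before each `STATE` message is emitted, so that a state is only acknowledged once everything before it has been persisted. The sketch below is dependency-free and assumes plain-dict messages and a hypothetical in-memory `store` standing in for the real destination (a real `write` receives `config` and talks to your actual backend). Streams configured with `overwrite` are wiped before writing; `append_dedupe` is not handled.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Mapping

Message = Dict[str, Any]


def write(
    store: Dict[str, List[Dict[str, Any]]],
    configured_catalog: Mapping[str, Any],
    input_messages: Iterable[Message],
) -> Iterator[Message]:
    """Buffer RECORD messages, flush them to `store` on STATE, then yield the STATE."""
    streams = {s["stream"]["name"]: s for s in configured_catalog["streams"]}
    # overwrite: wipe existing data for the stream before writing anything new
    for name, stream in streams.items():
        if stream["destination_sync_mode"] == "overwrite":
            store[name] = []

    buffer: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def flush() -> None:
        for name, rows in buffer.items():
            store.setdefault(name, []).extend(rows)
        buffer.clear()

    for message in input_messages:
        if message["type"] == "RECORD":
            record = message["record"]
            if record["stream"] in streams:  # ignore streams that are not in the catalog
                buffer[record["stream"]].append(record["data"])
        elif message["type"] == "STATE":
            flush()  # persist everything received before this checkpoint...
            yield message  # ...and only then acknowledge the state
    flush()  # write any trailing records that arrived after the last STATE
```

Since this is a generator, the final flush only runs once the caller has consumed every yielded state, which matches how the connector's output is drained during a sync.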

### Step 6: Set up Acceptance Tests

_Coming soon. These tests are not yet available for Python destinations, but will be very soon. For now, please skip this step and rely on copious amounts of integration and unit testing._

### Step 7: Write unit tests and/or integration tests

The acceptance tests are meant to cover the basic functionality of a destination. Think of them as the bare minimum required for us to add a destination to Airbyte. To test any additional functionality of your destination, add your own unit tests or custom integration tests.

Add unit tests in the `unit_tests/` directory and integration tests in the `integration_tests/` directory. Run them via:

```bash
python -m pytest -s -vv unit_tests/
python -m pytest -s -vv integration_tests/
```

See the [KvDB integration tests](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-kvdb/integration_tests/integration_test.py) for an example of tests you can implement.
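For example, a unit test module in `unit_tests/` might look like the sketch below. `record_to_kv` is a hypothetical helper for a key-value style destination; in a real connector it would live in `destination_<name>/` and be imported by the test, but it is inlined here so the example runs on its own. pytest collects the `test_*` functions when you run `python -m pytest -s -vv unit_tests/`, provided the file name matches its default patterns (`test_*.py` or `*_test.py`).

```python
import json
from typing import Any, Dict, Tuple


def record_to_kv(record: Dict[str, Any], sequence: int) -> Tuple[str, str]:
    """Hypothetical helper: map a RECORD payload to a (key, JSON value) pair.

    `sequence` disambiguates records that share the same stream and emitted_at.
    """
    key = f"{record['stream']}__{record['emitted_at']}__{sequence}"
    return key, json.dumps(record["data"], sort_keys=True)


def test_key_combines_stream_timestamp_and_sequence():
    key, _ = record_to_kv({"stream": "users", "data": {"id": 1}, "emitted_at": 1625000000000}, 0)
    assert key == "users__1625000000000__0"


def test_records_with_same_timestamp_get_distinct_keys():
    record = {"stream": "users", "data": {"id": 1}, "emitted_at": 0}
    assert record_to_kv(record, 0)[0] != record_to_kv(record, 1)[0]


def test_value_is_deterministic_json():
    _, value = record_to_kv({"stream": "users", "data": {"b": 2, "a": 1}, "emitted_at": 0}, 0)
    assert value == '{"a": 1, "b": 2}'
```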

### Step 8: Update the docs

Each connector has its own documentation page. By convention, that page should have the following path: `docs/integrations/destinations/<destination-name>.md`. For the documentation to get packaged with the docs, make sure to add a link to it in `docs/SUMMARY.md`. You can follow the pattern used by existing connectors.

## Wrapping up

Well done on making it this far! If you'd like your connector to ship with Airbyte by default, create a PR against the Airbyte repo and we'll work with you to get it across the finish line.

docs/contributing-to-airbyte/tutorials/building-a-python-source.md

Lines changed: 5 additions & 5 deletions
@@ -106,14 +106,14 @@ Everyone develops differently but here are 3 ways that we recommend iterating on
 
 **Run the source using python**
 
-You'll notice in your source's directory that there is a python file called `main_dev.py`. This file exists as convenience for development. You can call it from within the virtual environment mentioned above `. ./.venv/bin/activate` to test out that your source works.
+You'll notice in your source's directory that there is a python file called `main.py`. This file exists as convenience for development. You can call it from within the virtual environment mentioned above `. ./.venv/bin/activate` to test out that your source works.
 
 ```text
 # from airbyte-integrations/connectors/source-<source-name>
-python main_dev.py spec
-python main_dev.py check --config secrets/config.json
-python main_dev.py discover --config secrets/config.json
-python main_dev.py read --config secrets/config.json --catalog sample_files/configured_catalog.json
+python main.py spec
+python main.py check --config secrets/config.json
+python main.py discover --config secrets/config.json
+python main.py read --config secrets/config.json --catalog sample_files/configured_catalog.json
 ```
 
 The nice thing about this approach is that you can iterate completely within in python. The downside is that you are not quite running your source as it will actually be run by Airbyte. Specifically you're not running it from within the docker container that will house it.
