Add a walkthrough of building a custom Python connector #36743
Spicy take: should this replace one of the existing tutorials? I don't like having to maintain three.
@@ -58,7 +58,7 @@ and choose the relevant template by using the arrow keys. This will generate a n
Search the generated directory for "TODO"s and follow them to implement your connector. For more detailed walkthroughs and instructions, follow the relevant tutorial:

- [Speedrun: Building a HTTP source with the CDK](tutorials/cdk-speedrun.md)
- [Building a HTTP source with the CDK](tutorials/cdk-tutorial-python-http/getting-started.md)
- [Building a HTTP source with the CDK](tutorials/custom-python-connector/0-getting-started.md)
update the path because the cdk-tutorial-python-http is deleted
Is it deleted in this PR? Make sure that you add redirects for all URLs that you delete, pointing to the new guide. That way we don't lose search traffic.
@@ -50,7 +50,7 @@ cd airbyte-integrations/connector-templates/generator
Next, find all `TODO`s in the generated project directory. They're accompanied by comments explaining what you'll
need to do in order to implement your connector. Upon completing all TODOs properly, you should have a functioning connector.

Additionally, you can follow [this tutorial](../tutorials/cdk-tutorial-python-http/getting-started.md) for a complete walkthrough of creating an HTTP connector using the Airbyte CDK.
Additionally, you can follow [this tutorial](../tutorials/custom-python-connector/0-getting-started.md) for a complete walkthrough of creating an HTTP connector using the Airbyte CDK.
update the path because the cdk-tutorial-python-http is deleted
@@ -72,7 +72,7 @@ Airbyte recommends using the CDK template generator to develop with the CDK. The

For tips on useful Python knowledge, see the [Python Concepts](python-concepts.md) page.

You can find a complete tutorial for implementing an HTTP source connector in [this tutorial](../tutorials/cdk-tutorial-python-http/getting-started.md)
You can find a complete tutorial for implementing an HTTP source connector in [this tutorial](../tutorials/custom-python-connector/0-getting-started.md)
update the path because the cdk-tutorial-python-http is deleted
@@ -26,7 +26,7 @@ See the [catalog guide](https://docs.airbyte.com/understanding-airbyte/beginners

Let's define the stream schema in `source-exchange-rates-tutorial/source_exchange_rates_tutorial/schemas/rates.json`

You can download the JSON file describing the output schema with all currencies [here](../../tutorials/cdk-tutorial-python-http/exchange_rates_schema.json) for convenience and place it in `schemas/`.
You can download the JSON file describing the output schema with all currencies [here](./exchange_rates_schema.json) for convenience and place it in `schemas/`.
update the path because the cdk-tutorial-python-http is deleted
@@ -35,7 +35,7 @@ airbyte-ci connectors --use-remote-secrets=false --name source-exchange-rates-tu

## Next steps:

Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/connector-development/tutorials/cdk-tutorial-python-http/use-connector-in-airbyte).
Next, we'll add the connector to the [Airbyte platform](https://docs.airbyte.com/operator-guides/using-custom-connectors).
I realized there's a better guide to using custom connectors
```
rm unit_tests/test_incremental_streams.py unit_tests/test_source.py unit_tests/test_streams.py
```
Replace the content of `airbyte-integrations/connectors/source-survey-monkey-demo/source_survey_monkey_demo/source.py` with the following template:
I'm really not a fan of the code in the template. We can do a better job of encouraging composition instead of inheritance
We'll do this by trying to read a single record from the stream, and fail if the connector could not read any.
```python
def check_connection(self, logger, config) -> Tuple[bool, any]:
```
I feel like the template should provide a sensible implementation by default instead of returning `True, None`. It's pretty useless.
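As a rough illustration of the approach described in the excerpt (read one record, and fail if none can be read), here is a minimal, self-contained sketch. It deliberately does not import the Airbyte CDK: `FakeStream` and the simplified `check_connection(stream)` signature are hypothetical stand-ins, whereas the tutorial's method takes `(self, logger, config)`.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional, Tuple


class FakeStream:
    """Hypothetical stand-in for a CDK stream; only `read_records` is modelled."""

    def __init__(self, records: Iterable[Mapping[str, Any]]):
        self._records = list(records)

    def read_records(self) -> Iterator[Mapping[str, Any]]:
        yield from self._records


def check_connection(stream: FakeStream) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if one record can be read, else (False, reason)."""
    try:
        # Pull at most one record; `None` means the stream was empty.
        first_record = next(iter(stream.read_records()), None)
    except Exception as error:  # any read failure means the check fails
        return False, f"Unable to read from the stream: {error}"
    if first_record is None:
        return False, "The stream returned no records."
    return True, None
```

Returning a reason string alongside `False` gives the user something actionable, which is the kind of default the comment above asks for.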
The `get_updated_state` method is used to update the stream's state. We'll set its value to the maximum between the current state's value and the value extracted from the record.
```python
def get_updated_state(self, current_stream_state: MutableMapping[str, Any], latest_record: Mapping[str, Any]) -> Mapping[str, Any]:
```
I believe this won't be part of the best practices anymore, since RFR (resumable full refresh) won't work with connectors that use `get_updated_state`.
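For reference, the max-based update described in the excerpt can be sketched as a standalone function. This is only an illustration of the logic under discussion, not a recommended pattern: the cursor field name `date_modified` is an assumption, and the comparison relies on timestamps being ISO 8601 strings in a single format, so that string order matches chronological order.

```python
from typing import Any, Mapping, MutableMapping

# Hypothetical cursor field; the real name depends on the API being synced.
CURSOR_FIELD = "date_modified"


def get_updated_state(
    current_stream_state: MutableMapping[str, Any],
    latest_record: Mapping[str, Any],
) -> Mapping[str, Any]:
    """Keep the larger of the stored cursor value and the record's value."""
    # An empty string sorts before any timestamp, so it works as "no state yet".
    current_value = current_stream_state.get(CURSOR_FIELD, "")
    latest_value = latest_record.get(CURSOR_FIELD, "")
    return {CURSOR_FIELD: max(current_value, latest_value)}
```

Taking the maximum keeps the state from moving backwards when records arrive out of order.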
Let's update the source. The bulk of the change is changing its parent class to `ConcurrentSourceAdapter`, and updating its `__init__` method so it's properly initialized. This requires a little bit of boilerplate:
```python
class SourceSurveyMonkeyDemo(ConcurrentSourceAdapter):
```
I find the need for the adapter to be pretty sad. I think we're pretty close to being able to recommend using the concurrent CDK interfaces from the get-go. We're mostly only missing the concept of a `PaginatedRequester`.
yield parent_record
```

This can be solved by implementing the connector using constructs from the concurrent CDK directly instead of wrapping synchronous streams in an adapter. This is left outside of the scope of this tutorial because no production connectors currently implement this.
this is a very unfortunate broken window. We never got to productionizing support for concurrent substreams.
The build is failing on Vercel but passing locally. Not sure what the issue is.
In general, this is great — this will be an epic onboarding guide for Cristina!
A few things to work on:
- Formatting
- Make sure links we delete redirect to this guide
- Make sure to delete speedrun as well
- Make issues for further improvements
I will sit down with this later today to get a few changes in and debug why Vercel did not work.
@@ -3,7 +3,7 @@
## CDK Speedrun \(HTTP API Source Creation Any Route\)

This is a blazing fast guide to building an HTTP source connector. Think of it as the TL;DR version
of [this tutorial.](cdk-tutorial-python-http/getting-started.md)
of [this tutorial.](custom-python-connector/0-getting-started.md)

If you are a visual learner and want to see a video version of this guide going over each part in
We should urge people to use the Connector Builder and low-code for most connectors.
I would say, let's delete this guide as well and point it to the new one. Perhaps we can add a little note at the top of it to direct some users to low-code?
# Getting started
This tutorial will walk you through the creation of a custom Airbyte connector implemented with the Python CDK. This tutorial assumes you're already familiar with Airbyte concepts and you've already built a connector using the [Connector Builder](../../connector-builder-ui/tutorial.mdx).

The Python CDK should be used to implement connectors that require features that are not yet available in the Connector Builder or in the low-code framework. You can use the [Connector Builder compatibility guide](../../connector-builder-ui/connector-builder-compatibility.md) to know whether it is suitable for your needs.
<3
docs/connector-development/tutorials/custom-python-connector/0-getting-started.md (outdated, resolved)
The two approaches are equivalent for the Survey Monkey API, but as a rule of thumb, it is preferable to use the links provided by the API when they are available instead of reverse engineering the mechanism. This way, we don't need to modify the connector if the API changes its pagination mechanism, for instance, if its maintainers decide to implement server-side pagination.

:::info
When available, server-side pagination should be preferred over client-side pagination because it has lower risks of missing records if the collection is modified while the connector iterates.
It's not super clear to junior devs what server-side vs. client-side pagination is, and you're talking about it enough that it's worth defining. Server-side: links provided by the API. Client-side: manually changing `page_number` or `offset`.
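To make the distinction in this comment concrete, here is a small sketch contrasting the two styles. The response shape (`links.next`, `data`), the `page` and `per_page` parameter names, and the example URL are assumptions made for illustration; neither function is taken from the tutorial or the CDK.

```python
from typing import Any, Mapping, Optional
from urllib.parse import parse_qs, urlparse


def next_page_from_links(body: Mapping[str, Any]) -> Optional[Mapping[str, str]]:
    """Server-side style: follow the `next` link the API hands back."""
    next_url = body.get("links", {}).get("next")
    if not next_url:
        return None  # no link means there is no further page
    query = parse_qs(urlparse(next_url).query)
    # parse_qs returns a list per parameter; keep the first value of each.
    return {name: values[0] for name, values in query.items()}


def next_page_from_counter(
    body: Mapping[str, Any], current_page: int, per_page: int
) -> Optional[Mapping[str, int]]:
    """Client-side style: compute the next page number ourselves."""
    if len(body.get("data", [])) < per_page:
        return None  # a short page is taken to be the last one
    return {"page": current_page + 1, "per_page": per_page}
```

The first function keeps working if the API changes how it paginates, as long as it keeps returning a `next` link; the second hard-codes the paging scheme in the connector.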
docs/connector-development/tutorials/custom-python-connector/2-reading-a-page.md (outdated, resolved)
docs/connector-development/tutorials/custom-python-connector/4-check-and-error-handling.md (outdated, resolved)
docs/connector-development/tutorials/custom-python-connector/4-check-and-error-handling.md (outdated, resolved)
docs/connector-development/tutorials/custom-python-connector/8-concurrency.md (outdated, resolved)
docs/connector-development/tutorials/custom-python-connector/8-concurrency.md (outdated, resolved)
…nto alex/python_tutorial
LGTM overall, I'll look into redirects and merge tonight
What
Write a Python source connector development walkthrough covering
This can replace two old tutorials