Commit 5810646

girarda, sherifnada, alafanechere, and brianjlai
authored and committed
Tutorial and documentation for config-based connectors (#15027)
* 5-step tutorial * move * tiny bit of editing * Update tutorial * update docs * reset * move files * record selector, request options, and more links * update * update * connector definition * link * links * update example * footnote * typo * document string interpolation * note on string interpolation * update * fix code sample * fix * update sample * fix * use the actual config * Update as per comments * write as yaml * typo * Clarify options overloading * clarify that docker must be running * remove extra footnote * use venv directly * Apply suggestions from code review Co-authored-by: Sherif A. Nada <[email protected]> * signup instructions * update * clarify that both dot and bracket notations are interchangeable * Clarify how check works * create spec and config before updating connector definition * clarify what now_local() is * rename to yaml structure * Go through tutorial and update end of section code samples * fix link * update * update code samples * Update code samples * Update to bracket notation * remove superfluous comments * Update docs/connector-development/config-based/tutorial/2-install-dependencies.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/3-connecting-to-the-API-source.md Co-authored-by: Augustin <[email protected]> * Update docs/connector-development/config-based/tutorial/4-reading-data.md Co-authored-by: Augustin <[email protected]> * fix 
path * update * motivation blurp * warning * warning * fix code block * update code samples * update code sample * update code samples * small updates * update yaml structure * custom class example * language annotations * update warning * Update tutorial to use dpath extractor * Update record selector docs * unit test * link to contributing * tiny update * $ in front of commands * $ in front of commands * More readings * link to existing config-based connectors * index * update * delete broken link * supported features * update * Add some links * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <[email protected]> * Update docs/connector-development/config-based/record-selector.md Co-authored-by: Brian Lai <[email protected]> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <[email protected]> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <[email protected]> * Update docs/connector-development/config-based/overview.md Co-authored-by: Brian Lai <[email protected]> * mention the unit * headers * remove mentions of interpolating on stream slice, etc. * update * exclude config-based docs Co-authored-by: Sherif A. Nada <[email protected]> Co-authored-by: Augustin <[email protected]> Co-authored-by: Brian Lai <[email protected]>
1 parent 2056e35 commit 5810646

22 files changed

+2123
-4
lines changed

airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/token.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -83,7 +83,7 @@ def token(self) -> str:
8383
@dataclass
8484
class BasicHttpAuthenticator(AbstractHeaderAuthenticator):
8585
"""
86-
Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using bas64
86+
Builds auth based off the basic authentication scheme as defined by RFC 7617, which transmits credentials as USER ID/password pairs, encoded using base64
8787
https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme
8888
8989
The header is of the form

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/class_types_registry.py

Lines changed: 2 additions & 0 deletions
@@ -4,6 +4,7 @@
44

55
from typing import Mapping, Type
66

7+
from airbyte_cdk.sources.declarative.auth.oauth import DeclarativeOauth2Authenticator
78
from airbyte_cdk.sources.declarative.auth.token import ApiKeyAuthenticator, BasicHttpAuthenticator, BearerAuthenticator
89
from airbyte_cdk.sources.declarative.datetime.min_max_datetime import MinMaxDatetime
910
from airbyte_cdk.sources.declarative.declarative_stream import DeclarativeStream
@@ -56,6 +57,7 @@
5657
"ListStreamSlicer": ListStreamSlicer,
5758
"MinMaxDatetime": MinMaxDatetime,
5859
"NoPagination": NoPagination,
60+
"OAuthAuthenticator": DeclarativeOauth2Authenticator,
5961
"OffsetIncrement": OffsetIncrement,
6062
"RecordSelector": RecordSelector,
6163
"RemoveFields": RemoveFields,

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/factory.py

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ class DeclarativeComponentFactory:
5252
If the component definition is a mapping with neither a "class_name" nor a "type" field,
5353
the factory will do a best-effort attempt at inferring the component type by looking up the parent object's constructor type hints.
5454
If the type hint is an interface present in `DEFAULT_IMPLEMENTATIONS_REGISTRY`,
55-
then the factory will create an object of it's default implementation.
55+
then the factory will create an object of its default implementation.
5656
5757
If the component definition is a list, then the factory will iterate over the elements of the list,
5858
instantiate its subcomponents, and return a list of instantiated objects.

airbyte-cdk/python/airbyte_cdk/sources/declarative/parsers/yaml_parser.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ class YamlParser(ConnectionDefinitionParser):
1515
"""
1616
Parses a Yaml string to a ConnectionDefinition
1717
18-
In addition to standard Yaml parsing, the input_string can contain refererences to values previously defined.
18+
In addition to standard Yaml parsing, the input_string can contain references to values previously defined.
1919
This parser will dereference these values to produce a complete ConnectionDefinition.
2020
2121
References can be defined using a *ref(<arg>) string.

airbyte-cdk/python/unit_tests/sources/declarative/extractors/test_dpath_extractor.py

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@
2020
[
2121
("test_extract_from_array", ["data"], {"data": [{"id": 1}, {"id": 2}]}, [{"id": 1}, {"id": 2}]),
2222
("test_extract_single_record", ["data"], {"data": {"id": 1}}, [{"id": 1}]),
23+
("test_extract_single_record_from_root", [], {"id": 1}, [{"id": 1}]),
2324
("test_extract_from_root_array", [], [{"id": 1}, {"id": 2}], [{"id": 1}, {"id": 2}]),
2425
("test_nested_field", ["data", "records"], {"data": {"records": [{"id": 1}, {"id": 2}]}}, [{"id": 1}, {"id": 2}]),
2526
("test_field_in_config", ["{{ config['field'] }}"], {"record_array": [{"id": 1}, {"id": 2}]}, [{"id": 1}, {"id": 2}]),
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Authentication
2+
3+
The `Authenticator` defines how to configure outgoing HTTP requests to authenticate on the API source.
4+
5+
## Authenticators
6+
7+
### ApiKeyAuthenticator
8+
9+
The `ApiKeyAuthenticator` sets an HTTP header on outgoing requests.
10+
The following definition will set the header "Authorization" with a value "Bearer hello":
11+
12+
```yaml
authenticator:
  type: "ApiKeyAuthenticator"
  header: "Authorization"
  token: "Bearer hello"
```
18+
19+
### BearerAuthenticator
20+
21+
The `BearerAuthenticator` is a specialized `ApiKeyAuthenticator` that always sets the header "Authorization" with the value "Bearer {token}".
22+
The following definition will set the header "Authorization" with a value "Bearer hello":
23+
24+
```yaml
authenticator:
  type: "BearerAuthenticator"
  token: "hello"
```
29+
30+
More information on bearer authentication can be found [here](https://swagger.io/docs/specification/authentication/bearer-authentication/).
31+
32+
### BasicHttpAuthenticator
33+
34+
The `BasicHttpAuthenticator` sets the "Authorization" header with a (USER ID/password) pair, encoded using base64 as per [RFC 7617](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#basic_authentication_scheme).
35+
The following definition will set the header "Authorization" with a value "Basic {encoded credentials}":
36+
37+
```yaml
authenticator:
  type: "BasicHttpAuthenticator"
  username: "hello"
  password: "world"
```
43+
44+
The password is optional. Authenticating with APIs using Basic HTTP and a single API key can be done as:
45+
46+
```yaml
authenticator:
  type: "BasicHttpAuthenticator"
  username: "hello"
```
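
As a rough illustration of what the "Basic {encoded credentials}" value looks like, the sketch below builds the header with the Python standard library. The helper name `basic_auth_header` is hypothetical and not part of the CDK, and it assumes an omitted password is treated as an empty string.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str = "") -> Dict[str, str]:
    """Build a Basic auth header from a username and an optional password."""
    # RFC 7617: join the user ID and password with ":" and base64-encode the result.
    # Assumption: a missing password is encoded as an empty string.
    credentials = f"{username}:{password}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


print(basic_auth_header("hello", "world"))
print(basic_auth_header("hello"))  # password omitted, as with a single API key
```

Running the helper with the two definitions above shows that the username-only form still encodes the trailing ":" separator.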
51+
52+
### OAuth
53+
54+
OAuth authentication is supported through the `OAuthAuthenticator`, which requires the following parameters:
55+
56+
- token_refresh_endpoint: The endpoint to refresh the access token
57+
- client_id: The client id
58+
- client_secret: The client secret
59+
- refresh_token: The token used to refresh the access token
60+
- scopes (Optional): The scopes to request. Default: Empty list
61+
- token_expiry_date (Optional): The access token expiration date formatted as RFC-3339 ("%Y-%m-%dT%H:%M:%S.%f%z")
62+
- access_token_name (Optional): The field to extract access token from in the response. Default: "access_token"
63+
- expires_in_name (Optional): The field to extract expires_in from in the response. Default: "expires_in"
64+
- refresh_request_body (Optional): The request body to send in the refresh request. Default: None
65+
66+
```yaml
authenticator:
  type: "OAuthAuthenticator"
  token_refresh_endpoint: "https://api.searchmetrics.com/v4/token"
  client_id: "{{ config['api_key'] }}"
  client_secret: "{{ config['client_secret'] }}"
  refresh_token: ""
```
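
To make the role of these parameters more concrete, here is a minimal sketch of a refresh exchange. It is not the CDK's implementation: the function names are hypothetical, the request body follows the generic OAuth 2.0 refresh grant from RFC 6749 (including a space-delimited `scope`), and the way `refresh_request_body` is merged is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Mapping, Optional, Tuple


def build_refresh_request_body(
    client_id: str,
    client_secret: str,
    refresh_token: str,
    scopes: Optional[List[str]] = None,
    refresh_request_body: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the form fields sent to token_refresh_endpoint."""
    # Generic OAuth 2.0 refresh grant fields (RFC 6749, section 6).
    body: Dict[str, Any] = {
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }
    if scopes:
        body["scope"] = " ".join(scopes)
    # Assumption: extra fields are added without overriding the standard ones.
    for key, value in (refresh_request_body or {}).items():
        body.setdefault(key, value)
    return body


def parse_refresh_response(
    response: Mapping[str, Any],
    now: datetime,
    access_token_name: str = "access_token",
    expires_in_name: str = "expires_in",
) -> Tuple[str, datetime]:
    """Extract the access token and compute when it expires."""
    access_token = response[access_token_name]
    expiry = now + timedelta(seconds=int(response[expires_in_name]))
    return access_token, expiry


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
token, expiry = parse_refresh_response({"access_token": "abc", "expires_in": 3600}, now)
print(token, expiry.isoformat())
```

The `access_token_name` and `expires_in_name` defaults mirror the defaults listed above, so an API that names these fields differently only needs those two values changed.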
Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
# Error handling
2+
3+
By default, only server errors (HTTP 5XX) and too many requests (HTTP 429) will be retried up to 5 times with exponential backoff.
4+
Other HTTP errors will result in a failed read.
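
The default policy described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the backoff base of 2 and the `factor` parameter are assumptions, since the text does not state the exact intervals.

```python
from typing import Optional

MAX_RETRIES = 5  # "up to 5 times", as stated above


def should_retry(status_code: int) -> bool:
    """Retry only on HTTP 429 and on server errors (HTTP 5XX)."""
    return status_code == 429 or 500 <= status_code <= 599


def backoff_seconds(attempt: int, factor: float = 1.0) -> float:
    """Delay before retry number `attempt` (1-based), doubling each time."""
    # Assumption: base 2 with a multiplicative factor.
    return factor * 2 ** (attempt - 1)


def next_delay(status_code: int, attempt: int) -> Optional[float]:
    """Return the delay before the next attempt, or None if the read should fail."""
    if attempt > MAX_RETRIES or not should_retry(status_code):
        return None
    return backoff_seconds(attempt)


for attempt in range(1, 7):
    print(attempt, next_delay(503, attempt))
```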
5+
6+
Other behaviors can be configured through the `Requester`'s `error_handler` field.
7+
8+
## Defining errors
9+
10+
### From status code
11+
12+
Response filters can be used to define how to handle requests resulting in responses with a specific HTTP status code.
13+
For instance, this example will configure the handler to also retry responses with 404 error:
14+
15+
```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: RETRY
```
23+
24+
Response filters can be used to specify HTTP errors to ignore.
25+
For instance, this example will configure the handler to ignore responses with 404 error:
26+
27+
```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: IGNORE
```
35+
36+
### From error message
37+
38+
Errors can also be defined by parsing the error message.
39+
For instance, this error handler will ignore responses if the error message contains the string "ignorethisresponse":
40+
41+
```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - error_message_contain: "ignorethisresponse"
        action: IGNORE
```
49+
50+
This can also be done through a more generic string interpolation strategy with the following parameters:
51+
52+
- response: the decoded response
53+
54+
This example ignores errors where the response contains a "code" field:
55+
56+
```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - predicate: "{{ 'code' in response }}"
        action: IGNORE
```
64+
65+
The error handler can have multiple response filters.
66+
The following example is configured to ignore 404 errors, and retry 429 errors:
67+
68+
```yaml
requester:
  <...>
  error_handler:
    response_filters:
      - http_codes: [ 404 ]
        action: IGNORE
      - http_codes: [ 429 ]
        action: RETRY
```
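
One way to picture how a list of response filters resolves to an action is the sketch below. The `ResponseFilter` class here is a simplified, hypothetical stand-in for illustration, and it assumes that the first matching filter decides the action, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ResponseFilter:
    action: str  # "RETRY" or "IGNORE"
    http_codes: Sequence[int] = ()
    error_message_contain: Optional[str] = None

    def matches(self, status_code: int, error_message: str) -> bool:
        """A filter matches on its status codes or on a substring of the message."""
        if status_code in self.http_codes:
            return True
        return (
            self.error_message_contain is not None
            and self.error_message_contain in error_message
        )


def resolve_action(
    filters: Sequence[ResponseFilter], status_code: int, error_message: str = ""
) -> Optional[str]:
    """Return the action of the first matching filter, or None if none match."""
    # Assumption: filters are evaluated in order and the first match wins.
    for response_filter in filters:
        if response_filter.matches(status_code, error_message):
            return response_filter.action
    return None


filters = [
    ResponseFilter(action="IGNORE", http_codes=[404]),
    ResponseFilter(action="RETRY", http_codes=[429]),
]
print(resolve_action(filters, 404), resolve_action(filters, 429), resolve_action(filters, 500))
```

A `None` result in this sketch stands for "no filter applies", in which case the default behavior would take over.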
78+
79+
## Backoff Strategies
80+
81+
The error handler supports a few backoff strategies, which are described in the following sections.
82+
83+
### Exponential backoff
84+
85+
This is the default backoff strategy. The requester will backoff with an exponentially increasing interval.
86+
87+
### Constant Backoff
88+
89+
When using the `ConstantBackoffStrategy`, the requester will backoff with a constant interval.
90+
91+
### Wait time defined in header
92+
93+
When using the `WaitTimeFromHeaderBackoffStrategy`, the requester will backoff by an interval specified in the response header.
94+
In this example, the requester will backoff by the response's "wait_time" header value:
95+
96+
```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
```
105+
106+
Optionally, a regular expression can be configured to extract the wait time from the header value.
107+
108+
```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
        # Single quotes: "\d" is not a valid escape in a double-quoted YAML string.
        regex: '[-+]?\d+'
```
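
The extraction step can be sketched in a few lines of Python. The function name is hypothetical, and the use of `re.search` (rather than matching from the start of the value) as well as the interpretation of the value as seconds are assumptions made for illustration.

```python
import re
from typing import Mapping, Optional


def wait_time_from_header(
    headers: Mapping[str, str], header: str, regex: Optional[str] = None
) -> Optional[float]:
    """Return the wait time found in `header`, or None if it cannot be read."""
    # Note: a plain mapping lookup is case-sensitive, unlike real HTTP headers.
    value = headers.get(header)
    if value is None:
        return None
    if regex is not None:
        match = re.search(regex, value)
        if match is None:
            return None
        value = match.group()
    try:
        return float(value)
    except ValueError:
        return None


print(wait_time_from_header({"wait_time": "30"}, "wait_time"))
print(wait_time_from_header({"wait_time": "retry in 30 seconds"}, "wait_time", r"[-+]?\d+"))
```

Returning `None` when the header is missing or unparsable leaves room for another strategy to supply the delay.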
118+
119+
### Wait until time defined in header
120+
121+
When using the `WaitUntilTimeFromHeaderBackoffStrategy`, the requester will backoff until the time specified in the response header.
122+
In this example, the requester will wait until the time specified in the "wait_until" header value:
123+
124+
```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitUntilTimeFromHeaderBackoffStrategy"
        header: "wait_until"
        # Single quotes: "\d" is not a valid escape in a double-quoted YAML string.
        regex: '[-+]?\d+'
        min_wait: 5
```
135+
136+
The strategy accepts an optional regular expression to extract the time from the header value, and a minimum time to wait.
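
A minimal sketch of the "wait until" computation is shown below. The function name is hypothetical, and it assumes the header carries a Unix timestamp in seconds and that `min_wait` acts as a lower bound on the delay; neither detail is spelled out above.

```python
import re
from typing import Optional


def wait_until_delay(
    header_value: str,
    now: float,
    regex: Optional[str] = None,
    min_wait: Optional[float] = None,
) -> Optional[float]:
    """Return how long to wait, in seconds, before the next attempt."""
    if regex is not None:
        match = re.search(regex, header_value)
        if match is None:
            return None
        header_value = match.group()
    try:
        # Assumption: the header value is a Unix timestamp in seconds.
        wait_until = float(header_value)
    except ValueError:
        return None
    delay = wait_until - now
    if min_wait is not None:
        delay = max(delay, min_wait)
    # Never return a negative delay if the time is already in the past.
    return max(delay, 0.0)


print(wait_until_delay("1600000060", now=1600000000.0, regex=r"[-+]?\d+", min_wait=5))
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the sketch deterministic and easy to test.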
137+
138+
## Advanced error handling
139+
140+
The error handler can have multiple backoff strategies, allowing it to fall back to the next strategy if one cannot be evaluated.
141+
For instance, the following defines an error handler that will read the backoff time from a header, and default to a constant backoff if the wait time could not be extracted from the response:
142+
143+
```yaml
requester:
  <...>
  error_handler:
    <...>
    backoff_strategies:
      - type: "WaitTimeFromHeaderBackoffStrategy"
        header: "wait_time"
      - type: "ConstantBackoffStrategy"
        backoff_time_in_seconds: 5
```
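
The fallback behavior can be pictured as trying each strategy in order and keeping the first delay that can be computed. The sketch below models strategies as plain functions; all names are hypothetical and this is not how the CDK classes are structured.

```python
from typing import Callable, Mapping, Optional, Sequence

# A strategy takes the response headers and returns a delay, or None if it cannot.
BackoffStrategy = Callable[[Mapping[str, str]], Optional[float]]


def header_backoff(header: str) -> BackoffStrategy:
    """Strategy that reads the delay from a response header."""

    def strategy(headers: Mapping[str, str]) -> Optional[float]:
        value = headers.get(header)
        try:
            return float(value) if value is not None else None
        except ValueError:
            return None

    return strategy


def constant_backoff(seconds: float) -> BackoffStrategy:
    """Strategy that always returns the same delay."""
    return lambda headers: seconds


def first_backoff(
    strategies: Sequence[BackoffStrategy], headers: Mapping[str, str]
) -> Optional[float]:
    """Return the delay from the first strategy that can be evaluated."""
    for strategy in strategies:
        delay = strategy(headers)
        if delay is not None:
            return delay
    return None


strategies = [header_backoff("wait_time"), constant_backoff(5)]
print(first_backoff(strategies, {"wait_time": "12"}))  # header is present
print(first_backoff(strategies, {}))  # header missing, constant is used
```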
155+
156+
The `requester` can be configured to use a `CompositeErrorHandler`, which sequentially iterates over a list of error handlers, enabling different retry mechanisms for different types of errors.
157+
158+
In this example, a constant backoff of 5 seconds will be applied if the response contains a "code" field, and an exponential backoff will be applied if the HTTP status code is 403:
159+
160+
```yaml
requester:
  <...>
  error_handler:
    type: "CompositeErrorHandler"
    error_handlers:
      - response_filters:
          - predicate: "{{ 'code' in response }}"
            action: RETRY
        backoff_strategies:
          - type: "ConstantBackoffStrategy"
            backoff_time_in_seconds: 5
      - response_filters:
          - http_codes: [ 403 ]
            action: RETRY
        backoff_strategies:
          - type: "ExponentialBackoffStrategy"
```
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
# Index
2+
3+
## From scratch
4+
5+
- [Overview](overview.md)
6+
- [Yaml structure](overview.md)
7+
- [Reference docs](https://airbyte-cdk.readthedocs.io/en/latest/api/airbyte_cdk.sources.declarative.html)
8+
9+
## Concepts
10+
11+
- [Authentication](authentication.md)
12+
- [Error handling](error-handling.md)
13+
- [Pagination](pagination.md)
14+
- [Record selection](record-selector.md)
15+
- [Request options](request-options.md)
16+
- [Stream slicers](stream-slicers.md)
17+
18+
## Tutorial
19+
20+
0. [Getting started](tutorial/0-getting-started.md)
21+
1. [Creating a source](tutorial/1-create-source.md)
22+
2. [Installing dependencies](tutorial/2-install-dependencies.md)
23+
3. [Connecting to the API](tutorial/3-connecting-to-the-API-source.md)
24+
4. [Reading data](tutorial/4-reading-data.md)
25+
5. [Incremental reads](tutorial/5-incremental-reads.md)
26+
6. [Testing](tutorial/6-testing.md)
