Commit 57dffd6

[docs] update CDK Tutorial: Python HTTP (#22069)
* [docs] update CDK Tutorial: Python HTTP
* Update airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/6-read-data.md
  Co-authored-by: Sergio Ropero
* Code review

Co-authored-by: Sergio Ropero
1 parent c2dcb0e commit 57dffd6

11 files changed: +93 -55 lines changed


airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/0-getting-started.md

Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,10 @@ This is a step-by-step guide for how to create an Airbyte source in Python to re
 
 All the commands below assume that `python` points to a version of python >=3.9.0. On some systems, `python` points to a Python2 installation and `python3` points to Python3. If this is the case on your machine, substitute all `python` commands in this guide with `python3`.
 
+## Exchange Rates API Setup
+
+For this guide we will be making API calls to the Exchange Rates API. In order to generate the API access key that will be used by the new connector, you will have to follow steps on the [Exchange Rates API](https://exchangeratesapi.io/) by signing up for the Free tier plan. Once you have an API access key, you can continue with the guide.
+
 ## Checklist
 
 * Step 1: Create the source using the template

airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/3-define-inputs.md

Lines changed: 10 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 Each connector declares the inputs it needs to read data from the underlying data source. This is the Airbyte Protocol's `spec` operation.
 
-The simplest way to implement this is by creating a `.json` file in `source_<name>/spec.json` which describes your connector's inputs according to the [ConnectorSpecification](https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml#L211) schema. This is a good place to start when developing your source. Using JsonSchema, define what the inputs are \(e.g. username and password\). Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-freshdesk/source_freshdesk/spec.json) of what the `spec.json` looks like for the Freshdesk API source.
+The simplest way to implement this is by creating a `spec.json` file in `source_<name>/spec.json` which describes your connector's inputs according to the [ConnectorSpecification](https://github.com/airbytehq/airbyte/blob/master/airbyte-protocol/models/src/main/resources/airbyte_protocol/airbyte_protocol.yaml#L211) schema. This is a good place to start when developing your source. Using JsonSchema, define what the inputs are \(e.g. username and password\). Here's [an example](https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/source-freshdesk/source_freshdesk/spec.json) of what the `spec.json` looks like for the Freshdesk API source.
 
 For more details on what the spec is, you can read about the Airbyte Protocol [here](https://docs.airbyte.io/understanding-airbyte/airbyte-protocol).
 
@@ -17,8 +17,13 @@ Given that we'll pulling currency data for our example source, we'll define the
     "$schema": "http://json-schema.org/draft-07/schema#",
     "title": "Python Http Tutorial Spec",
     "type": "object",
-    "required": ["start_date", "currency_base"],
+    "required": ["apikey", "start_date", "base"],
     "properties": {
+      "apikey": {
+        "type": "string",
+        "description": "API access key used to retrieve data from the Exchange Rates API.",
+        "airbyte_secret": true
+      }
       "start_date": {
         "type": "string",
         "description": "Start getting data from that date.",
@@ -27,7 +32,7 @@ Given that we'll pulling currency data for our example source, we'll define the
       },
       "base": {
         "type": "string",
-        "examples": ["USD", "EUR"]
+        "examples": ["USD", "EUR"],
         "description": "ISO reference currency. See <a href=\"https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html\">here</a>."
       }
     }
@@ -40,8 +45,9 @@ Beside regular parameter there is intenal CDK config that started with '_' chara
 * _page_size - for http based streams set number of records for each page. Depends on stream implementation.
 
 
-In addition to metadata, we define two inputs:
+In addition to metadata, we define three inputs:
 
+* `apikey`: The API access key used to authenticate requests to the API
 * `start_date`: The beginning date to start tracking currency exchange rates from
 * `base`: The currency whose rates we're interested in tracking
 
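The three inputs listed in this hunk can be checked with a few lines of plain Python. This is only a minimal sketch, not the CDK's own spec validation: `REQUIRED_INPUTS` and `validate_config` are hypothetical names, and the date pattern is the one that appears in the connector's `spec.json` elsewhere in this commit.

```python
import re
from typing import Any, List, Mapping

# Required inputs as listed in the updated spec; the names mirror the spec keys.
REQUIRED_INPUTS = ("apikey", "start_date", "base")
# Date pattern copied from the connector's spec.json in this commit.
DATE_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")


def validate_config(config: Mapping[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"missing required input: {key}" for key in REQUIRED_INPUTS if key not in config]
    start_date = config.get("start_date")
    if start_date is not None and not DATE_PATTERN.match(str(start_date)):
        problems.append("start_date must look like YYYY-MM-DD")
    return problems


if __name__ == "__main__":
    # The config used before this commit has no apikey, so one problem is reported.
    print(validate_config({"start_date": "2021-04-01", "base": "USD"}))
```

A config written before this change (without `apikey`) is reported as incomplete, which is the practical effect of adding `apikey` to `required`.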

airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/4-connection-checking.md

Lines changed: 4 additions & 3 deletions
@@ -4,6 +4,7 @@ The second operation in the Airbyte Protocol that we'll implement is the `check`
 
 This operation verifies that the input configuration supplied by the user can be used to connect to the underlying data source. Note that this user-supplied configuration has the values described in the `spec.json` filled in. In other words if the `spec.json` said that the source requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. You should then implement something that returns a json object reporting, given the credentials in the config, whether we were able to connect to the source.
 
+In order to make requests to the API, we need to specify the access.
 In our case, this is a fairly trivial check since the API requires no credentials. Instead, let's verify that the user-input `base` currency is a legitimate currency. In `source.py` we'll find the following autogenerated source:
 
 ```python
@@ -37,11 +38,11 @@ Following the docstring instructions, we'll change the implementation to verify
         return True, None
 ```
 
-Let's test out this implementation by creating two objects: a valid and an invalid config and attempt to give them as input to the connector
+Let's test out this implementation by creating two objects: a valid and an invalid config and attempt to give them as input to the connector. For this section, you will need to take the API access key generated earlier and add it to both configs. Because these configs contain secrets, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default. For the purpose of this example, a dummy apikey has been setup in `sample_files/config.json`.
 
 ```text
-echo '{"start_date": "2021-04-01", "base": "USD"}' > sample_files/config.json
-echo '{"start_date": "2021-04-01", "base": "BTC"}' > sample_files/invalid_config.json
+echo '{"start_date": "2021-04-01", "base": "USD", "apikey": <your_apikey>}' > sample_files/config.json
+echo '{"start_date": "2021-04-01", "base": "BTC", "apikey": <your_apikey>}' > sample_files/invalid_config.json
 python main.py check --config sample_files/config.json
 python main.py check --config sample_files/invalid_config.json
 ```
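The `echo` commands in this hunk leave `<your_apikey>` unquoted, so the key has to be substituted as a quoted JSON string for the file to parse. A minimal sketch of writing the same two configs from Python, where `json.dump` handles the quoting; `write_config` is a hypothetical helper, and a temporary directory stands in for the connector's `secrets/` folder.

```python
import json
import tempfile
from pathlib import Path


def write_config(path: Path, base: str, apikey: str, start_date: str = "2021-04-01") -> None:
    """Serialize one connector config; json.dump emits the apikey as a quoted string."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as handle:
        json.dump({"start_date": start_date, "base": base, "apikey": apikey}, handle)


if __name__ == "__main__":
    # Assumption: a throwaway directory is used here instead of the real secrets/ folder.
    secrets_dir = Path(tempfile.mkdtemp()) / "secrets"
    write_config(secrets_dir / "config.json", base="USD", apikey="<your_apikey>")
    write_config(secrets_dir / "invalid_config.json", base="BTC", apikey="<your_apikey>")
    print((secrets_dir / "config.json").read_text())
```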

airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/5-declare-schema.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ We'll begin by creating a stream to represent the data that we're pulling from t
 
 ```python
 class ExchangeRates(HttpStream):
-    url_base = "https://api.exchangeratesapi.io/"
+    url_base = "https://api.apilayer.com/exchangerates_data/"
 
     # Set this as a noop.
     primary_key = None

airbyte-cdk/python/docs/tutorials/cdk-tutorial-python-http/6-read-data.md

Lines changed: 26 additions & 12 deletions
@@ -36,13 +36,14 @@ Let's begin by pulling data for the last day's rates by using the `/latest` endp
 
 ```python
 class ExchangeRates(HttpStream):
-    url_base = "https://api.exchangeratesapi.io/"
+    url_base = "https://api.apilayer.com/exchangerates_data/"
 
     primary_key = None
 
-    def __init__(self, base: str, **kwargs):
+    def __init__(self, config: Mapping[str, Any], **kwargs):
         super().__init__()
-        self.base = base
+        self.base = config['base']
+        self.apikey = config['apikey']
 
 
     def path(
@@ -54,6 +55,12 @@ class ExchangeRates(HttpStream):
         # The "/latest" path gives us the latest currency exchange rates
         return "latest"
 
+    def request_headers(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> Mapping[str, Any]:
+        # The api requires that we include apikey as a header so we do that in this method
+        return {'apikey': self.apikey}
+
     def request_params(
         self,
         stream_state: Mapping[str, Any],
@@ -80,14 +87,20 @@ class ExchangeRates(HttpStream):
         return None
 ```
 
-This may look big, but that's just because there are lots of \(unused, for now\) parameters in these methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really we just added a few lines of "significant" code: 1. Added a constructor `__init__` which stores the `base` currency to query for. 2. `return {'base': self.base}` to add the `?base=<base-value>` query parameter to the request based on the `base` input by the user. 3. `return [response.json()]` to parse the response from the API to match the schema of our schema `.json` file. 4. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the latest exchange rate data.
+This may look big, but that's just because there are lots of \(unused, for now\) parameters in these methods \(those can be hidden with Python's `**kwargs`, but don't worry about it for now\). Really we just added a few lines of "significant" code:
 
-Let's also pass the `base` parameter input by the user to the stream class:
+1. Added a constructor `__init__` which stores the `base` currency to query for and the `apikey` used for authentication.
+2. `return {'base': self.base}` to add the `?base=<base-value>` query parameter to the request based on the `base` input by the user.
+3. `return {'apikey': self.apikey}` to add the header `apikey=<apikey-string>` to the request based on the `apikey` input by the user.
+4. `return [response.json()]` to parse the response from the API to match the schema of our schema `.json` file.
+5. `return "latest"` to indicate that we want to hit the `/latest` endpoint of the API to get the latest exchange rate data.
+
+Let's also pass the config specified by the user to the stream class:
 
 ```python
     def streams(self, config: Mapping[str, Any]) -> List[Stream]:
         auth = NoAuth()
-        return [ExchangeRates(authenticator=auth, base=config['base'])]
+        return [ExchangeRates(authenticator=auth, config=config)]
 ```
 
 We're now ready to query the API!
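To see how the pieces in this hunk fit together, the sketch below assembles the same request outside the CDK using only the standard library. It is an illustration under assumptions, not how `HttpStream` builds requests internally: `build_latest_request` is a hypothetical helper, and nothing is sent over the network.

```python
from typing import Any, Mapping
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Same value as url_base in the updated stream class.
URL_BASE = "https://api.apilayer.com/exchangerates_data/"


def build_latest_request(config: Mapping[str, Any]) -> Request:
    """Combine url_base + path(), the `base` query parameter, and the `apikey` header."""
    url = urljoin(URL_BASE, "latest") + "?" + urlencode({"base": config["base"]})
    # urllib stores header names capitalized ("Apikey"); HTTP header names are case-insensitive.
    return Request(url, headers={"apikey": config["apikey"]})


if __name__ == "__main__":
    request = build_latest_request({"base": "USD", "apikey": "<your_apikey>"})
    print(request.full_url)
```

The point of the split is that `base` travels in the query string while the key travels in a header, which is why the commit adds `request_headers` rather than extending `request_params`.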
@@ -127,20 +140,21 @@ Let's get the easy parts out of the way and pass the `start_date`:
 
 ```python
 def streams(self, config: Mapping[str, Any]) -> List[Stream]:
-        auth = NoAuth()
-        # Parse the date from a string into a datetime object
-        start_date = datetime.strptime(config['start_date'], '%Y-%m-%d')
-        return [ExchangeRates(authenticator=auth, base=config['base'], start_date=start_date)]
+    auth = NoAuth()
+    # Parse the date from a string into a datetime object
+    start_date = datetime.strptime(config['start_date'], '%Y-%m-%d')
+    return [ExchangeRates(authenticator=auth, config=config, start_date=start_date)]
 ```
 
 Let's also add this parameter to the constructor and declare the `cursor_field`:
 
 ```python
 from datetime import datetime, timedelta
+from airbyte_cdk.sources.streams import IncrementalMixin
 
 
 class ExchangeRates(HttpStream, IncrementalMixin):
-    url_base = "https://api.exchangeratesapi.io/"
+    url_base = "https://api.apilayer.com/exchangerates_data/"
     cursor_field = "date"
     primary_key = "date"
 
@@ -176,7 +190,7 @@ Update internal state `cursor_value` inside `read_records` method
     def read_records(self, *args, **kwargs) -> Iterable[Mapping[str, Any]]:
         for record in super().read_records(*args, **kwargs):
             if self._cursor_value:
-                latest_record_date = datetime.strptime(latest_record[self.cursor_field], '%Y-%m-%d')
+                latest_record_date = datetime.strptime(record[self.cursor_field], '%Y-%m-%d')
                 self._cursor_value = max(self._cursor_value, latest_record_date)
             yield record
 
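The fix in this hunk replaces the undefined name `latest_record` with the loop variable `record`. The cursor logic itself can be exercised in isolation; the sketch below is a standalone rendering of that loop, with `advance_cursor` as a hypothetical function name and plain dicts standing in for records.

```python
from datetime import datetime
from typing import Any, Iterable, Mapping, Optional

DATE_FORMAT = "%Y-%m-%d"


def advance_cursor(
    cursor: Optional[datetime],
    records: Iterable[Mapping[str, Any]],
    cursor_field: str = "date",
) -> Optional[datetime]:
    """Keep the later of the current cursor and each record's date, as in read_records."""
    for record in records:
        # Same guard as `if self._cursor_value:`; an unset cursor is left untouched.
        if cursor:
            record_date = datetime.strptime(record[cursor_field], DATE_FORMAT)
            cursor = max(cursor, record_date)
    return cursor


if __name__ == "__main__":
    rows = [{"date": "2021-04-03"}, {"date": "2021-04-02"}]
    print(advance_cursor(datetime(2021, 4, 1), rows))
```

Because `max` is used, records arriving out of order never move the cursor backwards.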

airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/source.py

Lines changed: 9 additions & 3 deletions
@@ -14,7 +14,7 @@
 
 
 class ExchangeRates(HttpStream):
-    url_base = "http://api.exchangeratesapi.io/"
+    url_base = "https://api.apilayer.com/exchangerates_data/"
     cursor_field = "date"
     primary_key = "date"
 
@@ -34,14 +34,20 @@ def path(
     ) -> str:
         return stream_slice["date"]
 
+    def request_headers(
+        self, stream_state: Mapping[str, Any], stream_slice: Mapping[str, Any] = None, next_page_token: Mapping[str, Any] = None
+    ) -> Mapping[str, Any]:
+        # The api requires that we include apikey as a header so we do that in this method
+        return {'apikey': self.apikey}
+
     def request_params(
         self,
         stream_state: Mapping[str, Any],
         stream_slice: Mapping[str, Any] = None,
         next_page_token: Mapping[str, Any] = None,
     ) -> MutableMapping[str, Any]:
-        # The api requires that we include access_key as a query param so we do that in this method
-        return {"access_key": self.access_key}
+        # The api requires that we include the base currency as a query param so we do that in this method
+        return {'base': self.base}
 
     def parse_response(
         self,

airbyte-integrations/connectors/source-python-http-tutorial/source_python_http_tutorial/spec.json

Lines changed: 9 additions & 11 deletions
@@ -1,27 +1,25 @@
 {
-  "documentationUrl": "https://docs.airbyte.com/integrations/sources/exchangeratesapi",
+  "documentationUrl": "https://docs.airbyte.io/integrations/sources/exchangeratesapi",
   "connectionSpecification": {
     "$schema": "http://json-schema.org/draft-07/schema#",
     "title": "Python Http Tutorial Spec",
     "type": "object",
-    "required": ["start_date", "base"],
-    "additionalProperties": false,
+    "required": ["apikey", "start_date", "base"],
     "properties": {
-      "access_key": {
-        "title": "Access Key",
+      "apikey": {
         "type": "string",
-        "description": "API access key used to retrieve data from the Exchange Rates API."
-      },
+        "description": "API access key used to retrieve data from the Exchange Rates API.",
+        "airbyte_secret": true
+      }
       "start_date": {
-        "title": "Start Date",
         "type": "string",
-        "description": "UTC date and time in the format 2017-01-25. Any data before this date will not be replicated.",
+        "description": "Start getting data from that date.",
         "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$",
-        "examples": ["YYYY-MM-DD"]
+        "examples": ["%Y-%m-%d"]
       },
       "base": {
-        "title": "Currency",
         "type": "string",
+        "examples": ["USD", "EUR"],
         "description": "ISO reference currency. See <a href=\"https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html\">here</a>."
       }
     }
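In this hunk the new `apikey` object is closed with `}` and no comma before `"start_date"`, so the file as committed would not parse as JSON. The sketch below shows the same spec with that comma restored and checks it with the standard `json` module. `SPEC_TEXT` is an illustrative constant, the closing brace of the outer object is assumed, and the `base` description is shortened here to avoid the escaped HTML link.

```python
import json

# The spec from this hunk with a comma added after the "apikey" object.
# Assumption: the "base" description is abbreviated for readability.
SPEC_TEXT = """
{
  "documentationUrl": "https://docs.airbyte.io/integrations/sources/exchangeratesapi",
  "connectionSpecification": {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Python Http Tutorial Spec",
    "type": "object",
    "required": ["apikey", "start_date", "base"],
    "properties": {
      "apikey": {
        "type": "string",
        "description": "API access key used to retrieve data from the Exchange Rates API.",
        "airbyte_secret": true
      },
      "start_date": {
        "type": "string",
        "description": "Start getting data from that date.",
        "pattern": "^[0-9]{4}-[0-9]{2}-[0-9]{2}$",
        "examples": ["%Y-%m-%d"]
      },
      "base": {
        "type": "string",
        "examples": ["USD", "EUR"],
        "description": "ISO reference currency."
      }
    }
  }
}
"""

spec = json.loads(SPEC_TEXT)
properties = spec["connectionSpecification"]["properties"]
# Every required input should also be declared under "properties".
missing = [key for key in spec["connectionSpecification"]["required"] if key not in properties]
print(missing)
```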

docs/connector-development/tutorials/cdk-tutorial-python-http/connection-checking.md

Lines changed: 5 additions & 3 deletions
@@ -4,7 +4,7 @@ The second operation in the Airbyte Protocol that we'll implement is the `check`
 
 This operation verifies that the input configuration supplied by the user can be used to connect to the underlying data source. Note that this user-supplied configuration has the values described in the `spec.yaml` filled in. In other words if the `spec.yaml` said that the source requires a `username` and `password` the config object might be `{ "username": "airbyte", "password": "password123" }`. You should then implement something that returns a json object reporting, given the credentials in the config, whether we were able to connect to the source.
 
-In order to make requests to the API, we need to specify the access.
+In order to make requests to the API, we need to specify the access.
 In our case, this is a fairly trivial check since the API requires no credentials. Instead, let's verify that the user-input `base` currency is a legitimate currency. In `source.py` we'll find the following autogenerated source:
 
 ```python
@@ -44,8 +44,8 @@ Let's test out this implementation by creating two objects: a valid and an inval
 
 ```text
 mkdir sample_files
-echo '{"start_date": "2022-04-01", "base": "USD", "access_key": <your_access_key>}' > secrets/config.json
-echo '{"start_date": "2022-04-01", "base": "BTC", "access_key": <your_access_key>}' > secrets/invalid_config.json
+echo '{"start_date": "2022-04-01", "base": "USD", "apikey": <your_apikey>}' > secrets/config.json
+echo '{"start_date": "2022-04-01", "base": "BTC", "apikey": <your_apikey>}' > secrets/invalid_config.json
 python main.py check --config secrets/config.json
 python main.py check --config secrets/invalid_config.json
 ```
@@ -59,3 +59,5 @@ You should see output like the following:
 > python main.py check --config secrets/invalid_config.json
 {"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid. Please input one of the following currencies: {'DKK', 'USD', 'CZK', 'BGN', 'JPY'}"}}
 ```
+
+While developing, we recommend storing configs which contain secrets in `secrets/config.json` because the `secrets` directory is gitignored by default.
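The `check` command prints one JSON message per line, like the `CONNECTION_STATUS` line quoted in this hunk. A minimal sketch of reading such a line from a script, for example in a smoke test; `check_succeeded` is a hypothetical helper and the sample line is abbreviated from the output above.

```python
import json


def check_succeeded(output_line: str) -> bool:
    """Return True only for a CONNECTION_STATUS message whose status is SUCCEEDED."""
    message = json.loads(output_line)
    if message.get("type") != "CONNECTION_STATUS":
        return False
    return message["connectionStatus"]["status"] == "SUCCEEDED"


if __name__ == "__main__":
    # Abbreviated form of the failure message shown for the BTC config.
    failed = '{"type": "CONNECTION_STATUS", "connectionStatus": {"status": "FAILED", "message": "Input currency BTC is invalid."}}'
    print(check_succeeded(failed))
```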

docs/connector-development/tutorials/cdk-tutorial-python-http/declare-schema.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ We'll begin by creating a stream to represent the data that we're pulling from t
 
 ```python
 class ExchangeRates(HttpStream):
-    url_base = "http://api.exchangeratesapi.io/"
+    url_base = "https://api.apilayer.com/exchangerates_data/"
 
     # Set this as a noop.
     primary_key = None

docs/connector-development/tutorials/cdk-tutorial-python-http/define-inputs.md

Lines changed: 4 additions & 4 deletions
@@ -17,11 +17,11 @@ connectionSpecification:
   title: Python Http Tutorial Spec
   type: object
   required:
-    - access_key
+    - apikey
     - start_date
     - base
   properties:
-    access_key:
+    apikey:
       type: string
       description: API access key used to retrieve data from the Exchange Rates API.
       airbyte_secret: true
@@ -39,9 +39,9 @@ connectionSpecification:
       description: "ISO reference currency. See <a href=\"https://www.ecb.europa.eu/stats/policy_and_exchange_rates/euro_reference_exchange_rates/html/index.en.html\">here</a>."
 ```
 
-In addition to metadata, we define two inputs:
+In addition to metadata, we define three inputs:
 
-* `access_key`: The API access key used to authenticate requests to the API
+* `apikey`: The API access key used to authenticate requests to the API
 * `start_date`: The beginning date to start tracking currency exchange rates from
 * `base`: The currency whose rates we're interested in tracking
 
