Commit 538a420

pabloescoder, Vincent Koc, marcosmarxm, and octavia-squidington-iii authored
🎉 New Source: The Guardian API [low-code CDK] (#18654)
* Add new source: The Guardian API
* Add documentation
* Fix custom paginator, it now stops without throwing an error
* Update the-guardian-api.md with PR number and link
* Remove catalog file, add titles to all properties in spec.yaml
* Add incremental sync, change parameter names
* format
* remove order from spec
* add guardian to source def
* auto-bump connector version

Co-authored-by: Vincent Koc <[email protected]>
Co-authored-by: marcosmarxm <[email protected]>
Co-authored-by: Octavia Squidington III <[email protected]>
Co-authored-by: Marcos Marx <[email protected]>
1 parent 1403c1b commit 538a420

28 files changed, with 754 additions and 0 deletions

airbyte-config/init/src/main/resources/seed/source_definitions.yaml

Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1334,6 +1334,13 @@
13341334
icon: timely.svg
13351335
sourceType: api
13361336
releaseStage: alpha
1337+
- name: The Guardian API
1338+
sourceDefinitionId: d42bd69f-6bf0-4d0b-9209-16231af07a92
1339+
dockerRepository: airbyte/source-the-guardian-api
1340+
dockerImageTag: 0.1.0
1341+
documentationUrl: https://docs.airbyte.com/integrations/sources/the-guardian-api
1342+
sourceType: api
1343+
releaseStage: alpha
13371344
- name: Trello
13381345
sourceDefinitionId: 8da67652-004c-11ec-9a03-0242ac130003
13391346
dockerRepository: airbyte/source-trello

airbyte-config/init/src/main/resources/seed/source_specs.yaml

Lines changed: 72 additions & 0 deletions
@@ -12858,6 +12858,78 @@
1285812858
supportsNormalization: false
1285912859
supportsDBT: false
1286012860
supported_destination_sync_modes: []
12861+
- dockerImage: "airbyte/source-the-guardian-api:0.1.0"
12862+
spec:
12863+
documentationUrl: "https://docs.airbyte.com/integrations/sources/the-guardian-api"
12864+
connectionSpecification:
12865+
$schema: "http://json-schema.org/draft-07/schema#"
12866+
title: "The Guardian Api Spec"
12867+
type: "object"
12868+
required:
12869+
- "api_key"
12870+
- "start_date"
12871+
additionalProperties: true
12872+
properties:
12873+
api_key:
12874+
title: "API Key"
12875+
type: "string"
12876+
description: "Your API Key. See <a href=\"https://open-platform.theguardian.com/access/\"\
12877+
>here</a>. The key is case sensitive."
12878+
airbyte_secret: true
12879+
start_date:
12880+
title: "Start Date"
12881+
type: "string"
12882+
description: "Use this to set the minimum date (YYYY-MM-DD) of the results.\
12883+
\ Results older than the start_date will not be shown."
12884+
pattern: "^([1-9][0-9]{3})\\-(0?[1-9]|1[012])\\-(0?[1-9]|[12][0-9]|3[01])$"
12885+
examples:
12886+
- "YYYY-MM-DD"
12887+
query:
12888+
title: "Query"
12889+
type: "string"
12890+
description: "(Optional) The query (q) parameter filters the results to\
12891+
\ only those that include that search term. The q parameter supports AND,\
12892+
\ OR and NOT operators."
12893+
examples:
12894+
- "environment AND NOT water"
12895+
- "environment AND political"
12896+
- "amusement park"
12897+
- "political"
12898+
tag:
12899+
title: "Tag"
12900+
type: "string"
12901+
description: "(Optional) A tag is a piece of data that is used by The Guardian\
12902+
\ to categorise content. Use this parameter to filter results by showing\
12903+
\ only the ones matching the entered tag. See <a href=\"https://content.guardianapis.com/tags?api-key=test\"\
12904+
>here</a> for a list of all tags, and <a href=\"https://open-platform.theguardian.com/documentation/tag\"\
12905+
>here</a> for the tags endpoint documentation."
12906+
examples:
12907+
- "environment/recycling"
12908+
- "environment/plasticbags"
12909+
- "environment/energyefficiency"
12910+
section:
12911+
title: "Section"
12912+
type: "string"
12913+
description: "(Optional) Use this to filter the results by a particular\
12914+
\ section. See <a href=\"https://content.guardianapis.com/sections?api-key=test\"\
12915+
>here</a> for a list of all sections, and <a href=\"https://open-platform.theguardian.com/documentation/section\"\
12916+
>here</a> for the sections endpoint documentation."
12917+
examples:
12918+
- "media"
12919+
- "technology"
12920+
- "housing-network"
12921+
end_date:
12922+
title: "End Date"
12923+
type: "string"
12924+
description: "(Optional) Use this to set the maximum date (YYYY-MM-DD) of\
12925+
\ the results. Results newer than the end_date will not be shown. Default\
12926+
\ is set to the current date (today) for incremental syncs."
12927+
pattern: "^([1-9][0-9]{3})\\-(0?[1-9]|1[012])\\-(0?[1-9]|[12][0-9]|3[01])$"
12928+
examples:
12929+
- "YYYY-MM-DD"
12930+
supportsNormalization: false
12931+
supportsDBT: false
12932+
supported_destination_sync_modes: []
1286112933
- dockerImage: "airbyte/source-trello:0.1.6"
1286212934
spec:
1286312935
documentationUrl: "https://docs.airbyte.com/integrations/sources/trello"

airbyte-integrations/builds.md

Lines changed: 1 addition & 0 deletions
@@ -121,6 +121,7 @@
121121
| Strava | [![source-stava](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-strava%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-strava) |
122122
| Stripe | [![source-stripe](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-stripe%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-stripe) |
123123
| Tempo | [![source-tempo](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-tempo%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-tempo) |
124+
| The Guardian API | [![source-the-guardian-api](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-the-guardian-api%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-the-guardian-api) |
124125
| TikTok Marketing | [![source-tiktok-marketing](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-tiktok-marketing%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-tiktok-marketing) |
125126
| Trello | [![source-trello](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-trello%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-trello) |
126127
| Twilio | [![source-twilio](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-twilio%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-twilio) |
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
*
2+
!Dockerfile
3+
!main.py
4+
!source_the_guardian_api
5+
!setup.py
6+
!secrets
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
1+
FROM python:3.9.11-alpine3.15 as base
2+
3+
# build and load all requirements
4+
FROM base as builder
5+
WORKDIR /airbyte/integration_code
6+
7+
# upgrade pip to the latest version
8+
RUN apk --no-cache upgrade \
9+
&& pip install --upgrade pip \
10+
&& apk --no-cache add tzdata build-base
11+
12+
13+
COPY setup.py ./
14+
# install necessary packages to a temporary folder
15+
RUN pip install --prefix=/install .
16+
17+
# build a clean environment
18+
FROM base
19+
WORKDIR /airbyte/integration_code
20+
21+
# copy all loaded and built libraries to a pure basic image
22+
COPY --from=builder /install /usr/local
23+
# add default timezone settings
24+
COPY --from=builder /usr/share/zoneinfo/Etc/UTC /etc/localtime
25+
RUN echo "Etc/UTC" > /etc/timezone
26+
27+
# bash is installed for more convenient debugging.
28+
RUN apk --no-cache add bash
29+
30+
# copy payload code only
31+
COPY main.py ./
32+
COPY source_the_guardian_api ./source_the_guardian_api
33+
34+
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
35+
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
36+
37+
LABEL io.airbyte.version=0.1.0
38+
LABEL io.airbyte.name=airbyte/source-the-guardian-api
Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
1+
# The Guardian Api Source
2+
3+
This is the repository for The Guardian API configuration-based source connector.
4+
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/the-guardian-api).
5+
6+
## Local development
7+
8+
#### Building via Gradle
9+
You can also build the connector in Gradle. This is typically used in CI and not needed for your development workflow.
10+
11+
To build using Gradle, from the Airbyte repository root, run:
12+
```
13+
./gradlew :airbyte-integrations:connectors:source-the-guardian-api:build
14+
```
15+
16+
#### Create credentials
17+
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/the-guardian-api)
18+
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_the_guardian_api/spec.yaml` file.
19+
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
20+
See `integration_tests/sample_config.json` for a sample config file.
21+
22+
**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source the-guardian-api test creds`
23+
and place them into `secrets/config.json`.
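
As a rough illustration of what "conforming to `spec.yaml`" means for `secrets/config.json`, the Python sketch below checks a config dictionary against the two required fields and the date pattern taken from the connector spec. The helper name `validate_config` is hypothetical and not part of the connector; Airbyte performs the real validation against the JSON schema.

```python
import re

# Date pattern copied from the connector spec (used by start_date and end_date).
DATE_PATTERN = re.compile(r"^([1-9][0-9]{3})\-(0?[1-9]|1[012])\-(0?[1-9]|[12][0-9]|3[01])$")

# Fields listed under `required` in the connector spec.
REQUIRED_FIELDS = ("api_key", "start_date")


def validate_config(config: dict) -> list:
    """Hypothetical helper: return a list of problems; an empty list means no problem was found."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not config.get(field):
            problems.append(f"missing required field: {field}")
    for field in ("start_date", "end_date"):
        value = config.get(field)
        if value is not None and not DATE_PATTERN.match(str(value)):
            problems.append(f"{field} must look like YYYY-MM-DD, got {value!r}")
    return problems
```

Calling `validate_config` on the parsed contents of `secrets/config.json` before running `check` can catch a malformed date early, although it only covers the fields shown here.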
24+
25+
### Locally running the connector docker image
26+
27+
#### Build
28+
First, make sure you build the latest Docker image:
29+
```
30+
docker build . -t airbyte/source-the-guardian-api:dev
31+
```
32+
33+
You can also build the connector image via Gradle:
34+
```
35+
./gradlew :airbyte-integrations:connectors:source-the-guardian-api:airbyteDocker
36+
```
37+
When building via Gradle, the docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` `LABEL`s in
38+
the Dockerfile.
39+
40+
#### Run
41+
Then run any of the connector commands as follows:
42+
```
43+
docker run --rm airbyte/source-the-guardian-api:dev spec
44+
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-the-guardian-api:dev check --config /secrets/config.json
45+
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-the-guardian-api:dev discover --config /secrets/config.json
46+
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-the-guardian-api:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
47+
```
48+
## Testing
49+
50+
#### Acceptance Tests
51+
Customize `acceptance-test-config.yml` file to configure tests. See [Source Acceptance Tests](https://docs.airbyte.io/connector-development/testing-connectors/source-acceptance-tests-reference) for more information.
52+
If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
53+
54+
To run your integration tests with docker
55+
56+
### Using gradle to run tests
57+
All commands should be run from airbyte project root.
58+
To run unit tests:
59+
```
60+
./gradlew :airbyte-integrations:connectors:source-the-guardian-api:unitTest
61+
```
62+
To run acceptance and custom integration tests:
63+
```
64+
./gradlew :airbyte-integrations:connectors:source-the-guardian-api:integrationTest
65+
```
66+
67+
## Dependency Management
68+
All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
69+
We split dependencies between two groups, dependencies that are:
70+
* required for your connector to work need to go in the `MAIN_REQUIREMENTS` list.
71+
* required for testing need to go in the `TEST_REQUIREMENTS` list.
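
The split described above could look roughly like the following Python sketch. The package names are assumptions chosen for illustration, and `SETUP_KWARGS` is a hypothetical name; the real lists live in the connector's `setup.py`, which is not shown here.

```python
# Assumed package names for illustration only; check the connector's setup.py for the real ones.
MAIN_REQUIREMENTS = ["airbyte-cdk"]  # needed for the connector to run
TEST_REQUIREMENTS = ["pytest", "pytest-mock"]  # needed only to run the tests

# Keyword arguments that a setup.py would typically pass to setuptools.setup(...).
SETUP_KWARGS = {
    "name": "source_the_guardian_api",
    "install_requires": MAIN_REQUIREMENTS,
    "extras_require": {"tests": TEST_REQUIREMENTS},
}
```

Keeping test-only packages under `extras_require` means they are installed only when the `tests` extra is requested.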
72+
73+
### Publishing a new version of the connector
74+
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
75+
1. Make sure your changes are passing unit and integration tests.
76+
1. Bump the connector version in `Dockerfile` -- just increment the value of the `LABEL io.airbyte.version` appropriately (we use [SemVer](https://semver.org/)).
77+
1. Create a Pull Request.
78+
1. Pat yourself on the back for being an awesome contributor.
79+
1. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
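
The version bump in step 2 can be scripted. The Python sketch below increments the `io.airbyte.version` label in the text of a Dockerfile following SemVer; `bump_version` is a hypothetical helper, not an Airbyte tool, and it assumes the label is written as `LABEL io.airbyte.version=X.Y.Z` on a single line.

```python
import re

# Matches a line such as: LABEL io.airbyte.version=0.1.0
VERSION_LABEL = re.compile(
    r"^(LABEL io\.airbyte\.version=)(\d+)\.(\d+)\.(\d+)[ \t]*$", re.MULTILINE
)


def bump_version(dockerfile_text: str, part: str = "patch") -> str:
    """Hypothetical helper: return the Dockerfile text with the version label incremented."""

    def replace(match):
        major, minor, patch = (int(match.group(i)) for i in (2, 3, 4))
        if part == "major":
            major, minor, patch = major + 1, 0, 0
        elif part == "minor":
            minor, patch = minor + 1, 0
        elif part == "patch":
            patch += 1
        else:
            raise ValueError(f"unknown version part: {part}")
        return f"{match.group(1)}{major}.{minor}.{patch}"

    new_text, count = VERSION_LABEL.subn(replace, dockerfile_text)
    if count != 1:
        raise ValueError("expected exactly one io.airbyte.version LABEL")
    return new_text
```

For example, applying `bump_version` with the default `part` to a Dockerfile labelled `0.1.0` yields `0.1.1`, while `part="minor"` yields `0.2.0`.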
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
#
2+
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
3+
#
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
1+
# See [Source Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/source-acceptance-tests-reference)
2+
# for more information about how to configure these tests
3+
connector_image: airbyte/source-the-guardian-api:dev
4+
acceptance_tests:
5+
spec:
6+
tests:
7+
- spec_path: "source_the_guardian_api/spec.yaml"
8+
connection:
9+
tests:
10+
- config_path: "secrets/config.json"
11+
status: "succeed"
12+
- config_path: "integration_tests/invalid_config.json"
13+
status: "failed"
14+
discovery:
15+
tests:
16+
- config_path: "secrets/config.json"
17+
basic_read:
18+
tests:
19+
- config_path: "secrets/config.json"
20+
configured_catalog_path: "integration_tests/configured_catalog.json"
21+
empty_streams: []
22+
incremental:
23+
tests:
24+
- config_path: "secrets/config.json"
25+
configured_catalog_path: "integration_tests/configured_catalog.json"
26+
future_state_path: "integration_tests/abnormal_state.json"
27+
full_refresh:
28+
tests:
29+
- config_path: "secrets/config.json"
30+
configured_catalog_path: "integration_tests/configured_catalog.json"
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
#!/usr/bin/env sh
2+
3+
# Build latest connector image
4+
docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2-)
5+
6+
# Pull latest acctest image
7+
docker pull airbyte/source-acceptance-test:latest
8+
9+
# Run
10+
docker run --rm -it \
11+
-v /var/run/docker.sock:/var/run/docker.sock \
12+
-v /tmp:/tmp \
13+
-v $(pwd):/test_input \
14+
airbyte/source-acceptance-test \
15+
--acceptance-test-config /test_input
16+
Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
1+
# The Guardian API
2+
3+
## Overview
4+
5+
[The Guardian Open Platform](https://open-platform.theguardian.com/) is a public web service for accessing all the content The Guardian creates, categorised by tags and sections. To get started, you need an API key to authenticate against the API. The Guardian API Connector is implemented with the [Airbyte Low-Code CDK](https://docs.airbyte.com/connector-development/config-based/low-code-cdk-overview).
6+
7+
## Output Format
8+
9+
#### Each content item has the following structure:
10+
11+
```json
{
  "id": "string",
  "type": "string",
  "sectionId": "string",
  "sectionName": "string",
  "webPublicationDate": "string",
  "webTitle": "string",
  "webUrl": "string",
  "apiUrl": "string",
  "isHosted": "boolean",
  "pillarId": "string",
  "pillarName": "string"
}
```
26+
27+
**Description:**

* **webPublicationDate**: The combined date and time of publication
* **webUrl**: The URL of the HTML content
* **apiUrl**: The URL of the raw content
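
To make the `webPublicationDate` field concrete, the Python sketch below parses it into a timezone-aware datetime. The record is a made-up example shaped like the structure above, and `parse_publication_date` is a hypothetical helper; the timestamp format is assumed from the state files in this change (for example `2022-10-25T10:10:10Z`).

```python
from datetime import datetime, timezone

# Made-up record for illustration; only a few of the fields are shown.
record = {
    "id": "example/2022/oct/25/sample-article",
    "type": "article",
    "webPublicationDate": "2022-10-25T10:10:10Z",
    "webTitle": "Sample article",
}


def parse_publication_date(item: dict) -> datetime:
    """Hypothetical helper: convert webPublicationDate to an aware UTC datetime."""
    parsed = datetime.strptime(item["webPublicationDate"], "%Y-%m-%dT%H:%M:%SZ")
    # The trailing "Z" means UTC, so attach that timezone explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```

Parsing the value rather than comparing strings makes it easier to check a record against a `start_date` or `end_date` boundary.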
32+
33+
## Core Streams
34+
35+
The connector supports the `content` stream, which returns all pieces of content in the API.
36+
37+
## Rate Limiting
38+
39+
The API key you are assigned is rate-limited. Applications that make large numbers of requests on a polling basis are likely to exceed their daily quota and will be prevented from making further requests until the next period begins.
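
One way to stay within a daily quota is to count requests on the client side. The Python sketch below is a minimal illustration of that idea and is not how the connector itself handles limits; the class name `DailyRequestBudget` is hypothetical, and the limit is supplied by the caller because the actual quota for a key is not stated here.

```python
from datetime import date


class DailyRequestBudget:
    """Hypothetical helper: counts requests per day against a caller-supplied limit."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._day = None
        self._used = 0

    def try_acquire(self, today: date) -> bool:
        """Return True and count the request if budget remains for `today`."""
        if today != self._day:
            # A new period has begun, so the counter starts over.
            self._day, self._used = today, 0
        if self._used >= self.daily_limit:
            return False
        self._used += 1
        return True
```

A polling job could call `try_acquire(date.today())` before each request and pause until the next day when it returns `False`.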
40+
41+
## Authentication and Permissions
42+
43+
To access the API, you will need to sign up for an API key, which should be sent with every request. Visit [this](https://open-platform.theguardian.com/access) link to get an API key.
44+
The easiest way to see what data is included is to explore the data. You can build complex queries quickly and browse the results. Visit [this](https://open-platform.theguardian.com/explore) link to explore the data.
45+
46+
See [this](https://docs.airbyte.io/integrations/sources/the-guardian-api) link for the connector docs.
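
Since the API key is sent with every request, a request URL can be sketched in Python as follows. The `api-key` parameter name appears in the tag and section links in the connector spec; the `/search` path and the `from-date` and `to-date` parameter names are assumptions based on The Guardian's public API documentation, and `build_request_url` is a hypothetical helper rather than connector code.

```python
from urllib.parse import urlencode

# Assumed endpoint; this change only shows the /tags and /sections URLs on this host.
BASE_URL = "https://content.guardianapis.com/search"


def build_request_url(api_key, start_date, query=None, tag=None, section=None, end_date=None):
    """Hypothetical helper: build a content request URL from connector-style config values."""
    params = {"api-key": api_key, "from-date": start_date}
    optional = {"q": query, "tag": tag, "section": section, "to-date": end_date}
    # Only include optional filters that were actually provided.
    params.update({name: value for name, value in optional.items() if value})
    return f"{BASE_URL}?{urlencode(params)}"
```

The function only builds the URL and makes no network call, so it can be tried without a valid key.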
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
plugins {
2+
id 'airbyte-python'
3+
id 'airbyte-docker'
4+
id 'airbyte-source-acceptance-test'
5+
}
6+
7+
airbytePython {
8+
moduleDirectory 'source_the_guardian_api'
9+
}
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
#
2+
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
3+
#
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"content": {
3+
"webPublicationDate": "2123-10-31T10:10:10Z"
4+
}
5+
}
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
#
2+
# Copyright (c) 2022 Airbyte, Inc., all rights reserved.
3+
#
4+
5+
6+
import pytest
7+
8+
pytest_plugins = ("source_acceptance_test.plugin",)
9+
10+
11+
@pytest.fixture(scope="session", autouse=True)
12+
def connector_setup():
13+
"""This fixture is a placeholder for external resources that acceptance test might require."""
14+
# TODO: setup test dependencies if needed. otherwise remove the TODO comments
15+
yield
16+
# TODO: clean up test dependencies
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
{
2+
"streams": [
3+
{
4+
"stream": {
5+
"name": "content",
6+
"json_schema": {},
7+
"supported_sync_modes": ["full_refresh", "incremental"]
8+
},
9+
"sync_mode": "incremental",
10+
"destination_sync_mode": "overwrite"
11+
}
12+
]
13+
}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"api_key": "<invalid api_key>",
3+
"query": "water OR rain",
4+
"start_date": "2022-10-25"
5+
}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"api_key": "<valid api_key>",
3+
"query": "water OR rain OR thunder",
4+
"start_date": "2022-10-25"
5+
}
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
{
2+
"content": {
3+
"webPublicationDate": "2022-10-25T10:10:10Z"
4+
}
5+
}
