
Commit aecdd96

Source wikipedia-pageviews: Migrate to manifest-only (#44460)
1 parent 141daac commit aecdd96

12 files changed: +199 -1574 lines changed
@@ -1,103 +1,65 @@
1-
# Wikipedia Pageviews Source
1+
# Wikipedia Pageviews source connector
22

3-
This is the repository for the Wikipedia Pageviews configuration based source connector.
4-
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.com/integrations/sources/wikipedia-pageviews).
3+
This directory contains the manifest-only connector for `source-wikipedia-pageviews`.
4+
This _manifest-only_ connector is not a Python package on its own, as it runs inside of the base `source-declarative-manifest` image.
5+
6+
For information about how to configure and use this connector within Airbyte, see [the connector's full documentation](https://docs.airbyte.com/integrations/sources/wikipedia-pageviews).
57

68
## Local development
79

8-
### Prerequisites
10+
We recommend using the Connector Builder to edit this connector.
11+
Using either Airbyte Cloud or your local Airbyte OSS instance, navigate to the **Builder** tab and select **Import a YAML**.
12+
Then select the connector's `manifest.yaml` file to load the connector into the Builder. You're now ready to make changes to the connector!
913

10-
* Python (`^3.9`)
11-
* Poetry (`^1.7`) - installation instructions [here](https://python-poetry.org/docs/#installation)
14+
If you prefer to develop locally, you can follow the instructions below.
1215

16+
### Building the docker image
1317

18+
You can build any manifest-only connector with `airbyte-ci`:
1419

15-
### Installing the connector
20+
1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
21+
2. Run the following command to build the docker image:
1622

17-
From this connector directory, run:
1823
```bash
19-
poetry install --with dev
24+
airbyte-ci connectors --name=source-wikipedia-pageviews build
2025
```
2126

27+
An image will be available on your host with the tag `airbyte/source-wikipedia-pageviews:dev`.
2228

23-
### Create credentials
29+
### Creating credentials
2430

2531
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.com/integrations/sources/wikipedia-pageviews)
26-
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `spec` inside `source_wikipedia_pageviews/manifest.yaml` file.
32+
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `spec` object in the connector's `manifest.yaml` file.
2733
Note that any directory named `secrets` is gitignored across the entire Airbyte repo, so there is no danger of accidentally checking in sensitive information.
28-
See `integration_tests/sample_config.json` for a sample config file.
29-
30-
31-
### Locally running the connector
32-
33-
```
34-
poetry run source-wikipedia-pageviews spec
35-
poetry run source-wikipedia-pageviews check --config secrets/config.json
36-
poetry run source-wikipedia-pageviews discover --config secrets/config.json
37-
poetry run source-wikipedia-pageviews read --config secrets/config.json --catalog integration_tests/configured_catalog.json
38-
```
39-
40-
### Running tests
4134

42-
To run tests locally, from the connector directory run:
43-
44-
```
45-
poetry run pytest tests
46-
```
35+
### Running as a docker container
4736

48-
### Building the docker image
37+
Then run any of the standard source connector commands:
4938

50-
1. Install [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md)
51-
2. Run the following command to build the docker image:
5239
```bash
53-
airbyte-ci connectors --name=source-wikipedia-pageviews build
54-
```
55-
56-
An image will be available on your host with the tag `airbyte/source-wikipedia-pageviews:dev`.
57-
58-
59-
### Running as a docker container
60-
61-
Then run any of the connector commands as follows:
62-
```
6340
docker run --rm airbyte/source-wikipedia-pageviews:dev spec
6441
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-wikipedia-pageviews:dev check --config /secrets/config.json
6542
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-wikipedia-pageviews:dev discover --config /secrets/config.json
6643
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-wikipedia-pageviews:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
6744
```
6845

69-
### Running our CI test suite
46+
### Running the CI test suite
7047

7148
You can run our full test suite locally using [`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md):
72-
```bash
73-
airbyte-ci connectors --name=source-wikipedia-pageviews test
74-
```
75-
76-
### Customizing acceptance Tests
7749

78-
Customize `acceptance-test-config.yml` file to configure acceptance tests. See [Connector Acceptance Tests](https://docs.airbyte.com/connector-development/testing-connectors/connector-acceptance-tests-reference) for more information.
79-
If your connector requires to create or destroy resources for use during acceptance tests create fixtures for it and place them inside integration_tests/acceptance.py.
80-
81-
### Dependency Management
82-
83-
All of your dependencies should be managed via Poetry.
84-
To add a new dependency, run:
8550
```bash
86-
poetry add <package-name>
51+
airbyte-ci connectors --name=source-wikipedia-pageviews test
8752
```
8853

89-
Please commit the changes to `pyproject.toml` and `poetry.lock` files.
90-
9154
## Publishing a new version of the connector
9255

93-
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
94-
1. Make sure your changes are passing our test suite: `airbyte-ci connectors --name=source-wikipedia-pageviews test`
95-
2. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
56+
If you want to contribute changes to `source-wikipedia-pageviews`, here's how you can do that:
57+
1. Make your changes locally, or load the connector's manifest into Connector Builder and make changes there.
58+
2. Make sure your changes are passing our test suite with `airbyte-ci connectors --name=source-wikipedia-pageviews test`
59+
3. Bump the connector version (please follow [semantic versioning for connectors](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#semantic-versioning-for-connectors)):
9660
- bump the `dockerImageTag` value in `metadata.yaml`
97-
- bump the `version` value in `pyproject.toml`
98-
3. Make sure the `metadata.yaml` content is up to date.
9961
4. Make sure the connector documentation and its changelog are up to date (`docs/integrations/sources/wikipedia-pageviews.md`).
10062
5. Create a Pull Request: use [our PR naming conventions](https://docs.airbyte.com/contributing-to-airbyte/resources/pull-requests-handbook/#pull-request-title-convention).
10163
6. Pat yourself on the back for being an awesome contributor.
10264
7. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
103-
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
65+
8. Once your PR is merged, the new version of the connector will be automatically published to Docker Hub and our connector registry.
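
As a rough illustration of the "Creating credentials" step in this README, the sketch below writes a `secrets/config.json` using the config keys that the manifest's templates reference (`project`, `access`, `agent`, `article`, `start`, `end`). The values are placeholders chosen for illustration, not defaults from the connector, and `write_config` is a hypothetical helper. The date check mirrors the YYYYMMDD or YYYYMMDDHH format named in the spec; the authoritative list of fields remains the `spec` object in `manifest.yaml`.

```python
import json
import re
import tempfile
from pathlib import Path

# Keys referenced by the manifest's templates; the values are hypothetical placeholders.
EXAMPLE_CONFIG = {
    "project": "en.wikipedia.org",
    "access": "all-access",
    "agent": "all-agents",
    "article": "Albert_Einstein",
    "start": "20240101",
    "end": "20240107",
}

# YYYYMMDD, optionally followed by a two-digit hour (YYYYMMDDHH).
DATE_PATTERN = re.compile(r"\d{8}(\d{2})?")


def write_config(config: dict, directory: Path) -> Path:
    """Check the date fields, then write config.json into `directory`."""
    for key in ("start", "end"):
        if not DATE_PATTERN.fullmatch(config[key]):
            raise ValueError(f"{key} must be YYYYMMDD or YYYYMMDDHH, got {config[key]!r}")
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    # Inside the connector directory you would pass Path("secrets") instead.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_config(EXAMPLE_CONFIG, Path(tmp) / "secrets")
        print(written.read_text())
```

With such a file in place, the `docker run` commands in the README can read it through the mounted `secrets` directory.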

airbyte-integrations/connectors/source-wikipedia-pageviews/__init__.py

-3
This file was deleted.

airbyte-integrations/connectors/source-wikipedia-pageviews/acceptance-test-config.yml

File mode changed: 100755 → 100644
+1 -1
@@ -3,7 +3,7 @@
 connector_image: airbyte/source-wikipedia-pageviews:dev
 tests:
   spec:
-    - spec_path: "source_wikipedia_pageviews/spec.yaml"
+    - spec_path: "manifest.yaml"
   connection:
     - config_path: "secrets/config.json"
       status: "succeed"

airbyte-integrations/connectors/source-wikipedia-pageviews/main.py

-8
This file was deleted.
@@ -1,12 +1,9 @@
1-
version: 2.4.0
2-
1+
version: 4.3.0
32
type: DeclarativeSource
4-
53
check:
64
type: CheckStream
75
stream_names:
86
- per-article
9-
107
definitions:
118
streams:
129
per-article:
@@ -15,7 +12,8 @@ definitions:
1512
retriever:
1613
type: SimpleRetriever
1714
requester:
18-
$ref: "#/definitions/base_requester"
15+
type: HttpRequester
16+
url_base: https://wikimedia.org/api/rest_v1/metrics/pageviews
1917
path: >-
2018
/per-article/{{config.project}}/{{config.access}}/{{config.agent}}/{{config.article}}/daily/{{stream_slice.start_time}}/{{stream_slice.end_time}}
2119
http_method: GET
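
The hunk above replaces `$ref: "#/definitions/base_requester"` with the requester's fields written out in place. A minimal sketch of that kind of inlining follows, assuming simple local `#/a/b` pointers that target mappings, with keys written next to `$ref` overriding the referenced ones. `lookup` and `resolve_refs` are hypothetical helpers written for this note, not the CDK's own reference resolver.

```python
from typing import Any


def lookup(root: dict, pointer: str) -> Any:
    """Follow a local pointer such as '#/definitions/base_requester'."""
    if not pointer.startswith("#/"):
        raise ValueError(f"only local '#/...' pointers are handled, got {pointer!r}")
    node: Any = root
    for part in pointer[2:].split("/"):
        node = node[part]
    return node


def resolve_refs(node: Any, root: dict) -> Any:
    """Return a copy of `node` with each local `$ref` replaced by its target.

    Assumes every target is a mapping and that there are no circular references.
    """
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    if not isinstance(node, dict):
        return node
    merged: dict = {}
    if "$ref" in node:
        merged.update(resolve_refs(lookup(root, node["$ref"]), root))
    for key, value in node.items():
        if key != "$ref":
            merged[key] = resolve_refs(value, root)
    return merged


# A cut-down manifest fragment modeled on the old layout shown in the diff.
MANIFEST = {
    "definitions": {
        "base_requester": {
            "type": "HttpRequester",
            "url_base": "https://wikimedia.org/api/rest_v1/metrics/pageviews",
        },
        "requester": {
            "$ref": "#/definitions/base_requester",
            "http_method": "GET",
        },
    }
}

if __name__ == "__main__":
    print(resolve_refs(MANIFEST["definitions"]["requester"], MANIFEST))
```

The resolved requester carries `type` and `url_base` directly, which matches the shape of the added lines in the hunk.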
@@ -48,16 +46,36 @@ definitions:
4846
schema_loader:
4947
type: InlineSchemaLoader
5048
schema:
51-
$ref: "#/schemas/per-article"
49+
type: object
50+
$schema: http://json-schema.org/schema#
51+
additionalProperties: true
52+
properties:
53+
access:
54+
type: string
55+
agent:
56+
type: string
57+
article:
58+
type: string
59+
granularity:
60+
type: string
61+
project:
62+
type: string
63+
timestamp:
64+
type: string
65+
views:
66+
type: integer
67+
format: int64
5268
top:
5369
type: DeclarativeStream
5470
name: top
5571
retriever:
5672
type: SimpleRetriever
5773
requester:
58-
$ref: "#/definitions/base_requester"
74+
type: HttpRequester
75+
url_base: https://wikimedia.org/api/rest_v1/metrics/pageviews
5976
path: >-
60-
{{ '/top/' ~ config.project ~ '/' ~ config.access ~ '/' ~ config.start[:4] ~ '/' ~ config.start[4:6] ~ '/' ~ config.start[6:8] }}
77+
{{ '/top/' ~ config.project ~ '/' ~ config.access ~ '/' ~ config.start[:4]
78+
~ '/' ~ config.start[4:6] ~ '/' ~ config.start[6:8] }}
6179
http_method: GET
6280
request_headers:
6381
"User-Agent": "AirbyteWikipediaPageviewsConnector/1.0"
@@ -70,15 +88,148 @@ definitions:
7088
schema_loader:
7189
type: InlineSchemaLoader
7290
schema:
73-
$ref: "#/schemas/top"
91+
type: object
92+
$schema: http://json-schema.org/schema#
93+
additionalProperties: true
94+
properties:
95+
access:
96+
type: string
97+
articles:
98+
type: array
99+
items:
100+
type: object
101+
properties:
102+
article:
103+
type: string
104+
rank:
105+
type: integer
106+
format: int32
107+
views:
108+
type: integer
109+
format: int64
110+
day:
111+
type: string
112+
month:
113+
type: string
114+
project:
115+
type: string
116+
year:
117+
type: string
118+
timestamp:
119+
type: string
74120
base_requester:
75121
type: HttpRequester
76122
url_base: https://wikimedia.org/api/rest_v1/metrics/pageviews
77-
78123
streams:
79-
- $ref: "#/definitions/streams/per-article"
80-
- $ref: "#/definitions/streams/top"
81-
124+
- type: DeclarativeStream
125+
name: per-article
126+
retriever:
127+
type: SimpleRetriever
128+
requester:
129+
type: HttpRequester
130+
url_base: https://wikimedia.org/api/rest_v1/metrics/pageviews
131+
path: >-
132+
/per-article/{{config.project}}/{{config.access}}/{{config.agent}}/{{config.article}}/daily/{{stream_slice.start_time}}/{{stream_slice.end_time}}
133+
http_method: GET
134+
request_headers:
135+
"User-Agent": "AirbyteWikipediaPageviewsConnector/1.0"
136+
record_selector:
137+
type: RecordSelector
138+
extractor:
139+
type: DpathExtractor
140+
field_path:
141+
- items
142+
incremental_sync:
143+
type: DatetimeBasedCursor
144+
cursor_field: timestamp
145+
name: per-article
146+
cursor_datetime_formats:
147+
- "%Y%m%d"
148+
- "%Y%m%d%H"
149+
datetime_format: "%Y%m%d"
150+
start_datetime:
151+
type: MinMaxDatetime
152+
datetime: "{{config.start}}"
153+
datetime_format: "%Y%m%d"
154+
end_datetime:
155+
type: MinMaxDatetime
156+
datetime: "{{config.end}}"
157+
datetime_format: "%Y%m%d"
158+
step: P1D
159+
cursor_granularity: P1D
160+
schema_loader:
161+
type: InlineSchemaLoader
162+
schema:
163+
type: object
164+
$schema: http://json-schema.org/schema#
165+
additionalProperties: true
166+
properties:
167+
access:
168+
type: string
169+
agent:
170+
type: string
171+
article:
172+
type: string
173+
granularity:
174+
type: string
175+
project:
176+
type: string
177+
timestamp:
178+
type: string
179+
views:
180+
type: integer
181+
format: int64
182+
- type: DeclarativeStream
183+
name: top
184+
retriever:
185+
type: SimpleRetriever
186+
requester:
187+
type: HttpRequester
188+
url_base: https://wikimedia.org/api/rest_v1/metrics/pageviews
189+
path: >-
190+
{{ '/top/' ~ config.project ~ '/' ~ config.access ~ '/' ~ config.start[:4]
191+
~ '/' ~ config.start[4:6] ~ '/' ~ config.start[6:8] }}
192+
http_method: GET
193+
request_headers:
194+
"User-Agent": "AirbyteWikipediaPageviewsConnector/1.0"
195+
record_selector:
196+
type: RecordSelector
197+
extractor:
198+
type: DpathExtractor
199+
field_path:
200+
- items
201+
schema_loader:
202+
type: InlineSchemaLoader
203+
schema:
204+
type: object
205+
$schema: http://json-schema.org/schema#
206+
additionalProperties: true
207+
properties:
208+
access:
209+
type: string
210+
articles:
211+
type: array
212+
items:
213+
type: object
214+
properties:
215+
article:
216+
type: string
217+
rank:
218+
type: integer
219+
format: int32
220+
views:
221+
type: integer
222+
format: int64
223+
day:
224+
type: string
225+
month:
226+
type: string
227+
project:
228+
type: string
229+
year:
230+
type: string
231+
timestamp:
232+
type: string
82233
spec:
83234
type: Spec
84235
connection_specification:
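
The `incremental_sync` block in the hunk above walks the configured date range one day at a time (`step: P1D`, `cursor_granularity: P1D`, `datetime_format: "%Y%m%d"`). Below is a rough sketch of the slices this would produce, under the assumption that each slice ends one `cursor_granularity` before the next slice starts, so a one-day step yields slices whose start and end fall on the same day. `daily_slices` is a hypothetical helper for illustration, not the CDK's implementation.

```python
from datetime import datetime, timedelta
from typing import Dict, List

DATETIME_FORMAT = "%Y%m%d"


def daily_slices(start: str, end: str) -> List[Dict[str, str]]:
    """Split [start, end] into one-day slices, formatted like the manifest's cursor."""
    step = timedelta(days=1)         # step: P1D
    granularity = timedelta(days=1)  # cursor_granularity: P1D
    current = datetime.strptime(start, DATETIME_FORMAT)
    last = datetime.strptime(end, DATETIME_FORMAT)
    slices: List[Dict[str, str]] = []
    while current <= last:
        # Assumed rule: a slice ends one granularity unit before the next one begins.
        slice_end = min(current + step - granularity, last)
        slices.append(
            {
                "start_time": current.strftime(DATETIME_FORMAT),
                "end_time": slice_end.strftime(DATETIME_FORMAT),
            }
        )
        current += step
    return slices


if __name__ == "__main__":
    for item in daily_slices("20240101", "20240103"):
        print(item)
```

Each resulting `start_time` and `end_time` pair corresponds to the `stream_slice` values that the `per-article` path template interpolates.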
@@ -142,7 +293,9 @@ spec:
142293
end:
143294
type: string
144295
title: End
145-
description: The date of the last day to include, in YYYYMMDD or YYYYMMDDHH format.
296+
description:
297+
The date of the last day to include, in YYYYMMDD or YYYYMMDDHH
298+
format.
146299
order: 4
147300
project:
148301
type: string
@@ -163,12 +316,10 @@ spec:
163316
format. Also serves as the date to retrieve data for the top articles.
164317
order: 6
165318
additionalProperties: true
166-
167319
metadata:
168320
autoImportSchema:
169321
per-article: false
170322
top: false
171-
172323
schemas:
173324
per-article:
174325
type: object
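
The spec notes that `start` also serves as the date for the top articles, and the `top` stream's path template slices it into year, month, and day segments. A small sketch of the equivalent string handling in plain Python, assuming `start` is in YYYYMMDD or YYYYMMDDHH form; `top_path` is a hypothetical helper and the argument values are illustrative.

```python
URL_BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"


def top_path(project: str, access: str, start: str) -> str:
    """Mirror the template: '/top/' ~ project ~ '/' ~ access ~ '/' ~ start[:4] ~ ..."""
    year, month, day = start[:4], start[4:6], start[6:8]
    return f"/top/{project}/{access}/{year}/{month}/{day}"


if __name__ == "__main__":
    # Any trailing hour digits (YYYYMMDDHH) are ignored by the slicing.
    print(URL_BASE + top_path("en.wikipedia.org", "all-access", "2024011500"))
```

Because only the first eight characters are used, both accepted date formats resolve to the same daily `top` endpoint.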
