Commit c0d4652

🎉 New Source: Google Search Console (#5350)
* Generate Google Search Console connector
* Add schema
* Upd schema
* Upd authenticator
* Add creds retrieving script
* Remove legacy dep
* Upd dockerfile base image
* Add sample config
* Upd source definitions, add ci_credentials injection
* Upd schema
* Upd GSC creds injection
* Cleanup
* Add tzdata
* Upd tzdata installing
* Change base docker image
* Upd streams
* Fix typo
* Upd supported_sync_modes list
* Add multiple site, service account, search type support
* Fix typo
* Upd streams, pagination, multi site support
* Add service account secrets
* Remove source-google-search-console-singer from source definitions
* Upd creds retrieving base image
* Upd schema
* Upd docs
* Add badge
* Upd schema
* Upd docs
* Move the cursor field to the top
* Upd docs
1 parent aa9786d commit c0d4652

45 files changed, 2050 additions and 14 deletions

.github/workflows/publish-command.yml

+2
@@ -107,6 +107,8 @@ jobs:
 GOOGLE_CLOUD_STORAGE_TEST_CREDS: ${{ secrets.GOOGLE_CLOUD_STORAGE_TEST_CREDS }}
 GOOGLE_DIRECTORY_TEST_CREDS: ${{ secrets.GOOGLE_DIRECTORY_TEST_CREDS }}
 GOOGLE_SEARCH_CONSOLE_TEST_CREDS: ${{ secrets.GOOGLE_SEARCH_CONSOLE_TEST_CREDS }}
+GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS: ${{ secrets.GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS }}
+GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS_SRV_ACC: ${{ secrets.GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS_SRV_ACC }}
 GOOGLE_SHEETS_TESTS_CREDS: ${{ secrets.GOOGLE_SHEETS_TESTS_CREDS }}
 GOOGLE_WORKSPACE_ADMIN_REPORTS_TEST_CREDS: ${{ secrets.GOOGLE_WORKSPACE_ADMIN_REPORTS_TEST_CREDS }}
 GREENHOUSE_TEST_CREDS: ${{ secrets.GREENHOUSE_TEST_CREDS }}

.github/workflows/test-command.yml

+2
@@ -107,6 +107,8 @@ jobs:
 GOOGLE_CLOUD_STORAGE_TEST_CREDS: ${{ secrets.GOOGLE_CLOUD_STORAGE_TEST_CREDS }}
 GOOGLE_DIRECTORY_TEST_CREDS: ${{ secrets.GOOGLE_DIRECTORY_TEST_CREDS }}
 GOOGLE_SEARCH_CONSOLE_TEST_CREDS: ${{ secrets.GOOGLE_SEARCH_CONSOLE_TEST_CREDS }}
+GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS: ${{ secrets.GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS }}
+GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS_SRV_ACC: ${{ secrets.GOOGLE_SEARCH_CONSOLE_CDK_TEST_CREDS_SRV_ACC }}
 GOOGLE_SHEETS_TESTS_CREDS: ${{ secrets.GOOGLE_SHEETS_TESTS_CREDS }}
 GOOGLE_WORKSPACE_ADMIN_REPORTS_TEST_CREDS: ${{ secrets.GOOGLE_WORKSPACE_ADMIN_REPORTS_TEST_CREDS }}
 GREENHOUSE_TEST_CREDS: ${{ secrets.GREENHOUSE_TEST_CREDS }}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
{
2+
"sourceDefinitionId": "eb4c9e00-db83-4d63-a386-39cfa91012a8",
3+
"name": "Google Search Console (native)",
4+
"dockerRepository": "airbyte/source-google-search-console",
5+
"dockerImageTag": "0.1.0",
6+
"documentationUrl": "https://docs.airbyte.io/integrations/sources/google-search-console"
7+
}

airbyte-config/init/src/main/resources/seed/source_definitions.yaml

+3-3
@@ -311,10 +311,10 @@
   dockerRepository: airbyte/source-pokeapi
   dockerImageTag: 0.1.1
   documentationUrl: https://docs.airbyte.io/integrations/sources/pokeapi
-- sourceDefinitionId: 5a1d14c2-d829-49cd-8437-1e87dc9f5368
+- sourceDefinitionId: eb4c9e00-db83-4d63-a386-39cfa91012a8
   name: Google Search Console
-  dockerRepository: airbyte/source-google-search-console-singer
-  dockerImageTag: 0.1.3
+  dockerRepository: airbyte/source-google-search-console
+  dockerImageTag: 0.1.0
   documentationUrl: https://docs.airbyte.io/integrations/sources/google-search-console
 - sourceDefinitionId: bad83517-5e54-4a3d-9b53-63e85fbd4d7c
   name: ClickHouse

airbyte-integrations/builds.md

+1
@@ -30,6 +30,7 @@
 | Google Adwords | [![source-google-adwords-singer](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-adwords-singer%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-adwords-singer) |
 | Google Analytics | [![source-googleanalytics-singer](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-googleanalytics-singer%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-googleanalytics-singer) |
 | Google Analytics v4 | [![source-google-analytics-v4](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-analytics-v4%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-analytics-v4) |
+| Google Search Console | [![source-google-search-console](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-search-console%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-search-console) |
 | Google Sheets | [![source-google-sheets](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-sheets%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-sheets) |
 | Google Directory API | [![source-google-directory](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-directory%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-directory) |
 | Google Workspace Admin | [![source-google-workspace-admin-reports](https://img.shields.io/endpoint?url=https%3A%2F%2Fdnsgjos7lj2fu.cloudfront.net%2Ftests%2Fsummary%2Fsource-google-workspace-admin-reports%2Fbadge.json)](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-workspace-admin-reports) |
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
*
2+
!Dockerfile
3+
!Dockerfile.test
4+
!main.py
5+
!source_google_search_console
6+
!setup.py
7+
!secrets
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
# Google Search Console
2+
3+
From [the docs](https://support.google.com/webmasters/answer/9128668?hl=en):
4+
5+
Google Search Console is a free service offered by Google that helps you monitor, maintain, and troubleshoot your site's presence in Google Search results.
6+
7+
Search Console offers tools and reports for the following actions:
8+
9+
* Confirm that Google can find and crawl your site.
10+
* Fix indexing problems and request re-indexing of new or updated content.
11+
* View Google Search traffic data for your site: how often your site appears in Google Search, which search queries show your site, how often searchers click through for those queries, and more.
12+
* Receive alerts when Google encounters indexing, spam, or other issues on your site.
13+
* Show you which sites link to your website.
14+
* Troubleshoot issues for AMP, mobile usability, and other Search features.
15+
16+
The API docs: https://developers.google.com/webmaster-tools/search-console-api-original/v3/parameters.
17+
18+
## Endpoints and Streams:
19+
20+
1. [Site](https://developers.google.com/webmaster-tools/search-console-api-original/v3/sites) – Full refresh
21+
2. [Sitemaps](https://developers.google.com/webmaster-tools/search-console-api-original/v3/sitemaps) – Full refresh
22+
3. [Analytics](https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics) – Full refresh, Incremental
23+
24+
There are multiple streams in the `Analytics` endpoint.
25+
They exist because retrieving all the data from GSC (using the SearchAnalyticsAllFields stream)
26+
means dealing with a large dataset.
27+
28+
To reduce the amount of data and retrieve a specific dataset (for example, country-specific data),
29+
we can use SearchAnalyticsByCountry.
30+
Each of the SearchAnalytics streams groups data by certain dimensions, such as date, country, or page.
31+
32+
There are:
33+
1. SearchAnalyticsByDate
34+
2. SearchAnalyticsByCountry
35+
3. SearchAnalyticsByPage
36+
4. SearchAnalyticsByQuery
37+
5. SearchAnalyticsAllFields
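
As a rough illustration of how each stream could differ only in the dimensions it requests, here is a minimal Python sketch that builds the JSON body for a Search Analytics `query` call. The `startDate`, `endDate`, `dimensions`, `searchType`, `rowLimit` and `startRow` keys are request fields of the Search Console API v3; the `STREAM_DIMENSIONS` mapping and the `build_query_body` helper are assumptions made for this example and are not taken from the connector source.

```python
from typing import Dict, List

# Hypothetical mapping from stream name to the dimensions it groups by.
# The real connector may use different dimension sets.
STREAM_DIMENSIONS: Dict[str, List[str]] = {
    "SearchAnalyticsByDate": ["date"],
    "SearchAnalyticsByCountry": ["date", "country"],
    "SearchAnalyticsByPage": ["date", "page"],
    "SearchAnalyticsByQuery": ["date", "query"],
    "SearchAnalyticsAllFields": ["date", "country", "device", "page", "query"],
}

# Largest page size accepted by the Search Analytics query endpoint.
MAX_ROW_LIMIT = 25000


def build_query_body(
    stream_name: str,
    start_date: str,
    end_date: str,
    search_type: str = "web",
    start_row: int = 0,
    row_limit: int = MAX_ROW_LIMIT,
) -> dict:
    """Return the request body for one page of a grouped analytics query."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": STREAM_DIMENSIONS[stream_name],
        "searchType": search_type,
        # Clamp the page size so the request stays within the API limit.
        "rowLimit": min(row_limit, MAX_ROW_LIMIT),
        # Pagination is offset based: the next page starts at start_row + row_limit.
        "startRow": start_row,
    }
```

Fewer dimensions generally mean fewer result rows, which is why a narrower stream such as SearchAnalyticsByCountry can be cheaper to sync than SearchAnalyticsAllFields.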
38+
39+
## Authorization
40+
41+
There are 2 types of authorization: `User Account` and `Service Account`.
42+
To choose one, we use an authorization field with the `oneOf` parameter in the `spec.json` file.
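
To make the `oneOf` idea concrete, the following Python sketch validates a config whose `authorization` object carries exactly one of the two variants. The discriminator name `auth_type`, its values, and the required field names are hypothetical, since `spec.json` is not shown in this part of the diff.

```python
from typing import Any, Mapping

# Hypothetical field names per authorization variant (not taken from spec.json).
REQUIRED_FIELDS = {
    "user_account": ("client_id", "client_secret", "refresh_token"),
    "service_account": ("service_account_info",),
}


def validate_authorization(config: Mapping[str, Any]) -> str:
    """Return the selected authorization type, or raise ValueError if invalid."""
    auth = config.get("authorization", {})
    auth_type = auth.get("auth_type")
    if auth_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authorization type: {auth_type!r}")
    # Every field required by the chosen variant must be present and non-empty.
    missing = [name for name in REQUIRED_FIELDS[auth_type] if not auth.get(name)]
    if missing:
        raise ValueError(f"Missing fields for {auth_type}: {', '.join(missing)}")
    return auth_type
```

A connector could then pick the matching authenticator based on the returned type.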
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
FROM python:3.7-slim
2+
3+
# Bash is installed for more convenient debugging.
4+
RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
5+
6+
WORKDIR /airbyte/integration_code
7+
COPY source_google_search_console ./source_google_search_console
8+
COPY main.py ./
9+
COPY setup.py ./
10+
RUN pip install .
11+
12+
ENV AIRBYTE_ENTRYPOINT "python /airbyte/integration_code/main.py"
13+
ENTRYPOINT ["python", "/airbyte/integration_code/main.py"]
14+
15+
LABEL io.airbyte.version=0.1.0
16+
LABEL io.airbyte.name=airbyte/source-google-search-console
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
# Google Search Console Source
2+
3+
This is the repository for the Google Search Console source connector, written in Python.
4+
For information about how to use this connector within Airbyte, see [the documentation](https://docs.airbyte.io/integrations/sources/google-search-console).
5+
6+
## Local development
7+
8+
### Prerequisites
9+
**To iterate on this connector, make sure to complete this prerequisites section.**
10+
11+
#### Minimum Python version required `= 3.7.0`
12+
13+
#### Build & Activate Virtual Environment and install dependencies
14+
From this connector directory, create a virtual environment:
15+
```
16+
python -m venv .venv
17+
```
18+
19+
This will generate a virtualenv for this module in `.venv/`. Make sure this venv is active in your
20+
development environment of choice. To activate it from the terminal, run:
21+
```
22+
source .venv/bin/activate
23+
pip install -r requirements.txt
24+
```
25+
If you are in an IDE, follow your IDE's instructions to activate the virtualenv.
26+
27+
Note that while we are installing dependencies from `requirements.txt`, you should only edit `setup.py` for your dependencies. `requirements.txt` is
28+
used for editable installs (`pip install -e`) to pull in Python dependencies from the monorepo and will call `setup.py`.
29+
If this is mumbo jumbo to you, don't worry about it, just put your deps in `setup.py` but install using `pip install -r requirements.txt` and everything
30+
should work as you expect.
31+
32+
#### Building via Gradle
33+
From the Airbyte repository root, run:
34+
```
35+
./gradlew :airbyte-integrations:connectors:source-google-search-console:build
36+
```
37+
38+
#### Create credentials
39+
**If you are a community contributor**, follow the instructions in the [documentation](https://docs.airbyte.io/integrations/sources/google-search-console)
40+
to generate the necessary credentials. Then create a file `secrets/config.json` conforming to the `source_google_search_console/spec.json` file.
41+
Note that the `secrets` directory is gitignored by default, so there is no danger of accidentally checking in sensitive information.
42+
See `integration_tests/sample_config.json` for a sample config file.
43+
44+
**If you are an Airbyte core member**, copy the credentials in Lastpass under the secret name `source google-search-console test creds`
45+
and place them into `secrets/config.json`.
46+
47+
### Locally running the connector
48+
```
49+
python main.py spec
50+
python main.py check --config secrets/config.json
51+
python main.py discover --config secrets/config.json
52+
python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json
53+
```
54+
55+
### Locally running the connector docker image
56+
57+
#### Build
58+
First, make sure you build the latest Docker image:
59+
```
60+
docker build . -t airbyte/source-google-search-console:dev
61+
```
62+
63+
You can also build the connector image via Gradle:
64+
```
65+
./gradlew :airbyte-integrations:connectors:source-google-search-console:airbyteDocker
66+
```
67+
When building via Gradle, the docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` `LABEL`s in
68+
the Dockerfile.
69+
70+
#### Run
71+
Then run any of the connector commands as follows:
72+
```
73+
docker run --rm airbyte/source-google-search-console:dev spec
74+
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-google-search-console:dev check --config /secrets/config.json
75+
docker run --rm -v $(pwd)/secrets:/secrets airbyte/source-google-search-console:dev discover --config /secrets/config.json
76+
docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-google-search-console:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
77+
```
78+
## Testing
79+
Make sure to familiarize yourself with [pytest test discovery](https://docs.pytest.org/en/latest/goodpractices.html#test-discovery) to know how your test files and methods should be named.
80+
First install test dependencies into your virtual environment:
81+
```
82+
pip install .[tests]
83+
```
84+
### Unit Tests
85+
To run unit tests locally, from the connector directory run:
86+
```
87+
python -m pytest unit_tests
88+
```
89+
90+
### Integration Tests
91+
There are two types of integration tests: Acceptance Tests (Airbyte's test suite for all source connectors) and custom integration tests (which are specific to this connector).
92+
#### Custom Integration tests
93+
Place custom tests inside the `integration_tests/` folder, then, from the connector root, run:
94+
```
95+
python -m pytest integration_tests
96+
```
97+
#### Acceptance Tests
98+
Customize `acceptance-test-config.yml` file to configure tests. See [Source Acceptance Tests](source-acceptance-tests.md) for more information.
99+
If your connector needs to create or destroy resources for use during acceptance tests, create fixtures for them and place them inside `integration_tests/acceptance.py`.
100+
To run your integration tests with acceptance tests, from the connector root, run
101+
```
102+
python -m pytest integration_tests -p integration_tests.acceptance
103+
```
104+
To run your integration tests with docker
105+
106+
### Using gradle to run tests
107+
All commands should be run from airbyte project root.
108+
To run unit tests:
109+
```
110+
./gradlew :airbyte-integrations:connectors:source-google-search-console:unitTest
111+
```
112+
To run acceptance and custom integration tests:
113+
```
114+
./gradlew :airbyte-integrations:connectors:source-google-search-console:integrationTest
115+
```
116+
117+
## Dependency Management
118+
All of your dependencies should go in `setup.py`, NOT `requirements.txt`. The requirements file is only used to connect internal Airbyte dependencies in the monorepo for local development.
119+
We split dependencies between two groups, dependencies that are:
120+
* required for your connector to work need to go in the `MAIN_REQUIREMENTS` list.
121+
* required for testing need to go in the `TEST_REQUIREMENTS` list.
122+
123+
### Publishing a new version of the connector
124+
You've checked out the repo, implemented a million dollar feature, and you're ready to share your changes with the world. Now what?
125+
1. Make sure your changes are passing unit and integration tests.
126+
1. Bump the connector version in `Dockerfile` -- just increment the value of the `LABEL io.airbyte.version` appropriately (we use [SemVer](https://semver.org/)).
127+
1. Create a Pull Request.
128+
1. Pat yourself on the back for being an awesome contributor.
129+
1. Someone from Airbyte will take a look at your PR and iterate with you to merge it into master.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# See [Source Acceptance Tests](https://docs.airbyte.io/contributing-to-airbyte/building-new-connector/source-acceptance-tests.md)
2+
# for more information about how to configure these tests
3+
connector_image: airbyte/source-google-search-console:dev
4+
tests:
5+
spec:
6+
- spec_path: "source_google_search_console/spec.json"
7+
connection:
8+
- config_path: "secrets/config.json"
9+
status: "succeed"
10+
- config_path: "secrets/service_account_config.json"
11+
status: "failed"
12+
- config_path: "integration_tests/invalid_config.json"
13+
status: "failed"
14+
discovery:
15+
- config_path: "secrets/config.json"
16+
basic_read:
17+
- config_path: "secrets/config.json"
18+
configured_catalog_path: "integration_tests/configured_catalog.json"
19+
empty_streams: []
20+
full_refresh:
21+
- config_path: "secrets/config.json"
22+
configured_catalog_path: "integration_tests/catalog.json"
23+
incremental:
24+
- config_path: "secrets/config.json"
25+
configured_catalog_path: "integration_tests/configured_catalog.json"
26+
future_state_path: "integration_tests/abnormal_state.json"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
#!/usr/bin/env sh
2+
3+
# Build latest connector image
4+
docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2-)
5+
6+
# Pull latest acctest image
7+
docker pull airbyte/source-acceptance-test:latest
8+
9+
# Run
10+
docker run --rm -it \
11+
-v /var/run/docker.sock:/var/run/docker.sock \
12+
-v /tmp:/tmp \
13+
-v $(pwd):/test_input \
14+
airbyte/source-acceptance-test \
15+
--acceptance-test-config /test_input
16+
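
The build step above reads `connector_image` from `acceptance-test-config.yml`. Because the image reference itself contains a colon (`name:tag`), a parser has to split on the first colon only in order to keep the tag. A minimal Python sketch of that extraction (the function name is hypothetical and is not part of the repository):

```python
def parse_connector_image(config_text: str) -> str:
    """Return the full image reference (name and tag) from the config text."""
    for line in config_text.splitlines():
        # partition() splits on the first colon only, so "name:tag" stays intact.
        key, separator, value = line.partition(":")
        if separator and key.strip() == "connector_image":
            return value.strip()
    raise ValueError("connector_image not found")
```

Splitting on every colon and taking only the second field would drop the tag, and Docker would then fall back to `latest`.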
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
plugins {
2+
id 'airbyte-python'
3+
id 'airbyte-docker'
4+
id 'airbyte-source-acceptance-test'
5+
}
6+
7+
airbytePython {
8+
moduleDirectory 'source_google_search_console'
9+
}
10+
11+
dependencies {
12+
implementation files(project(':airbyte-integrations:bases:source-acceptance-test').airbyteDocker.outputs)
13+
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
FROM python:3.7-slim
2+
3+
# Bash is installed for more convenient debugging.
4+
RUN apt-get update && apt-get install -y bash && rm -rf /var/lib/apt/lists/*
5+
COPY . ./
6+
RUN pip install . --use-feature=in-tree-build
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
### Using the existing User Account
2+
3+
1. Follow the instructions [here](https://www.balbooa.com/gridbox-documentation/how-to-get-google-client-id-and-client-secret) to get `CLIENT_ID`, `CLIENT_SECRET`, and `REDIRECTED_URI`.
4+
2. The `Google Search Console` source provides scripts to easily get User Account credentials:
5+
1. Go to the `connectors/google-search-console/credentials` directory.
6+
2. Fill the file `credentials.json` with your personal credentials from step 1.
7+
3. Run the `./get_credentials.sh` script and follow the instructions.
8+
4. Copy the `refresh_token` from the console.
9+
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
{
2+
"client_id": "YOUR_CLIENT_ID",
3+
"client_secret": "YOUR_CLIENT_SECRET",
4+
"redirect_uri": "YOUR_REDIRECTED_URI"
5+
}
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
#
2+
# MIT License
3+
#
4+
# Copyright (c) 2020 Airbyte
5+
#
6+
# Permission is hereby granted, free of charge, to any person obtaining a copy
7+
# of this software and associated documentation files (the "Software"), to deal
8+
# in the Software without restriction, including without limitation the rights
9+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10+
# copies of the Software, and to permit persons to whom the Software is
11+
# furnished to do so, subject to the following conditions:
12+
#
13+
# The above copyright notice and this permission notice shall be included in all
14+
# copies or substantial portions of the Software.
15+
#
16+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22+
# SOFTWARE.
23+
#
24+
25+
import json
26+
27+
# Check https://developers.google.com/webmaster-tools/search-console-api-original/v3/ for all available scopes
28+
OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"
29+
30+
with open("credentials.json", "r") as f:
31+
credentials = json.load(f)
32+
33+
CLIENT_ID = credentials.get("client_id")
34+
CLIENT_SECRET = credentials.get("client_secret")
35+
REDIRECT_URI = credentials.get("redirect_uri")
36+
37+
authorize_url = (
38+
f"https://accounts.google.com/o/oauth2/v2/auth"
39+
f"?response_type=code"
40+
f"&access_type=offline"
41+
f"&prompt=consent&client_id={CLIENT_ID}"
42+
f"&redirect_uri={REDIRECT_URI}"
43+
f"&scope={OAUTH_SCOPE}"
44+
)
45+
print(f"Go to the following link in your browser: {authorize_url} and copy code from URL")
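
The script above concatenates the query string by hand, so characters such as `:` and `/` in the redirect URI and scope are not percent-encoded. As a possible alternative, here is a minimal sketch that builds the same URL with the standard library's `urlencode`. The `build_authorize_url` helper and the `AUTH_ENDPOINT` constant are names introduced for this example only.

```python
from urllib.parse import urlencode

OAUTH_SCOPE = "https://www.googleapis.com/auth/webmasters.readonly"
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_authorize_url(client_id: str, redirect_uri: str, scope: str = OAUTH_SCOPE) -> str:
    """Build the consent URL with every query parameter percent-encoded."""
    params = {
        "response_type": "code",
        "access_type": "offline",  # ask for a refresh token
        "prompt": "consent",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"
```

The parameters are the same ones the script already sends; only the encoding step differs.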
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
#!/bin/bash
2+
3+
docker build . -t airbyte-get-gsc-credentials
4+
docker run --name airbyte-get-gsc-credentials -t -d airbyte-get-gsc-credentials
5+
docker exec -it airbyte-get-gsc-credentials python get_authentication_url.py
6+
echo "Input your code:"
7+
read code
8+
docker exec -it airbyte-get-gsc-credentials python get_refresh_token.py $code
9+
docker rm airbyte-get-gsc-credentials --force
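
The `get_refresh_token.py` script called above is not shown in this part of the diff. As a rough idea of what such a step usually involves, the sketch below prepares the standard OAuth 2.0 authorization-code exchange against Google's token endpoint. The `build_token_request` helper is hypothetical and only constructs the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_token_request(client_id: str, client_secret: str, redirect_uri: str, code: str) -> Request:
    """Prepare the POST request that exchanges an authorization code for tokens."""
    payload = {
        "grant_type": "authorization_code",
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
    }
    return Request(
        TOKEN_ENDPOINT,
        data=urlencode(payload).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request (for example with `urllib.request.urlopen`) returns a JSON document; when the consent URL was created with `access_type=offline`, that document is expected to include a `refresh_token` field.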
