-
Notifications
You must be signed in to change notification settings - Fork 4.5k
🎉 New Source: Google Search Console #5350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -0,0 +1,16 @@ | |||
FROM python:3.7-slim |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should use alpine, see source-zendesk-support for an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tried an image from the zendesk connector, got timezone detection error during testing, rolled back to python:3.7-slim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TZ issue should be fixed, see latest templates
@@ -0,0 +1,5 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file should be in a dir called secrets/
otherwise creds will get checked into source control by accident
…-source-google-search-console # Conflicts: # .github/workflows/publish-command.yml # .github/workflows/test-command.yml
@@ -306,6 +306,11 @@ | |||
dockerRepository: airbyte/source-pokeapi | |||
dockerImageTag: 0.1.1 | |||
documentationUrl: https://docs.airbyte.io/integrations/sources/pokeapi | |||
- sourceDefinitionId: eb4c9e00-db83-4d63-a386-39cfa91012a8 | |||
name: Google Search Console (native) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gaart reminder about this comment
|
||
class Sitemaps(GoogleSearchConsole): | ||
""" | ||
API docs: https://developers.google.com/webmaster-tools/search-console-api-original/v3/sitemaps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for including API docs links, makes it much easier to see what's going on!
|
||
def parse_response(self, response: requests.Response, **kwargs) -> Iterable[Mapping]: | ||
records = response.json().get(self.data_field) or [] | ||
for record in sorted(records, key=lambda x: x["keys"][0]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just so that I'm on the same page, so the sorting was not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! a few more points here:
- I think state updates are not functioning as intended due to interaction with stream slices
- we should remove the singer connector from the connector index
- need to go through the checklist to create needed docs
- can you add a bootstrap doc like we discussed in retro?
|
||
yield record | ||
|
||
def get_updated_state( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't this is correct? it needs to be scoped by searchType. Currently if you sync only 1 search Type then the connector fails, the state output will just say date: September 1st
which means that we will skip syncing data from previous dates for all other search types. I could be reading this wrong though?
let me know if I can help explain stream slices more clearly, it's definitely not an obvious concept at first
also can you add a unit test for testing state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sherifnada The functionality of state updates has been updated, but now we are faced with the problem of the acceptance test failing due to nesting, we cannot tell the test to get a value from a nested structure with dynamic fields.
Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
yield record | ||
|
||
|
||
class SearchAnalyticsAllFields(SearchAnalytics): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should put a warning in the docs that this will take a very long time to sync because it's so granular
/test connector=connectors/source-google-search-console
|
…airbytehq/airbyte into gaart/5235-source-google-search-console
/test connector=connectors/source-google-search-console
|
/test connector=connectors/source-google-search-console |
/test connector=connectors/source-google-search-console
|
airbyte-integrations/builds.md
Outdated
@@ -29,6 +29,7 @@ | |||
| Google Ads | [](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-ads) | | |||
| Google Adwords | [](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-adwords-singer) | | |||
| Google Analytics | [](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-googleanalytics-singer) | | |||
| Google Search Console | [](https://dnsgjos7lj2fu.cloudfront.net/tests/summary/source-google-search-console) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you insert it in alphabetical order?
@@ -0,0 +1,55 @@ | |||
# Google Search Console | |||
|
|||
The API docs: https://developers.google.com/webmaster-tools/search-console-api-original/v3/parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API docs: https://developers.google.com/webmaster-tools/search-console-api-original/v3/parameters. | |
From [the docs](https://support.google.com/webmasters/answer/9128668?hl=en): | |
Google Search Console is a free service offered by Google that helps you monitor, maintain, and troubleshoot your site's presence in Google Search results. | |
Search Console offers tools and reports for the following actions: | |
* Confirm that Google can find and crawl your site. | |
* Fix indexing problems and request re-indexing of new or updated content. | |
* View Google Search traffic data for your site: how often your site appears in Google Search, which search queries show your site, how often searchers click through for those queries, and more. | |
* Receive alerts when Google encounters indexing, spam, or other issues on your site. | |
* Show you which sites link to your website. | |
* Troubleshoot issues for AMP, mobile usability, and other Search features. | |
The API docs: https://developers.google.com/webmaster-tools/search-console-api-original/v3/parameters. |
we have to deal with a large dataset. | ||
|
||
In order to reduce the amount of data, and to retrieve a specific dataset (for example, to get country specific data) | ||
we can use SearchAnalyticsByCountry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can use SearchAnalyticsByCountry. | |
we can use SearchAnalyticsByCountry. So each of the SearchAnalytics streams groups data by certain dimensions like date, country, page, etc... |
To chose one we use an authorization field with the `oneOf` parameter in the `spec.json` file. | ||
|
||
|
||
## Analytics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section should be in the source code since if the code changes this can go out of date, so putting it next to the code reduces the chance it goes out of date. . The bootstrap should be more focused on higher level details of how the API works
@@ -82,6 +80,15 @@ If you lose this key pair, you will need to generate a new one! | |||
### Note | |||
You can return to the [API Console/Credentials](https://console.cloud.google.com/apis/credentials) at any time to view the email address, public key fingerprints, and other information, or to generate additional public/private key pairs. For more details about service account credentials in the API Console, see [Service accounts](https://cloud.google.com/iam/docs/understanding-service-accounts) in the API Console help file. | |||
|
|||
### Using the existing User Account |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add this section to the external docs as it will be automated via oauth in the UI. This is an excellent README however to place in the credentials/
directory.
|
||
## How to create the client credentials for Google Search Console, to use with Airbyte? | ||
|
||
You can either: | ||
* Use the existing `Service Account` for your Google Project with granted Admin Permissions | ||
* Use the existing `User Account` for your Google Project with granted Admin Permissions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Use the existing `User Account` for your Google Project with granted Admin Permissions | |
* Use your personal Google User Account with oauth. If you choose this option, your account must have permissions to view the Google Search Console project you choose. |
docs/integrations/README.md
Outdated
@@ -42,7 +42,7 @@ Airbyte uses a grading system for connectors to help users understand what to ex | |||
|[Google Analytics](./sources/googleanalytics.md)| Beta | | |||
|[Google Analytics v4](./sources/google-analytics-v4.md)| Beta | | |||
|[Google Directory](./sources/google-directory.md)| Certified | | |||
|[Google Search Console](./sources/google-search-console.md)| Beta | | |||
|[Google Search Console](./sources/google-search-console.md)| Alpha | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should still be beta since we are committed to supporting it
|[Google Search Console](./sources/google-search-console.md)| Alpha | | |
|[Google Search Console](./sources/google-search-console.md)| Beta | |
current_stream_state = {site_url: {search_type: {self.cursor_field: latest_benchmark}}} | ||
|
||
# this line is required to pass the acceptance test successfully | ||
current_stream_state[self.cursor_field] = current_stream_state[site_url][search_type][self.cursor_field] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any way to avoid having this here i.e; adding information only to use it in tests? Is the problem that we want the max over all search types and the yaml format doesn't support that? if so let's add this comment to help the next dev understand
current_stream_state[self.cursor_field] = current_stream_state[site_url][search_type][self.cursor_field] | |
# we need to get the max date over all searchTypes but the current acctest YAML format doesn't | |
# support that | |
current_stream_state[self.cursor_field] = current_stream_state[site_url][search_type][self.cursor_field] |
/test connector=connectors/source-google-search-console
|
/publish connector=connectors/source-google-search-console
|
What
Move Google Search Console from singer to native CDK
How
Rewrite the connector using Python CDK.
GSC has several notable features:
Recommended reading order
x.java
y.python
Pre-merge Checklist
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
docs/SUMMARY.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog exampledocs/integrations/README.md
airbyte-integrations/builds.md
Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
/test connector=connectors/<name>
command is passing./publish
command described here