🎉 Source Gitlab: Ingest All Accessible Groups #11140

Merged

@adamschmidt (Contributor) commented Mar 15, 2022

What

Resolves #11128 - Allow all Gitlab groups accessible to the access token to be ingested

How

Removes the need to specify group ids and/or project ids in the source configuration. When neither is specified, the connector uses the GitLab groups API to retrieve the full list of groups accessible to it and ingests them as normal.
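
The fallback described above can be sketched as follows. This is a minimal illustration, not the connector's actual code: the `groups` and `projects` config keys holding space-separated ids, and the names `resolve_ids` and `list_accessible_groups`, are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


def resolve_ids(
    config: Dict[str, str],
    list_accessible_groups: Callable[[], List[str]],
) -> Tuple[List[str], List[str]]:
    """Return (group_ids, project_ids), discovering groups only if both are blank."""
    # Assumption: ids arrive as space-separated strings; blank or missing means "not set".
    group_ids = (config.get("groups") or "").split()
    project_ids = (config.get("projects") or "").split()

    if not group_ids and not project_ids:
        # Neither was configured: fall back to every group the token can access.
        group_ids = list_accessible_groups()

    return group_ids, project_ids
```

Because the lookup only runs when both fields are blank, a configuration that already lists groups or projects takes the same path it did before.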

Recommended reading order

  1. airbyte-integrations/connectors/source-gitlab/source_gitlab/source.py
  2. airbyte-integrations/connectors/source-gitlab/source_gitlab/util.py

🚨 User Impact 🚨

Users are no longer forced to specify a set of group ids or project ids, and saving the source with both fields blank should no longer fail. Existing configurations should not be affected by this change.

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

Tests

Integration

A simple configuration keeps the test cycle short. It is defined in airbyte-integrations/connectors/source-gitlab/integration_tests/configured_catalog.json:

{
  "streams": [
    {
      "stream": {
        "name": "groups",
        "json_schema": {},
        "supported_sync_modes": ["full_refresh"],
        "source_defined_primary_key": [["id"]]
      },
      "sync_mode": "full_refresh",
      "destination_sync_mode": "overwrite"
    }
  ]
}

Output from python main.py read --config secrets/config.json --catalog integration_tests/configured_catalog.json:

... snip ...
{"type": "LOG", "log": {"level": "INFO", "message": "Read 157 records from groups stream"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Finished syncing groups"}}
{"type": "LOG", "log": {"level": "INFO", "message": "SourceGitlab runtimes:\nSyncing stream groups 0:03:11.009996"}}
{"type": "LOG", "log": {"level": "INFO", "message": "Finished syncing SourceGitlab"}}
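
When checking output like this by hand, it can help to pull the record counts out of the log lines. The sketch below assumes each output line is a JSON object shaped like the ones above and that the count appears in a message of the form `Read N records from <stream> stream`. The function name `records_read` is hypothetical.

```python
import json
import re
from typing import Dict, Iterable

# Matches messages such as "Read 157 records from groups stream".
_READ_RE = re.compile(r"Read (\d+) records from (\S+) stream")


def records_read(lines: Iterable[str]) -> Dict[str, int]:
    """Map stream name to the record count reported in LOG messages."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as "... snip ..."
        if not isinstance(msg, dict) or msg.get("type") != "LOG":
            continue
        match = _READ_RE.search(msg.get("log", {}).get("message", ""))
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts
```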

@github-actions bot added the area/connectors and area/documentation labels Mar 15, 2022
@marcosmarxm self-assigned this Mar 15, 2022

@marcosmarxm (Member) left a comment

Requesting some changes.

Comment on lines 4 to 21
    headers = kwargs["authenticator"].get_auth_header()

    ids = []

    r = requests.get(f'https://{kwargs["api_url"]}/api/v4/groups?page=1&per_page=50', headers=headers)
    results = r.json()
    items = map(lambda i: i['full_path'].replace('/', '%2f'), results)
    ids.extend(items)

    while 'X-Next-Page' in r.headers and r.headers['X-Next-Page'] != '':
        next_page = r.headers['X-Next-Page']
        per_page = r.headers['X-Per-Page']
        r = requests.get(f'https://{kwargs["api_url"]}/api/v4/groups?page={next_page}&per_page={per_page}', headers=headers)
        results = r.json()
        items = map(lambda i: i['full_path'].replace('/', '%2f'), results)
        ids.extend(items)

    return ids

@marcosmarxm (Member):

Suggested change
    headers = kwargs["authenticator"].get_auth_header()
    ids = []
    has_next = True
    # First request params
    per_page = 50
    next_page = 1
    while has_next:
        r = requests.get(f'https://{kwargs["api_url"]}/api/v4/groups?page={next_page}&per_page={per_page}', headers=headers)
        next_page = r.headers.get('X-Next-Page')
        per_page = r.headers.get('X-Per-Page')
        results = r.json()
        items = map(lambda i: i['full_path'].replace('/', '%2f'), results)
        ids.extend(items)
        has_next = 'X-Next-Page' in r.headers and r.headers['X-Next-Page'] != ''
    return ids
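
For readers following the thread, the single-loop pagination suggested above can also be written in a form that runs without network access. This is a sketch under stated assumptions, not the merged code: `collect_group_paths` and the injected `fetch_page` callable are hypothetical names, and `urllib.parse.quote` is swapped in for the `.replace('/', '%2f')` call, so slashes come out as uppercase `%2F` and any other reserved characters are encoded as well.

```python
from typing import Callable, Dict, List, Mapping, Tuple
from urllib.parse import quote

# A page fetcher takes (page, per_page) and returns (json_body, response_headers).
PageFetcher = Callable[[int, int], Tuple[List[Dict], Mapping[str, str]]]


def collect_group_paths(fetch_page: PageFetcher, per_page: int = 50) -> List[str]:
    """Walk every page of groups and return their URL-encoded full paths."""
    paths: List[str] = []
    page = 1
    while True:
        groups, headers = fetch_page(page, per_page)
        # safe="" makes quote() encode "/" too, yielding "%2F".
        paths.extend(quote(g["full_path"], safe="") for g in groups)
        next_page = headers.get("X-Next-Page", "")
        if not next_page:
            break  # header absent or empty: this was the last page
        page = int(next_page)
    return paths
```

With `requests`, a `fetch_page` implementation could call `requests.get(...)` for the groups endpoint and return `(r.json(), r.headers)`. Injecting it keeps the loop itself easy to test.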

@adamschmidt (Contributor, Author):

Refactored. This was admittedly messy :)

Comment on lines 3 to 4
def get_group_list(**kwargs):
    headers = kwargs["authenticator"].get_auth_header()

@marcosmarxm (Member):

This function is only used by the Source class. Consider moving it into the class, or at least into the source.py file. I don't see how a separate file helps here.

@adamschmidt (Contributor, Author):

Moved to an internal class method on the source
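
Moving the helper onto the source might look roughly like the sketch below. The class name `GitlabSourceSketch`, the method names, and the injected `fetch_groups` callable are all hypothetical; the merged connector code may differ.

```python
from typing import Callable, Dict, List


class GitlabSourceSketch:
    """Illustrative stand-in for the source class; not the real connector."""

    def __init__(self, fetch_groups: Callable[[], List[Dict]]):
        # fetch_groups is injected so the sketch runs without network access.
        self._fetch_groups = fetch_groups

    def _get_group_list(self) -> List[str]:
        # The leading underscore marks this as internal to the class,
        # replacing the free function that lived in a separate util module.
        return [g["full_path"].replace("/", "%2f") for g in self._fetch_groups()]

    def group_ids(self) -> List[str]:
        # Public entry point that other methods on the source would call.
        return self._get_group_list()
```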

@adamschmidt adamschmidt requested a review from a team as a code owner March 23, 2022 14:02

CLAassistant commented Mar 23, 2022

CLA assistant check
All committers have signed the CLA.

@github-actions bot added the area/api, area/frontend, area/platform, area/scheduler, area/server, area/worker, and kubernetes labels and removed the area/worker, area/frontend, area/platform, kubernetes, area/server, and area/api labels Mar 23, 2022

@marcosmarxm (Member) left a comment

Thanks @adamschmidt, I made a few changes, but it's good now.

@marcosmarxm marcosmarxm temporarily deployed to more-secrets March 23, 2022 14:35 Inactive
@marcosmarxm marcosmarxm merged commit 87566e1 into airbytehq:master Mar 23, 2022
@adamschmidt adamschmidt deleted the adamschmidt/11128-gitlab-soure-groups branch March 23, 2022 22:11

briveira commented May 4, 2022

Still seeing "non-json response" error messages when adding a local GitLab source with hundreds of projects in 0.36.7-alpha (after HOURS of trying to add the new GitLab source).

Labels: area/connectors, area/documentation, community
Linked issues: Improvement: Gitlab Source