-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
create auto_merge package #38019
create auto_merge package #38019
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @alafanechere and the rest of your teammates on |
2f072e3
to
4f8ed1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, reviewing to unblock.
A few nits that I would love you to work through, but I assume there's a stack of PRs where you could clean those up.
@@ -15,3 +15,4 @@ The installation instructions for the `airbyte-ci` CLI tool cal be found here | |||
| [`connectors_qa`](connectors/connectors_qa/) | A tool to verify connectors have sounds assets and metadata. | | |||
| [`metadata_service`](connectors/metadata_service/) | Tools to generate connector metadata and registry. | | |||
| [`pipelines`](connectors/pipelines/) | Airbyte CI pipelines, including formatting, linting, building, testing connectors, etc. Connector acceptance tests live here. | | |||
| [`auto_merge`](connectors/auto_merge/) | A tool to automatically merge connector pull requests. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding it to the index!
Nit: can you also add the live testing tool to this index in here or separate quick PR?
@@ -0,0 +1,33 @@ | |||
# `auto_merge` | |||
|
|||
This python package is made to merge pull requests automatically on the Airbyte Repo. It is used in the [following workflow](TBD). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: don't forget to update this link (I'm assuming it's in the next PR?)
- All the required checks have passed | ||
|
||
|
||
## Install and usage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: prettier -w
this.
|
||
[tool.airbyte_ci] | ||
optional_poetry_groups = ["dev"] | ||
poe_tasks = ["type_check", "lint",] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
poe_tasks = ["type_check", "lint",] | |
poe_tasks = ["type_check", "lint",] | |
@@ -0,0 +1,12 @@ | |||
# Copyright (c) 2024 Airbyte, Inc., all rights reserved. | |||
|
|||
from __future__ import annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from __future__ import annotations |
I think from __future__ import annotations
is not needed in Python 3.10?
AUTO_MERGE_LABEL = "auto-merge" | ||
BASE_BRANCH = "master" | ||
CONNECTOR_PATH_PREFIXES = { | ||
"airbyte-integrations/connectors/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this one has trailing /
, the other two do not. I don't think this is intentional?
|
||
|
||
@contextmanager | ||
def github_client() -> Iterator[Github]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat!
client.close() | ||
|
||
|
||
def check_if_modifies_connectors_only(modified_files: Iterable[GithubFile]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. Ideally, we'd like to verify that only files in a single connector are modified, but honestly I think this would be too granular to focus on here.
bool: True if the head commit passes all required checks, False otherwise | ||
""" | ||
successful_status_contexts = [commit_status.context for commit_status in head_commit.get_statuses() if commit_status.state == "success"] | ||
successful_check_runs = [check_run.name for check_run in head_commit.get_check_runs() if check_run.conclusion == "success"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why, TIL you can get check status in Python code! This is very neat.
if not has_auto_merge_label: | ||
logging.info(f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label") | ||
return False | ||
targets_main_branch = pr.base.ref == BASE_BRANCH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: you're declaring variables that are only used once on the very next line — let's just use the if statement directly?
I.e. if not pr.base.ref == BASE_BRANCH:
. I don't think you will loose any readability with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice @alafanechere! Is the idea that this will run nightly eventually? Should we send the summary to Slack?
import os | ||
|
||
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] | ||
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required for this PR, but we have various ways that we define truthy in airbyte-ci. At some point we should probably try to make them consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I dont want to get in the way so im approving now.
The only thing I'd strongly encourage is
- Tell users more about the auto merge label, what it is, where it comes from
- Consider containing our "What is a conenctor?" logic to our connector ops library.
This python package is made to merge pull requests automatically on the Airbyte Repo. It is used in the [following workflow](TBD). | ||
|
||
A pull request is currently considered as auto-mergeable if: | ||
- It has the `auto-merge` label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Is this a github label? or a metadata field?
@@ -0,0 +1,33 @@ | |||
# `auto_merge` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 There could be a section on "Purpose" e.g. explain why we have automerge and that the goal is to merge connectors with as little manual airbyte intervention as possible
import os | ||
|
||
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"] | ||
PRODUCTION = os.environ.get("AUTO_MERGE_PRODUCTION", "false").lower() == "true" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💎
logging.getLogger().setLevel(logging.INFO) | ||
|
||
|
||
@contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💎 Oooh gettting fancy I see
bool: True if all modified files are in CONNECTOR_PATH_PREFIXES, False otherwise | ||
""" | ||
for file in modified_files: | ||
if not any(file.filename.startswith(prefix) for prefix in CONNECTOR_PATH_PREFIXES): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Didn't we have this same logic in the connector ops package? If so should we introduce that as dependency so if we change where connectors are located we dont have to update so many places?
return required_checks.issubset(successful_contexts) | ||
|
||
|
||
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Ok so here is the most important logic in the system. The business logic.
Its important that its
- Readable
- Easily extendable
- Contained to as few locations (files) as possible
I think youve achieved that so this is suggestion is very optional:
What if we used a validator pattern
def _validate_has_auto_merge_label((head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool, Optional[str]:
has_auto_merge_label = any(label.name == AUTO_MERGE_LABEL for label in pr.labels)
if not has_auto_merge_label:
return False, f"PR {pr.number} does not have the {AUTO_MERGE_LABEL} label"
return True, None
def check_if_pr_is_auto_mergeable(head_commit: GithubCommit, pr: PullRequest, required_checks: set[str]) -> bool:
validators = [
_validate_has_auto_merge_label,
# ...
]
for pr_validator in validators:
is_valid, error = pr_validator(head_commit, pr, required_checks)
if not is_valid
logging.info(error)
return False
(this is an example and could be improved (Named ValidationArgument|Result
type, standard error messaging, etc) but the ideas there.)
Why this way?
- It lets us conditionally add/remove validations for a connector more easily. You can imagine that critical connectors or third party connectors might want to add or skip certain validations
- Validation logic is easier to test
- imo easier to skim and understand what validations run when/under what conditions
Anyway again very optional
logging.info(f"PR {pr.number} does not target {BASE_BRANCH}") | ||
return False | ||
touches_connectors_only = check_if_modifies_connectors_only(pr.get_files()) | ||
if not touches_connectors_only: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💅 Can we put some newlines space between validations?
logging.info(f"Processing PR {pr.number}") | ||
head_commit = repo.get_commit(sha=pr.head.sha) | ||
if check_if_pr_is_auto_mergeable(head_commit, pr, required_passing_contexts): | ||
if not dry_run: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💎
time.sleep(github_client.rate_limiting_resettime - time.time()) | ||
|
||
|
||
def generate_job_summary_as_markdown(merged_prs: list[PullRequest]) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💅 Could move to a helper file
The main thing I'm curious about: if it takes a label anyway that suggests these PRs are auto-created with a nuance or someone clicks the label button. If that's the case, why not use the auto-merge feature of github? |
@bleonard I think the main nuance here is that this package gives us more flexibility as it can override the branch protection rules the normal "auto merge" GH feature follows. |
Yes this will run on a nightly / weekly basis. I think we could send the summary to slack. Might a logic to add to the GHA workflow and not the python package itself. |
4f8ed1b
to
da21c54
Compare
The flexibility makes sense for sure. It can't hurt and the footprint is small. I was expecting, and still assume, this flexibility should be built into the actual tests that get run on the connector. I think this is the case now that long-tail connectors aren't requiringt hat review. Or we could make it so. The we'd know from metadata which connectors gets the full CAT tests (because we have sandboxes) and which ones don't. It's useful for "green" to mean something, basically. So I would assume most of the logic is in the tests that get run not in the merging tool. |
What
Relates to https://github.com/airbytehq/airbyte-internal-issues/issues/7554
Introduce a new internal package to automatically merge pull requests on the airbyte repo
How
Read the README.md please 🙏
User Impact
None, the tool is not used in a GHA workflow yet