-
Notifications
You must be signed in to change notification settings - Fork 4.6k
CI: Add attention messages and commit links to slack lifecycle #32715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! My main concern is related to the internal git
interaction of the metadata service. It makes running "metadata service" on top of a checked-out repo a requirement. I would prefer the client (airbyte-ci
) to be responsible for passing the GitInfo
to the metadata service CLI.
e.g. The git commit hash, the date of the commit, the author of the commit, etc. | ||
|
||
""" | ||
repo = git.Repo(search_parent_directories=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd appreciate if the metadata service could avoid direct git
interactions as it will make it required to run on top of a checked out repo. I would prefer to defer the "git log" command execution to the client and make the metadata service only parse a git log output.
This suggestion comes from the kind of difficulties I spotted while trying to make airbyte-ci
target any repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alafanechere Given that
- We need more info that just the commit hash
- Passing in all of the git info would be very verbose
- We only want this for airbyte internal auditing and internal notifications.
Could we instead take the approach of try to get the git repo, if it exists attach this info, if not don't attach the info.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a reasonable tradeoff 👍 . Can we log a warning
when no repo was found? (Would it mean that you'd move your try
up by one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely
commit_timestamp = repo.git.log("-1", "--format=%ct", str(original_metadata_file_path)) | ||
commit_author = repo.git.log("-1", "--format=%an", str(original_metadata_file_path)) | ||
commit_author_email = repo.git.log("-1", "--format=%ae", str(original_metadata_file_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid multiple git.log
calls with this slight change:
commit = repo.commit(commit_sha)
return GitInfo.from_commit(commit)
the Commit
object has authored_date
, author
attributes. author
is an Actor object with anme
and email
attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Protip! Thank you.
# get the commit hash for the last commit that modified the metadata file | ||
commit_sha = repo.git.log("-1", "--format=%H", str(original_metadata_file_path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For build reproducability and consistency: can we make the client pass the commit id?
The publish job can be triggered for different commit thanks to the airbyte-ci --git-revision
option. It defaults to github.sha
(the sha of the commit which triggered the GHA publish workflow), but we might want to retrigger a publish on a different commit manually, which is not the latest one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, we still likely want the last author of a metadata change to be notified right?
Also for republishing a failed publish, typically that is done by manual github action right?
Or am I misunderstanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, sorry I was wrong.
|
||
def _safe_load_metadata_file(metadata_file_path: Path) -> dict: | ||
try: | ||
metadata = yaml.safe_load(metadata_file_path.read_text()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would deserialization to a metadata model be more robust in terms of validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason I didnt deserialize to a metadata model here is that we techincally dont want to do validation until the metadata file is considered done. i.e. has had all the transforms applied to it.
This lets us add mandatory generated fields easier in the future.
If thats too future optomized though let me know.
airbyte-ci/connectors/metadata_service/lib/metadata_service/gcs_upload.py
Outdated
Show resolved
Hide resolved
airbyte-ci/connectors/metadata_service/lib/metadata_service/models/generated/AirbyteInternal.py
Outdated
Show resolved
Hide resolved
"""Publish a connector notification log to logger and slack (if enabled).""" | ||
message = PublishConnectorLifecycle.create_log_message(lifecycle_stage, stage_status, message) | ||
message = PublishConnectorLifecycle.create_log_message(lifecycle_stage, stage_status, message, commit_sha, user_identifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the slack bot also invite the committer to the #connector-publish-updates
channel if they are not channel member yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me look into that! Though it also may be useful to let users opt out of this feature 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if I'm wrong but I believe an extra change to the MetadataUpload
step in airbyte-ci is required to make this metadata enrichment work on publish: We have to mount the .git
directory to the metadata container here so that the containerized metadata service lib can resolve the current git repo.
Please also:
- Handle
git.exc.InvalidGitRepositoryError
when no repo is available + log a warning and - Use
commit = repo.commit(commit_sha)
return GitInfo.from_commit(commit)
Yup, I think you are correct here, good catch. |
@alafanechere This is back and ready for review! As a reminder this
What this doesn't do is
|
@alafanechere closing this in favor of a stacked version here |
## What <!-- * Describe what the change is solving. Link all GitHub issues related to this change. --> Adds git commit info to the metadata file during upload.  Spun out of #32715 as a stack
## What Updates our slack lifecycle notifications to mention the author of the metadata change on slack  Spun out of #32715 as a stack
## What Adds the generation time and root metadata file info to the registry entry  Spun out of #32715 as a stack closes https://github.com/airbytehq/airbyte-internal-issues/issues/7466
Overview
Result of this weeks hackday