Skip to content

CI: Add attention messages and commit links to slack lifecycle #32715

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from

Conversation

bnchrch
Copy link
Contributor

@bnchrch bnchrch commented Nov 21, 2023

Overview

Result of this weeks hackday

  1. Adds git info to all uploaded metadata files
  2. Adds git info, metadata etag/bucket/file path to the registry
  3. Updates our metadata slack logging to @ the author in slack
image

Copy link

vercel bot commented Nov 21, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 3, 2024 3:37pm

@bnchrch bnchrch marked this pull request as ready for review November 24, 2023 01:30
@bnchrch bnchrch requested a review from a team November 24, 2023 01:30
Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! My main concern is related to the internal git interaction of the metadata service. It makes running "metadata service" on top of a checked-out repo a requirement. I would prefer the client (airbyte-ci) to be responsible for passing the GitInfo to the metadata service CLI.

e.g. The git commit hash, the date of the commit, the author of the commit, etc.

"""
repo = git.Repo(search_parent_directories=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate if the metadata service could avoid direct git interactions as it will make it required to run on top of a checked out repo. I would prefer to defer the "git log" command execution to the client and make the metadata service only parse a git log output.

This suggestion comes from the kind of difficulties I spotted while trying to make airbyte-ci target any repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alafanechere Given that

  1. We need more info that just the commit hash
  2. Passing in all of the git info would be very verbose
  3. We only want this for airbyte internal auditing and internal notifications.

Could we instead take the approach of try to get the git repo, if it exists attach this info, if not don't attach the info.

WDYT?

Copy link
Contributor

@alafanechere alafanechere Dec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a reasonable tradeoff 👍 . Can we log a warning when no repo was found? (Would it mean that you'd move your try up by one line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely

Comment on lines 207 to 209
commit_timestamp = repo.git.log("-1", "--format=%ct", str(original_metadata_file_path))
commit_author = repo.git.log("-1", "--format=%an", str(original_metadata_file_path))
commit_author_email = repo.git.log("-1", "--format=%ae", str(original_metadata_file_path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can avoid multiple git.log calls with this slight change:

commit = repo.commit(commit_sha)
return GitInfo.from_commit(commit)

the Commit object has authored_date, author attributes. author is an Actor object with anme and email attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Protip! Thank you.

Comment on lines +205 to +206
# get the commit hash for the last commit that modified the metadata file
commit_sha = repo.git.log("-1", "--format=%H", str(original_metadata_file_path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For build reproducability and consistency: can we make the client pass the commit id?
The publish job can be triggered for different commit thanks to the airbyte-ci --git-revision option. It defaults to github.sha (the sha of the commit which triggered the GHA publish workflow), but we might want to retrigger a publish on a different commit manually, which is not the latest one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we still likely want the last author of a metadata change to be notified right?

Also for republishing a failed publish, typically that is done by manual github action right?

Or am I misunderstanding?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry I was wrong.


def _safe_load_metadata_file(metadata_file_path: Path) -> dict:
try:
metadata = yaml.safe_load(metadata_file_path.read_text())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would deserialization to a metadata model be more robust in terms of validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason I didnt deserialize to a metadata model here is that we techincally dont want to do validation until the metadata file is considered done. i.e. has had all the transforms applied to it.

This lets us add mandatory generated fields easier in the future.

If thats too future optomized though let me know.

"""Publish a connector notification log to logger and slack (if enabled)."""
message = PublishConnectorLifecycle.create_log_message(lifecycle_stage, stage_status, message)
message = PublishConnectorLifecycle.create_log_message(lifecycle_stage, stage_status, message, commit_sha, user_identifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the slack bot also invite the committer to the #connector-publish-updates channel if they are not channel member yet?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me look into that! Though it also may be useful to let users opt out of this feature 😅

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if I'm wrong but I believe an extra change to the MetadataUpload step in airbyte-ci is required to make this metadata enrichment work on publish: We have to mount the .git directory to the metadata container here so that the containerized metadata service lib can resolve the current git repo.

Please also:

  • Handle git.exc.InvalidGitRepositoryError when no repo is available + log a warning and
  • Use
commit = repo.commit(commit_sha) 
return GitInfo.from_commit(commit)

Copy link
Contributor Author

bnchrch commented Dec 14, 2023

Let me know if I'm wrong but I believe an extra change to the MetadataUpload step in airbyte-ci is required to make this metadata enrichment work on publish: We have to mount the .git directory to the metadata container

Yup, I think you are correct here, good catch.

@bnchrch
Copy link
Contributor Author

bnchrch commented May 2, 2024

@alafanechere This is back and ready for review!

As a reminder this

  1. Adds commit info to a registry entry
  2. Adds slack notifications for when your connector has published
  3. Adds metadata etag info to the registry entry so you can trace lineage (sorry for not originally making this a stack)

What this doesn't do is

  1. Add registry entry generation time (what we discussed for Connector Quality Metrics). That will be in another Stack

@bnchrch
Copy link
Contributor Author

bnchrch commented May 3, 2024

@alafanechere closing this in favor of a stacked version here

#37804

@bnchrch bnchrch closed this May 3, 2024
bnchrch added a commit that referenced this pull request May 9, 2024
## What
<!--
* Describe what the change is solving. Link all GitHub issues related to this change.
-->
Adds git commit info to the metadata file during upload.

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/PTsI7qAmiIMkhFQg04QF/b7de4cce-ffe8-4506-a13d-027b1ba21a34.png)


Spun out of #32715 as a stack
bnchrch added a commit that referenced this pull request May 9, 2024
## What
Updates our slack lifecycle notifications to mention the author of the metadata change on slack

![image.png](https://graphite-user-uploaded-assets-prod.s3.amazonaws.com/PTsI7qAmiIMkhFQg04QF/b20cd2d2-dc18-4a15-ae0e-0f8a218cf871.png)


Spun out of #32715 as a stack
bnchrch added a commit that referenced this pull request May 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants