
Separate connectors /publish workflow into a build candidate and promotion to production GitOps pipeline pattern #12108


Closed
noahkawasaki-airbyte opened this issue Apr 18, 2022 · 13 comments
Labels
area/connectors Connector related issues build team/extensibility type/enhancement New feature or request

Comments

@noahkawasaki-airbyte
Contributor

noahkawasaki-airbyte commented Apr 18, 2022

Tell us about the problem you're trying to solve

Currently, the automation for developing a connector involves running the /publish command, which builds a Docker image from a branch/PR, performs various version tagging and bumping, and then pushes the image to Airbyte's Dockerhub as an official, production-ready connector for customers. This can, and does, occur while the PR is still in flight (i.e., not yet merged).

There are some problem scenarios with this approach:

  1. If the PR is never reviewed and approved, the workflow can still publish an image whose changes were never vetted or agreed to be merged into an Airbyte connector
  2. If multiple PRs touch the same connector, two publish jobs may run within a similar window of time. If one PR merges and the other has not rebased/merged with master, the second dev can build that same connector image without the merged PR's changes, thereby overwriting or excluding work from the other in-flight PR
  3. If I run /publish on a PR, an image is published, and I then decide to add more changes, the pipeline breaks with a message saying the version I am trying to build already exists. A branch can therefore only ever publish once, rather than on each push event, so I need to be very sure my PR is going to be approved before publishing

Describe the solution you’d like

I would like airbyte connector image publishing to follow a more GitOps-like approach. The idea is to decompose the publish workflow into two different workflows with more focused purposes.

Assume we are working with source-postgres:0.3.18 and trying to bump to source-postgres:0.3.19

Workflow 1: bump-build-test-connector

Main responsibility: Handle bumping of version in all necessary files, build an image to test, run tests

  1. If auto-bump-version=true, bump the version (the version in master) by +1 in the Dockerfile and run processResources (which updates definitions.yaml and seed.yaml, I think?)
  2. Build the image for the connector
  3. Tag the connector image following this pattern: source-postgres:0.3.19-candidate-${PR_NUMBER}
  4. Run tests with the connector image from step 3
  5. If the above succeeds, set a GitHub label on the PR such as publish/source-postgres, which signals Workflow 2 to execute
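The candidate tag in step 3 is just the bumped version plus the PR number. A minimal sketch of how a workflow step might construct it (the variable names here are illustrative assumptions, not actual workflow variables; in GitHub Actions the PR number would come from the event payload):

```shell
# Hypothetical values; in CI these would come from the event payload
# (e.g. github.event.pull_request.number) and the bumped Dockerfile.
CONNECTOR="source-postgres"
NEW_VERSION="0.3.19"   # version after the auto-bump in step 1
PR_NUMBER="12004"

# Step 3: tag the freshly built image with a candidate suffix so it
# can never be mistaken for a production release.
CANDIDATE_TAG="${CONNECTOR}:${NEW_VERSION}-candidate-${PR_NUMBER}"
echo "$CANDIDATE_TAG"   # source-postgres:0.3.19-candidate-12004
```

Because the PR number is baked into the tag, concurrent PRs on the same connector can never clobber each other's candidate images.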

Workflow 2: publish-connector

Prerequisites: the workflow above succeeded, and the PR has been reviewed, approved, merged, and carries the signal label set in step 5 above. This means the image and code are shippable.

Main responsibility: re-tag source-postgres:0.3.19-candidate-${PR_NUMBER} as source-postgres:0.3.19, i.e., the tagging pattern that Airbyte uses for deployment

  1. Add if conditions to make sure this only triggers on the criteria from step 5 of Workflow 1 (see https://github.community/t/trigger-workflow-only-on-pull-request-merge/17359):

# Triggering scenario
on:
  pull_request:
    types: [closed]

# Wrap this if condition around everything
jobs:
  <job-id>:
    if: github.event.pull_request.merged == true && contains(github.event.pull_request.labels.*.name, 'publish/connector')
    steps: # ...the rest of the steps

  2. Re-tag: very easy, take the candidate image from above and re-tag it
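The re-tag in step 2 could be sketched as a pull/tag/push sequence. A minimal illustration, where the airbyte/ image namespace and the variable names are assumptions, and the docker commands are echoed rather than executed so this stays a dry-run sketch:

```shell
# Hypothetical inputs; in the real workflow these would be recovered
# from the merged PR's metadata (connector name, bumped version, PR number).
CONNECTOR="source-postgres"
VERSION="0.3.19"
PR_NUMBER="12004"

CANDIDATE="airbyte/${CONNECTOR}:${VERSION}-candidate-${PR_NUMBER}"
RELEASE="airbyte/${CONNECTOR}:${VERSION}"

# Promotion is pull + tag + push of the already-tested candidate image;
# no rebuild happens, so the bytes that were tested are the bytes shipped.
echo "docker pull ${CANDIDATE}"
echo "docker tag ${CANDIDATE} ${RELEASE}"
echo "docker push ${RELEASE}"
```

The key property is that promotion never rebuilds: the image that passed tests in Workflow 1 is byte-identical to the one deployed.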

Timeline of Workflows

The timeline in which stuff happens becomes:

  1. Put up a PR with changes
  2. bump-build-test-connector runs when triggered by comment
  3. Reviews, iteration, etc. (steps 2-3 repeat as many times as necessary)
  4. PR is reviewed and approved
  5. PR is merged
  6. publish-connector runs

Describe the alternative you’ve considered or used

Alternative ideas include:

Decouple the Airbyte connector tagging from PRs entirely. On a periodic cadence, such as every week or two, a release job runs, checks for all changed connectors, and bumps versions for any that have merged changes. This would mean that every two weeks, if anything has been merged, a new connector version would be created.

Pros: Simplifies the pipeline significantly, since devs never need to worry about bumping versions. You just merge your code knowing that the next time the release pipeline runs, your changes will be in that version. Fully automated; no need to worry about other in-flight PRs on the same connector.
Cons: Might require manual changelog reading and writing, as someone would need to check through merged changes and write up the relevant ones. It would also still occasionally require a manual release cut if a critical bug needed to be fixed and released immediately.

Current system: manual publishing and having to keep track of other potentially conflicting changes.

Additional context

I found this because it effectively means I cannot merge #11954 (review) until https://github.com/airbytehq/airbyte/pull/11877/files merges, so this scenario creates a dependency between the two.

Are you willing to submit a PR?

Yes

@noahkawasaki-airbyte noahkawasaki-airbyte changed the title Separate the docker pushing of a connector out from publish and into a post-merge action to better follow GitOps Separate the docker pushing of a connector out from /publish and into a post-merge action to better follow GitOps Apr 18, 2022
@noahkawasaki-airbyte noahkawasaki-airbyte changed the title Separate the docker pushing of a connector out from /publish and into a post-merge action to better follow GitOps Separate connectors /publish workflow into a build candidate and promotion to production GitOps pipeline pattern Apr 18, 2022
@noahkawasaki-airbyte
Contributor Author

Another scenario where a post-merge promotion would have been nice: #10256 (comment)

@edgao edgao pinned this issue Apr 26, 2022
@edgao edgao unpinned this issue Apr 26, 2022
@tuliren
Contributor

tuliren commented Apr 26, 2022

I really like the candidate image idea.

I have one comment. Based on timeline 1-3:

  1. Put up PR with changes, version bumping has been run and a commit has been added into the PR, candidate image is built and pushed
  2. Reviews, iteration, etc. Can happen multiple times.
  3. Put up changes with feedback, candidate image is built and pushed with new changes

The candidate image can be built and pushed multiple times in a PR. Does the automatic image building and publishing happen for every commit in the PR? I think this could be quite wasteful in terms of GitHub Actions quota and resources.


I can think of two alternative proposals:

  • Keep everything fully automated. Build and publish the production image after a PR is merged.
  • Introduce a manual step.
    • Build and publish the candidate image only after a PR is merged.
      • If a user needs the candidate version, they can still get it before it is retagged.
    • Periodically aggregate all the candidate images, and retag them to go live in production.
      • This manual step also gives us opportunity to modify and improve the change logs.

@noahkawasaki-airbyte noahkawasaki-airbyte self-assigned this Apr 27, 2022
@noahkawasaki-airbyte
Contributor Author

The candidate image can be built and pushed multiple times in a PR. Does the automatic image building and publishing happen for every commit in the PR? This can be quite wasteful in terms of github action quota and resource I think.

I was thinking that since it will still be triggered by a manual /publish call, this shouldn't be an issue. It won't automatically build images on every push event (though we could look into that; fully automated would be nice). This shouldn't increase the build frequency at all, actually; for the most part it just changes the tagging strategy.

Keep everything fully automated. Build and publish the production image after a PR is merged.

Technically that would result in us building the same image twice and pushing an untested image to prod. The idea is that the built candidate image is the same one that was tested; we're just retagging it to be usable by deployments.

Periodically aggregate all the candidate images, and retag them to go live in production.

A periodic strategy would be nice to reduce the number of images created on every PR, but I don't know how we would handle changelog notes for all of the changes merged since the last release in an efficient/automated way. Since we don't use GitHub releases for Airbyte connectors, we would not be able to do something like git diff release-X..release-Y. Someone would have to read through all PRs for each connector every week and write a changelog.

@noahkawasaki-airbyte
Contributor Author

noahkawasaki-airbyte commented Apr 27, 2022

I'm actually going to start implementing a prototype that can be demoed, because I think this is worthwhile to get done as soon as possible to reduce some pain. I'm open to iterating as feedback comes up and as I discover weaknesses in this approach; starting to implement is the only way to see what other blockers might need to be addressed.

EDIT: I've cleaned up the issue description and added more details based on some new ideas

@grishick
Contributor

@noahkawasaki-airbyte FYI, I filed an issue for similar context a month ago: https://github.com/airbytehq/airbyte-internal-issues/issues/509 and @evantahler created an Epic for similar issues: #11937

@evantahler
Contributor

Feel free to close #11937 in favor of yours!

@noahkawasaki-airbyte
Contributor Author

This is actually just a subtask of that epic :) thanks for filing it^ I'll contribute

@noahkawasaki-airbyte
Contributor Author

Initial PR to have workflows created in GHA: #12405

@Phlair
Contributor

Phlair commented Apr 29, 2022

@noahkawasaki-airbyte this is really great thinking, massive +1 to this.

I can see how this helps us avoid user error / visibility issues and avoid the problem statements 1 & 3.

I'm unsure how this solves problem statement 2 though, since the publish is retagging the docker image that was built in the PR. e.g.

  • Two PRs changing source-postgres and running the new build workflow
  • If I understand correctly, we'd get two images that don't take into account the changes from the other:
    • source-postgres:0.3.19-candidate-12004
    • source-postgres:0.3.19-candidate-12018
  • First gets merged and so source-postgres:0.3.19-candidate-12004 becomes source-postgres:0.3.19
  • Second gets merged... what happens?
    • Do we overwrite source-postgres:0.3.19 with source-postgres:0.3.19-candidate-12018?
    • Does source-postgres:0.3.19-candidate-12018 become source-postgres:0.3.20 and we lose the changes from source-postgres:0.3.19-candidate-12004?
    • Something else?

Unless I'm missing something, I think we'd need to build from master at the point of 2nd workflow to ensure we're getting any changes that were recently merged and not accounted for when we built the candidate image.

  • One potential new issue: workflow 1 runs successfully in the PR, but the publish fails afterwards because another change was merged and published first, and this change is now incompatible. We would then have code in master that didn't publish and cannot without further changes.

@noahkawasaki-airbyte
Contributor Author

noahkawasaki-airbyte commented Apr 29, 2022

Yeah, I've been rethinking the retag strategy vs. building fresh. Either way, if the scenario you describe ever happened, in my mind the outcome would be:

PR 1's source-postgres:0.3.19-candidate-12004 merges and is retagged to source-postgres:0.3.19
PR 2's source-postgres:0.3.19-candidate-12018 should actually hit a merge conflict and be unable to merge until it is rebased/merged with master and updated (I need to test this theory). Once it is rebased/merged and picks up the new changes, the auto-bumper will produce source-postgres:0.3.20-candidate-12018.

I need to figure out how to enforce that happens in the code still.

On your last point about a new issue: if workflow 1 ran successfully and another PR then merged (which would cause the subsequent publish workflow to fail), I think that would also be blocked by a merge conflict. Again, I need to confirm my theory that merge conflicts block these scenarios. In my experience working on a connector at the same time as someone else, I always got merge conflicts in the Dockerfile, definitions file, and seed specs, and was forced to make updates in order to merge.

@Phlair
Contributor

Phlair commented May 2, 2022

@noahkawasaki-airbyte
Aha yes, of course; merge conflicts should block the problems I was thinking about. I think we can only guarantee merge conflicts on the files that get auto-updated (rather than manually changed), so as long as that always happens before the PR is merged (kind of like publish now), we should be good.

@noahkawasaki-airbyte
Contributor Author

noahkawasaki-airbyte commented May 5, 2022

Now that I've actually worked on this, I've learned that the publishing aspect is a little more complicated than I thought, though the build/tag/test part is straightforward. I'd like some thoughts on ideas for Workflow 2: publish-connector, i.e. the step that handles building and pushing a connector given the prerequisite that a PR has been merged.

Option 1: Use a post-merge GitHub Action to build and publish connectors (to DockerHub).

Overview: After you merge your PR with, say, changes to source-X and source-Y, publish-connectors runs, builds these connectors off master, and publishes them.

Why it's nice:

  • The PR author doesn't need to know anything about this; it just happens.
  • A 1:1 PR-to-release relationship retains the current changelog structure of one changelog entry per PR/version and is easy to manage.
  • Currently it's up to the PR author to decide whether a PR warrants a new release. Some changes in a connector's code base may just be refactors that don't change any behavior, and the author can make the call not to cut an unnecessary release. An author can also know to publish a new version of a connector when changing code outside the connector's base directory that still affects it (AbstractJdbcSource.java, for example) and make the conscious decision to still test and publish a new version.

Challenges:

  • Since the build-connector workflow runs at some point in time before the publish-connector action, how does the post-merge action know which connectors to build after merge? Currently I've set the build-connector action to apply the publish-connector-on-merge label to a PR whenever any connector is bumped, which is the requirement to trigger the post-merge workflow, but the label doesn't tell the post-merge workflow which connectors to publish. My current idea is for the post-merge job to auto-detect Dockerfile version changes between the merge-base SHA and the HEAD of the PR branch, build a list of changed connectors saved to a GitHub Actions step output variable, and run the publishing commands for each in a GHA matrix. That feels reliable enough, and if something goes wrong, you can always run a manual publish workflow.
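The auto-detection described above could be sketched as a diff filter. In this illustration the changed-file list is hard-coded; in CI it would come from git diff --name-only against the merge base, and the airbyte-integrations/connectors/ directory layout is an assumption:

```shell
# Illustrative stand-in for:
#   git diff --name-only "$(git merge-base origin/master HEAD)"..HEAD
changed_files='airbyte-integrations/connectors/source-postgres/Dockerfile
airbyte-integrations/connectors/source-postgres/src/source.py
airbyte-integrations/connectors/destination-s3/Dockerfile
docs/integrations/sources/postgres.md'

# Keep only connector Dockerfiles, extract the connector name
# (third path segment), and de-duplicate. The result could be written
# to a step output and fanned out via a GHA matrix.
connectors=$(printf '%s\n' "$changed_files" \
  | grep -E '^airbyte-integrations/connectors/[^/]+/Dockerfile$' \
  | awk -F/ '{print $3}' \
  | sort -u)
echo "$connectors"
```

Filtering on Dockerfile changes specifically (rather than any file under the connector) means only version bumps trigger a publish, which matches the intent of leaving the release decision to the PR author.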

Option 2: Make connector releases a periodic cadence (weekly) that builds connectors each week if there are any changes.

Why it's nice:

  • No engineer has to worry about anything related to connector releasing. They just merge their changes knowing that on, say, Sunday, a release will build for that connector automatically and be live for use that week. For emergencies (P0) where a connector needs a certain fix right away, we can still keep a manual publish option (the existing one) available.

Challenges:

  • Changelog recording for connectors would get messy. Currently it's human-managed: the PR author chooses the changelog entry and adds it to the docs. With a periodic release, we would either need someone to manually go through every PR for each connector to summarize relevant changes on each cut, or use a git-log parsing strategy (filtering on changes to the connector's base dir?) to automatically list the PRs that affected a connector. The issue is that it's not always black and white which changes belong in the changelog, especially when a change outside the connector's base directory affects that connector. Suppose AbstractJdbcSource.java changed: a simple git-log changelog strategy would not catch this, and we would need to model many dependencies to accurately capture every change for each connector. Lots of work. It's much more reliable to have each PR author make the conscious decision about the changelog addition on their PR.
  • We would still need a way to reliably auto-detect whether a new connector version needs to be published. We wouldn't want every connector getting a new release every week, only the changed ones. But as stated previously, code changes outside certain directories can merit a new release.

Based on these thoughts, I think Option 1 is preferable. I trust PR authors to make a good judgment call on releasing a new version better than any script anyone could write to reliably auto-detect when to cut a new version of something.

@sherifnada sherifnada unpinned this issue Aug 5, 2022
@evantahler
Contributor

Closing in favor of #17189


5 participants