Skip to content

[feature] migrate images to ghcr #2455

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 16, 2025

Conversation

mahdikhashan
Copy link
Member

@mahdikhashan mahdikhashan commented Feb 27, 2025

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2446

ref: kubeflow/manifests#3010 (comment)

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: mahdikhashan <[email protected]>
@mahdikhashan mahdikhashan force-pushed the migrate-images-to-ghcr branch from 9699402 to 900ca0a Compare February 27, 2025 12:18
@mahdikhashan
Copy link
Member Author

/cc @kubeflow/wg-automl-leads @Electronic-Waste @varodrig PTAL!

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for updating this @mahdikhashan!
Should we do the same for the Training Operator v1.9 ?

@@ -45,7 +46,7 @@ jobs:
id: publish
uses: ./.github/workflows/template-publish-image
with:
image: docker.io/kubeflow/${{ matrix.component-name }}
image: ghcr.io/kubeflow/${{ matrix.component-name }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does GHCR support additional paths so we can distinguish images by Kubeflow Project name:

docker pull ghcr.io/kubeflow/trainer/trainer-controller-manager

https://github.com/orgs/kubeflow/packages/container/package/training%2Ftraining-operator

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you are right - done.

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this.
Could we check if uploading time does does not exceed 10min?

https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry

For example images (mostly releae-1.9 branch), I'm wondering if we take 10 or more time to upload images.

Signed-off-by: mahdikhashan <[email protected]>
@mahdikhashan mahdikhashan force-pushed the migrate-images-to-ghcr branch from fafd8d6 to e151070 Compare February 28, 2025 19:13
@mahdikhashan

This comment was marked as outdated.

@mahdikhashan

This comment was marked as outdated.

Copy link
Member

@Electronic-Waste Electronic-Waste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mahdikhashan Thanks for this! We may also need to update the image under manifests/ dir:

image: kubeflow/trainer-controller-manager

# Update the Kubeflow Trainer controller manager image tag.
images:
- name: kubeflow/trainer-controller-manager
newTag: latest

Not sure if there are other image definitions elsewhere.

@varodrig
Copy link
Contributor

varodrig commented Mar 3, 2025

@mahdikhashan thank you for working on this.

@varodrig
Copy link
Contributor

varodrig commented Mar 3, 2025

! We may also need to update the image under manifests/ dir:

@mahdikhashan keep us posted on this updates to approve the PR. thank you

@varodrig
Copy link
Contributor

varodrig commented Mar 3, 2025

/hold waiting for updates

@andreyvelich
Copy link
Member

We discussed with @thesuperzapper whether we should push images to both DockerHub and GHCR.
Maybe we can consider it but by default we will use the GHCR image ?

@andreyvelich
Copy link
Member

@mahdikhashan We also need to migrate Training Operator 1.9 images to GHCR, do you have time to help us with it as well?

@tenzen-y
Copy link
Member

tenzen-y commented Mar 4, 2025

We discussed with @thesuperzapper whether we should push images to both DockerHub and GHCR. Maybe we can consider it but by default we will use the GHCR image ?

+1 on DockerHub and GHCR since we can increase Googlability.

@mahdikhashan
Copy link
Member Author

mahdikhashan commented Mar 5, 2025

thank you all for your time, i'll free up some time and manage this pr.

Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed in this PR, @mahdikhashan please can we continue to publish images to GHCR and DockerHub, but use GHCR in the default manifests: #2491

@mahdikhashan
Copy link
Member Author

mahdikhashan commented Mar 13, 2025

As we discussed in this PR, @mahdikhashan please can we continue to publish images to GHCR and DockerHub, but use GHCR in the default manifests: #2491

I'll dedicate time today and finalize it. sorry for the delay.

@mahdikhashan
Copy link
Member Author

@mahdikhashan We also need to migrate Training Operator 1.9 images to GHCR, do you have time to help us with it as well?

it is addressed here: #2491

@mahdikhashan
Copy link
Member Author

Thank you for this. Could we check if uploading time does does not exceed 10min?

https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry

For example images (mostly releae-1.9 branch), I'm wondering if we take 10 or more time to upload images.

I'll check it on my own user-account and share the result here.

@mahdikhashan
Copy link
Member Author

@tenzen-y

please check the following run:

https://github.com/mahdikhashan/trainer/actions/runs/13882367194/job/38842581119#step:4:1315

the build takes almost 30 min, but the push to ghcr is around 11 seconds.

Signed-off-by: mahdikhashan <[email protected]>
@google-oss-prow google-oss-prow bot added size/M and removed size/S labels Mar 16, 2025
@mahdikhashan
Copy link
Member Author

As we discussed in this PR, @mahdikhashan please can we continue to publish images to GHCR and DockerHub, but use GHCR in the default manifests: #2491

done. please take a look: https://github.com/mahdikhashan/trainer/actions/runs/13884081580

@mahdikhashan
Copy link
Member Author

requested changes addressed.

/cc @kubeflow/wg-automl-leads @Electronic-Waste @varodrig PTAL!

@tenzen-y
Copy link
Member

@tenzen-y

please check the following run:

https://github.com/mahdikhashan/trainer/actions/runs/13882367194/job/38842581119#step:4:1315

the build takes almost 30 min, but the push to ghcr is around 11 seconds.

Sounds great, thanks

@tenzen-y
Copy link
Member

@mahdikhashan We need to fix E2E testings so that we can use ghcr imegin inside KinD cluster.
Or, we can use dockerhub image for E2E.

Failed to pull image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to pull and unpack image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to resolve reference "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3Akubeflow%2Ftrainer%2Ftrainer-controller-manager%3Apull&service=ghcr.io: 403 Forbidden

@mahdikhashan
Copy link
Member Author

@mahdikhashan We need to fix E2E testings so that we can use ghcr imegin inside KinD cluster. Or, we can use dockerhub image for E2E.

Failed to pull image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to pull and unpack image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to resolve reference "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3Akubeflow%2Ftrainer%2Ftrainer-controller-manager%3Apull&service=ghcr.io: 403 Forbidden

the first option is not possible since there are no images on ghcr, for dockerhub it will become a problem due to pull limit by 1st of April. I'd suggest we fix them in another pr. open to your suggestion though.

@tenzen-y
Copy link
Member

@mahdikhashan We need to fix E2E testings so that we can use ghcr imegin inside KinD cluster. Or, we can use dockerhub image for E2E.

Failed to pull image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to pull and unpack image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to resolve reference "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3Akubeflow%2Ftrainer%2Ftrainer-controller-manager%3Apull&service=ghcr.io: 403 Forbidden

the first option is not possible since there are no images on ghcr, for dockerhub it will become a problem due to pull limit by 1st of April. I'd suggest we fix them in another pr. open to your suggestion though.

In that case, we can keep using the dockerhub image for manifests in this PR and focus only on gh actions change.
In the next PR, we can replace dockerhub image with ghcr one for manifests.

@mahdikhashan
Copy link
Member Author

@mahdikhashan We need to fix E2E testings so that we can use ghcr imegin inside KinD cluster. Or, we can use dockerhub image for E2E.

Failed to pull image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to pull and unpack image "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to resolve reference "ghcr.io/kubeflow/trainer/trainer-controller-manager:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://ghcr.io/token?scope=repository%3Akubeflow%2Ftrainer%2Ftrainer-controller-manager%3Apull&service=ghcr.io: 403 Forbidden

the first option is not possible since there are no images on ghcr, for dockerhub it will become a problem due to pull limit by 1st of April. I'd suggest we fix them in another pr. open to your suggestion though.

In that case, we can keep using the dockerhub image for manifests in this PR and focus only on gh actions change. In the next PR, we can replace dockerhub image with ghcr one for manifests.

ok, i'll revert changes for manifest here and open an accompanying pr for manifests.

Signed-off-by: mahdikhashan <[email protected]>
@mahdikhashan
Copy link
Member Author

@tenzen-y ptal #2529

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you
/lgtm
/approve

context: ${{ matrix.context }}
push: false

build-and-publish-to-dockerhub:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this and GHCR can consolidate into one.

- name: Checkout
- name: GHCR Login
- name: Docker Hub Login
- name: Publish Component ${{ matrix.component-name }}
- name: Test Build For Component ${{ matrix.component-name }}

Because images can be specified multiple name: https://github.com/docker/metadata-action?tab=readme-ov-file#inputs

@mahdikhashan could you open an issue or work on this improvement?

Copy link
Member Author

@mahdikhashan mahdikhashan Mar 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, i'm more leaned toward current change since it separates path for publishing and has less mental effort for contribution. final decision is yours, i can merge/consolidate them in this pr.

Copy link
Member

@tenzen-y tenzen-y Mar 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, i'm more leaned toward current change since it separates path for publishing and has less mental effort for contribution. final decision is yours, i can merge them in this pr.

I do not want to block this PR by this my comment. So, I'm happy if you can work on my additional suggestion in the next PR, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let me just know if you can work on it or just open an issue.

Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tenzen-y

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 1adef36 into kubeflow:master Mar 16, 2025
19 checks passed
Comment on lines +98 to +100
registry: docker.io
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be fixed @mahdikhashan:
DOCKERHUB_USERNAME

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll open a hotfix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned fixess in follow-up PR as well

mahdikhashan added a commit to mahdikhashan/trainer that referenced this pull request Mar 16, 2025
* migrate

Signed-off-by: mahdikhashan <[email protected]>

* update path

Signed-off-by: mahdikhashan <[email protected]>

* pull manifest from ghcr

Signed-off-by: mahdikhashan <[email protected]>

* pull image from ghcr

Signed-off-by: mahdikhashan <[email protected]>

* add dockerhub publish as a job

Signed-off-by: mahdikhashan <[email protected]>

* update images and runner

Signed-off-by: mahdikhashan <[email protected]>

* revert: manifest changes

Signed-off-by: mahdikhashan <[email protected]>

---------

Signed-off-by: mahdikhashan <[email protected]>
Signed-off-by: Mahdi Khashan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate images in Dockerhub to GHCR
5 participants