
Update Rocm base image for release 2025a #956

Merged
merged 1 commit into from
Mar 20, 2025

Conversation

dibryant
Contributor

@dibryant dibryant commented Mar 17, 2025

Fixes https://issues.redhat.com/browse/RHOAIENG-19482

Description

Update ROCm base image

How Has This Been Tested?

podman build -t (tagname) -f jupyter/rocm/pytorch/ubi9-python-3.11/Dockerfile.rocm .

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Member

@atheo89 atheo89 left a comment

Hey Diamond! I added a few comments.

@openshift-ci openshift-ci bot added size/xs and removed size/m labels Mar 19, 2025
@dibryant dibryant changed the title from "WIP Update Rocm packages for release 2025a" to "Update Rocm packages for release 2025a" Mar 19, 2025
Member

@harshad16 harshad16 left a comment

Tested the images:

quay.io/opendatahub/workbench-images:rocm-jupyter-tensorflow-ubi9-python-3.11-pr-956
quay.io/opendatahub/workbench-images:rocm-jupyter-pytorch-ubi9-python-3.11-pr-956

  • Works with rocminfo and rocm-smi: ✔️
    Screenshot from 2025-03-19 16-40-04

  • TensorFlow can connect to the AMD GPU: ✔️
    Screenshot from 2025-03-19 17-02-55

  • PyTorch can connect to the AMD GPU: ✔️
    Screenshot from 2025-03-19 17-02-44

  • Running module avail shows the ROCm version: ✔️

(app-root) module avail
---------------------------------------------------------------------------------------------- /usr/share/Modules/modulefiles ----------------------------------------------------------------------------------------------
dot  module-git  module-info  modules  null  rocm/6.2.4  use.own  
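The manual checks above could also be scripted. The following is only a minimal sketch, not part of this PR: the helper names `rocm_versions` and `gpu_visible` are hypothetical, and it assumes the image ships a ROCm build of PyTorch and/or TensorFlow. ROCm builds of PyTorch report the GPU through the regular `torch.cuda` API, and TensorFlow lists it via `tf.config.list_physical_devices("GPU")`.

```python
import re


def rocm_versions(module_avail_output: str) -> list[str]:
    """Return the versions listed as rocm/<version> in `module avail` output."""
    return re.findall(r"\brocm/(\d+(?:\.\d+)*)", module_avail_output)


def gpu_visible() -> dict:
    """Report GPU visibility per framework; None means the framework is not installed."""
    result = {"torch": None, "tensorflow": None}
    try:
        import torch

        # ROCm builds of PyTorch expose the AMD GPU through the torch.cuda API.
        result["torch"] = bool(torch.cuda.is_available())
    except ImportError:
        pass
    try:
        import tensorflow as tf

        result["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        pass
    return result


if __name__ == "__main__":
    sample = "dot  module-git  module-info  modules  null  rocm/6.2.4  use.own"
    print(rocm_versions(sample))
    print(gpu_visible())
```

Run inside the workbench image, a `True` entry would correspond to the ✔️ results reported above; on a machine without either framework, both entries stay `None`.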

All looks great, we just need to make the same change in

  • jupyter/minimal/Dockerfile.rocm

@openshift-ci openshift-ci bot added size/m and removed size/m labels Mar 19, 2025
@dibryant dibryant changed the title from "Update Rocm packages for release 2025a" to "Update Rocm base image for release 2025a" Mar 19, 2025
@openshift-ci openshift-ci bot added size/m and removed size/m labels Mar 19, 2025
Member

@harshad16 harshad16 left a comment

/lgtm

Tested the minimal ROCm image:
quay.io/opendatahub/workbench-images:rocm-jupyter-minimal-ubi9-python-3.11-pr-956

version, rocm-smi, rocminfo : ✔️
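A smoke test along these lines could repeat the rocminfo and rocm-smi check without the frameworks installed. This is a sketch under assumptions, not the procedure used in this review: the function name `run_rocm_tools` is hypothetical, and it only checks that each tool is on PATH and exits with status 0 when run without arguments.

```python
import shutil
import subprocess


def run_rocm_tools(tools=("rocminfo", "rocm-smi")) -> dict:
    """Map each tool to True (exit code 0), False (failed), or None (not on PATH)."""
    results = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            # Tool is not installed in this environment.
            results[tool] = None
            continue
        try:
            proc = subprocess.run([path], capture_output=True, text=True, timeout=60)
            results[tool] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            results[tool] = False
    return results


if __name__ == "__main__":
    for name, ok in run_rocm_tools().items():
        print(f"{name}: {ok}")
```

Outside a ROCm image both entries are expected to be `None`, so the script is safe to run anywhere.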

@openshift-ci openshift-ci bot added the lgtm label Mar 19, 2025
Contributor

openshift-ci bot commented Mar 19, 2025

@dibryant: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/notebook-cuda-jupyter-pt-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test notebook-cuda-jupyter-pt-ubi9-python-3-11-pr-image-mirror |
| ci/prow/runtime-ds-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test runtime-ds-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebook-jupyter-tai-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test notebook-jupyter-tai-ubi9-python-3-11-pr-image-mirror |
| ci/prow/runtime-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test runtime-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test notebook-cuda-jupyter-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/runtime-cuda-pt-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test runtime-cuda-pt-ubi9-python-3-11-pr-image-mirror |
| ci/prow/codeserver-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test codeserver-ubi9-python-3-11-pr-image-mirror |
| ci/prow/runtime-cuda-tf-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test runtime-cuda-tf-ubi9-python-3-11-pr-image-mirror |
| ci/prow/notebook-jupyter-ds-ubi9-python-3-11-pr-image-mirror | 3a9565a | link | true | /test notebook-jupyter-ds-ubi9-python-3-11-pr-image-mirror |
| ci/prow/codeserver-notebook-e2e-tests | 3a9565a | link | true | /test codeserver-notebook-e2e-tests |
| ci/prow/runtimes-ubi9-e2e-tests | 3a9565a | link | true | /test runtimes-ubi9-e2e-tests |
| ci/prow/notebook-cuda-jupyter-ubi9-python-3-11-pr-image-mirror | 0d262e2 | link | true | /test notebook-cuda-jupyter-ubi9-python-3-11-pr-image-mirror |
| ci/prow/rocm-runtimes-ubi9-e2e-tests | 0d262e2 | link | true | /test rocm-runtimes-ubi9-e2e-tests |
| ci/prow/rocm-notebooks-e2e-tests | 0d262e2 | link | true | /test rocm-notebooks-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jiridanek
Member

/lgtm
/approve

Contributor

openshift-ci bot commented Mar 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiridanek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jiridanek jiridanek merged commit 9a709de into opendatahub-io:main Mar 20, 2025
23 of 42 checks passed
@dibryant dibryant deleted the rocm-update branch March 20, 2025 13:11
6 participants