Commit 27d221f

dpuVendor: Build a multiarch ipu manifest
The dpu-operator by default expects a multiarch vsp image. In the past, this code just worked, because we would build and push the image immediately before deploying the operator on the respective architecture, so the most recent image would be the proper arch. This is flimsy, and can cause errors later on if we need to pull the image again.

Instead, as a workaround while waiting for multiarch IPU vsp build support, build a manifest of the images during the second phase of deployment (iso-cluster dpu).

This code makes a (potentially bad) assumption that the host side deployment has already run, and built/pushed the x86_64 vsp.

Signed-off-by: Salvatore Daniele <[email protected]>
1 parent 0680697 commit 27d221f

2 files changed, +18 -2 lines changed

dpuVendor.py

Lines changed: 5 additions & 0 deletions
@@ -72,6 +72,11 @@ def build_push(self, h: host.Host, imgReg: ImageRegistry) -> str:
         vsp_image = self.vsp_image_name(imgReg)
         h.run_or_die(f"podman tag intel-ipuplugin:latest {vsp_image}")
         h.run_or_die(f"podman push {vsp_image}")
+        # WA to ensure multiarch vsp image manifest is available
+        # push images with both the name expected by the dpu operator (so we can proceed with deploying host side)
+        # and the name expected by the manifest that we will build during the IPU deployment step
+        h.run_or_die(f"podman tag {vsp_image} {vsp_image}-{self.name_suffix}")
+        h.run_or_die(f"podman push {vsp_image}-{self.name_suffix}")
         return vsp_image
 
     def start(self, vsp_image: str, client: K8sClient) -> None:
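
The double-naming scheme in this hunk can be sketched in isolation. This is a minimal illustration, not the repository's code: the helper names `arch_tag` and `tag_and_push_commands` are hypothetical, and it assumes `self.name_suffix` holds the architecture string (`x86_64` or `aarch64`), which is inferred from the `podman pull` commands in extraConfigDpu.py.

```python
def arch_tag(vsp_image: str, arch: str) -> str:
    """Return the per-architecture name derived from the shared VSP image name.

    Assumption: the suffix is the architecture string, e.g. "x86_64" or "aarch64".
    """
    return f"{vsp_image}-{arch}"


def tag_and_push_commands(vsp_image: str, arch: str) -> list[str]:
    """List, in order, the podman commands that publish one build under both names."""
    tagged = arch_tag(vsp_image, arch)
    return [
        # Name expected by the dpu operator, so the host side can deploy right away.
        f"podman tag intel-ipuplugin:latest {vsp_image}",
        f"podman push {vsp_image}",
        # Name consumed later when the manifest is assembled.
        f"podman tag {vsp_image} {tagged}",
        f"podman push {tagged}",
    ]
```

Returning the command strings instead of running them keeps the sketch testable without a registry; in the actual change each command is passed to `h.run_or_die`.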

extraConfigDpu.py

Lines changed: 13 additions & 2 deletions
@@ -179,8 +179,19 @@ def ExtraConfigDpu(cc: ClustersConfig, cfg: ExtraConfigArgs, futures: dict[str,
     # Build on the ACC since an aarch based server is needed for the build
     # (the Dockerfile needs to be fixed to allow layered multi-arch build
     # by removing the calls to pip)
-    vendor_plugin.build_push(acc, imgReg)
-    # vendor_plugin.start(vendor_plugin.vsp_image_name(imgReg), client)
+    vsp_img = vendor_plugin.build_push(acc, imgReg)
+
+    # As a workaround while waiting for properly multiarch build support, we can create a manifest to ensure both host and dpu can deploy the vsp with the same image.
+    # Note that this makes the assumption that the host deployment has already been run and the latest ipu plugin image is already locally available in the registry.
+    # Without these assumptions, this will not work as expected
+    manifest = f"{vsp_img}-manifest"
+    lh.run(f"buildah manifest rm {manifest}")
+    lh.run_or_die(f"buildah manifest create {manifest}")
+    lh.run_or_die(f"podman pull {vsp_img}-x86_64")
+    lh.run_or_die(f"podman pull {vsp_img}-aarch64")
+    lh.run_or_die(f"buildah manifest add {manifest} {vsp_img}-x86_64")
+    lh.run_or_die(f"buildah manifest add {manifest} {vsp_img}-aarch64")
+    lh.run_or_die(f"buildah manifest push --all {manifest} docker://{vsp_img}")
 
     git_repo_setup(repo, repo_wipe=False, url=DPU_OPERATOR_REPO)
     if cfg.rebuild_dpu_operators_images:
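
The manifest sequence added in this hunk could be expressed as data, which makes the ordering and the one tolerated failure explicit. This is a sketch under assumptions: `Step` and `manifest_steps` are hypothetical names that do not exist in the repository, and the reading that `buildah manifest rm` goes through `lh.run` (not `lh.run_or_die`) because the manifest may not exist yet is an inference from the diff, not something the commit states.

```python
from typing import NamedTuple


class Step(NamedTuple):
    command: str
    must_succeed: bool  # False mirrors lh.run, True mirrors lh.run_or_die


def manifest_steps(
    vsp_img: str, arches: tuple[str, ...] = ("x86_64", "aarch64")
) -> list[Step]:
    """Return the commands from the diff, in the same order, for the given arches."""
    manifest = f"{vsp_img}-manifest"
    steps = [
        # Removing a stale manifest may fail on a first run; that is tolerated.
        Step(f"buildah manifest rm {manifest}", False),
        Step(f"buildah manifest create {manifest}", True),
    ]
    # Pull every per-arch image first; a missing x86_64 image means the
    # host side deployment has not run yet, so this is where it would fail.
    steps += [Step(f"podman pull {vsp_img}-{arch}", True) for arch in arches]
    steps += [
        Step(f"buildah manifest add {manifest} {vsp_img}-{arch}", True)
        for arch in arches
    ]
    # Publish the manifest list under the name the dpu operator expects.
    steps.append(
        Step(f"buildah manifest push --all {manifest} docker://{vsp_img}", True)
    )
    return steps
```

A caller would loop over the steps and pick `lh.run` or `lh.run_or_die` based on `must_succeed`. Adding an architecture then only means extending `arches`.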
