Refactor Dockerfiles to better support ARM #1575


Merged — 7 commits merged into temporalio:master from pq/arm-docker on Jun 14, 2021

Conversation

pquerna
Contributor

@pquerna pquerna commented May 26, 2021

What changed?

  • Use ARG for base docker images for easier debug/test.
  • Separate base-builder and base-ci-builder. The latter has some tools that are not yet available on ARM and are required only by CI builds. base-builder contains the bare minimum required to build the production version and has full ARM support.
  • Replaced the jwilder/dockerize binary with a build from source via a git clone. Worth looking at a replacement or removal later, but this preserves the existing use of dockerize templates for now.
  • Add corresponding Makefile targets to build with docker buildx.
  • Update docker/base-images/README.md.
  • Partial "fix" for Feature: provide arm64 builds to Docker Hub #1305.

Build base-server for both arm64 and x86 container images with this command:

docker buildx build  \
    --build-arg TARGET=auto-setup \
    --platform linux/arm64,linux/amd64 \
    -f base-server.Dockerfile \
    .

or

make base-server-x DOCKER_IMAGE_TAG=1.3.0
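
Since the base images are now taken via ARG, the main Dockerfile can default to the published images while still accepting a locally built one. A minimal sketch of the pattern (the ARG name and default tag here are illustrative, not necessarily the exact names used in this PR):

```dockerfile
# Allow the base image to be swapped at build time for local iteration.
ARG BASE_SERVER_IMAGE=temporalio/base-server:1.3.0
FROM ${BASE_SERVER_IMAGE} AS temporal-server

# ... Temporal-specific build steps follow ...
```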

Why?
Ability to run Temporal on AWS Graviton2 instances.

How did you test it?
Built locally for different platforms.

Potential risks
Might break x86 docker images.

Is hotfix candidate?
No.

@pquerna pquerna requested a review from a team May 26, 2021 21:31
@CLAassistant

CLAassistant commented May 26, 2021

CLA assistant check
All committers have signed the CLA.

@pquerna
Contributor Author

pquerna commented May 26, 2021

Pushed an image for temporal-server and auto-setup to: https://gallery.ecr.aws/l2s6p1l5/ductone-temporal-multiarch

$ docker manifest inspect public.ecr.aws/l2s6p1l5/ductone-temporal-multiarch:server-latest
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2618,
         "digest": "sha256:5c2dfa51bbad1051dbde8079afec40aa5648dd4c6c3212bc7ba0b8394265fcfa",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2618,
         "digest": "sha256:6a7d4e2917cf0114ab2b871b8246e77e241c360e1169a2066d9015bdc968571d",
         "platform": {
            "architecture": "arm64",
            "os": "linux"
         }
      }
   ]
}

@wxing1292 wxing1292 requested review from alexshtin and underrun May 26, 2021 22:43
@alexshtin
Member

The reason we have those base images is that we want the build to have as few "remote" dependencies as possible. We can't avoid Docker Hub, but that should be it. All apk, git, curl, etc. RUN commands should be in the base image, built/installed once and reused. It also improves build time. The main Dockerfile should contain only Temporal-related stuff.

Can you fix it?

Member

@alexshtin alexshtin left a comment

@pquerna pquerna changed the title Refactor Dockerfile into a single file with targets to support ARM Refactor Dockerfile to better support ARM May 27, 2021
@pquerna
Contributor Author

pquerna commented May 27, 2021

@alexshtin The current structure in master makes it difficult for anyone without permission to push to temporalio's Docker Hub account to actually iterate on the images, so I found a middle ground: I changed the Dockerfile to take the base images as ARGs, defaulting to the current values. This makes it possible to build the base images locally and then inject them into the main Dockerfile more easily.
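
With the bases parameterized this way, local iteration might look roughly like the following (the tag and ARG name here are hypothetical, and this assumes a docker daemon with buildx available):

```shell
# Build a base image locally under a throwaway tag...
docker build -t local/base-builder:dev -f base-builder.Dockerfile .

# ...then inject it into the main build by overriding the ARG default.
docker buildx build \
    --build-arg BASE_BUILDER_IMAGE=local/base-builder:dev \
    -f server.Dockerfile \
    .
```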

Additionally, I made a new base-ci, which is like base-build, but with extra tools. The fossa-cli tool does not provide ARM support at this time. Additionally, the Alpine package for shellcheck is currently x86 only: https://pkgs.alpinelinux.org/package/edge/community/x86_64/shellcheck

Even if merged, this PR will still need someone with Docker Hub permissions to push new base images, followed by a separate PR with the new versions.

Is this OK?

@pquerna pquerna force-pushed the pq/arm-docker branch 3 times, most recently from f75e77f to 1be9468 Compare May 27, 2021 17:45
@alexshtin
Member

Thanks for fixing this, Paul. I like the approach. I want to make a few renames, though. I don't have time for it right now, but I will get back to it by the end of the week.

@alexshtin
Member

Paul, can you give me write permission to your fork? I don't want to open new PR and would like to contribute to this one. I have changes ready, just need to push them.

@pquerna
Contributor Author

pquerna commented Jun 7, 2021

@alexshtin done

@alexshtin
Member

alexshtin commented Jun 7, 2021

Tried to test it locally today and got an error. I was able to narrow it down to this Dockerfile:

FROM alpine:3.12 AS base-server
RUN apk add --update --no-cache bash

and build it with:

docker buildx build --platform linux/arm64 --output type=image .

and got:

 => [internal] load build definition from Dockerfile
 => => transferring dockerfile: 107B
 => [internal] load .dockerignore
 => => transferring context: 2B
 => [internal] load metadata for docker.io/library/alpine:3.12
 => CACHED [1/2] FROM docker.io/library/alpine:3.12@sha256:36553b10a4947067b9fbb7d532951066293a68eae893beba1d9235f7d11a20ad
 => => resolve docker.io/library/alpine:3.12@sha256:36553b10a4947067b9fbb7d532951066293a68eae893beba1d9235f7d11a20ad
 => ERROR [2/2] RUN apk add --update --no-cache bash
------
 > [2/2] RUN apk add --update --no-cache bash:
#5 0.259 fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/aarch64/APKINDEX.tar.gz
#5 1.306 fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/aarch64/APKINDEX.tar.gz
#5 3.693 (1/4) Installing ncurses-terminfo-base (6.2_p20200523-r0)
#5 3.742 (2/4) Installing ncurses-libs (6.2_p20200523-r0)
#5 4.198 (3/4) Installing readline (8.0.4-r0)
#5 4.332 (4/4) Installing bash (5.0.17-r0)
#5 5.156 Executing bash-5.0.17-r0.post-install
#5 5.165 ERROR: bash-5.0.17-r0.post-install: script exited with error 1
#5 5.169 Executing busybox-1.31.1-r20.trigger
#5 5.174 ERROR: busybox-1.31.1-r20.trigger: script exited with error 1
#5 5.182 1 error; 7 MiB in 18 packages
------
Dockerfile:3
--------------------
   1 |     FROM alpine:3.12 AS base-server
   2 |
   3 | >>> RUN apk add --update --no-cache bash
   4 |
--------------------
error: failed to solve: rpc error: code = Unknown desc = executor failed running [/dev/.buildkit_qemu_emulator /bin/sh -c apk add --update --no-cache bash]: exit code: 1

Just linux/arm works fine, but not linux/arm64. Am I doing something wrong, or does apk have a broken bash package for arm64? Do you know how to get a better error message?

I am on linux/amd64 and building using docker-container driver.

@pquerna
Contributor Author

pquerna commented Jun 7, 2021

The example Dockerfile you showed works for me:

$ docker buildx build --platform linux/arm64 --output type=image .
[+] Building 2.9s (6/6) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                  0.0s
 => => transferring dockerfile: 106B                                                                                                                                  0.0s
 => [internal] load .dockerignore                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                       0.0s
 => [internal] load metadata for docker.io/library/alpine:3.12                                                                                                        0.9s
 => CACHED [1/2] FROM docker.io/library/alpine:3.12@sha256:36553b10a4947067b9fbb7d532951066293a68eae893beba1d9235f7d11a20ad                                           0.0s
 => => resolve docker.io/library/alpine:3.12@sha256:36553b10a4947067b9fbb7d532951066293a68eae893beba1d9235f7d11a20ad                                                  0.0s
 => [2/2] RUN apk add --update --no-cache bash                                                                                                                        1.2s
 => exporting to image                                                                                                                                                0.1s
 => => exporting layers                                                                                                                                               0.1s
 => => exporting manifest sha256:0cae145dd19ec180c690ca401bbb827792716cda6dd62257c6cd464a592ac4ad                                                                     0.0s
 => => exporting config sha256:dbb1e23854b1e2dbcd10021f4fcff8e009f6222483996169d9a8ab7759bc7cc8                                                                       0.0s

I'd guess it's a problem with your arm64 builder for buildx -- maybe try this docker run to check whether your qemu setup is working:

$ docker run --rm -ti --platform linux/arm64 alpine:3.12 uname -m
aarch64

@alexshtin
Member

I am not able to run this on my linux/amd64 machine:

standard_init_linux.go:228: exec user process caused: exec format error

which totally makes sense for me.

@pquerna
Contributor Author

pquerna commented Jun 8, 2021

I am not able to run this on my linux/amd64 machine:

standard_init_linux.go:228: exec user process caused: exec format error

which totally makes sense for me.

If you install qemu, e.g. qemu-user-static on Debian/Ubuntu, it should work:
https://gist.github.com/Manu343726/ca0ceb224ea789415387
https://matchboxdorry.gitbooks.io/matchboxblog/content/blogs/build_and_run_arm_images.html
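
Concretely, the setup referenced above boils down to registering qemu's binfmt handlers so the kernel can run arm64 binaries transparently. A sketch for a Debian/Ubuntu host (assumes sudo access and a running docker daemon):

```shell
# Install user-mode qemu and binfmt handlers (Debian/Ubuntu)
sudo apt-get install -y qemu-user-static binfmt-support

# Alternatively, register handlers via the multiarch helper image
# (--reset removes and reinstalls all handlers; -p yes makes them persistent)
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Verify: an arm64 container should now report aarch64
docker run --rm --platform linux/arm64 alpine:3.12 uname -m
```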

@alexshtin
Member

alexshtin commented Jun 12, 2021

Apparently there were some problems with my docker. I cleaned up a few things and it started to work.
Btw, thanks for the qemu-user-static tip. I can run arm64 images now!

Contributor

@underrun underrun left a comment

Do we have a plan for testing arm builds in buildkite? It would be great if we didn't have to test this manually.

@alexshtin
Member

alexshtin commented Jun 13, 2021

We don't have any automation for base images right now, and Buildkite is probably not the best tool for this. Our CI needs to be updated to support ARM builds, but that's a next step. For now I just tested it manually.

@alexshtin alexshtin changed the title Refactor Dockerfile to better support ARM Refactor Dockerfiles to better support ARM Jun 14, 2021
@alexshtin alexshtin merged commit 9fc20ca into temporalio:master Jun 14, 2021
@alexshtin alexshtin deleted the pq/arm-docker branch June 14, 2021 17:58
4 participants