Skip to content

Add pkg-precompile.jl script #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 61 commits into from
Apr 9, 2025
Merged

Add pkg-precompile.jl script #1

merged 61 commits into from
Apr 9, 2025

Conversation

omus
Copy link
Member

@omus omus commented Apr 4, 2025

Creates a script to assist with precompiling Julia packages within a build container. This script is specifically compatible with build cache mounts which allow .ji files to be shared between builds for decreased build times.

@omus
Copy link
Member Author

omus commented Apr 4, 2025

I'll be doing a one test of all patch releases of Julia from 1.10.0 - 1.10.9, 1.11.0 - 1.11.4 to ensure our use of internal code doesn't break with any of these versions.

@omus
Copy link
Member Author

omus commented Apr 4, 2025

Found a corner case I need to add tests for. In most cases the Project.toml may contain a name which then expects a src/$name.jl file to be present.

@omus omus marked this pull request as ready for review April 8, 2025 19:55
@omus omus requested a review from kleinschmidt April 8, 2025 19:55
@@ -0,0 +1,338 @@
#!/usr/bin/env -S julia --color=yes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should just call this precompile.jl?

# julia-container-scripts
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe julia-container-build-scripts or julia-docker-build-scripts is a better name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is clear enough as far as I'm concerned.

Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I review the precompile script carefully and skimmed everything else. LGTM with some very minor comments/suggestions (I think just one typo?)

My main question is I guess a high level one: what does this give us over what we have been doing? pkg-precompile.jl is a pretty complex piece of code and hooks into what seem like internals of julia (although I haven't looked carefully at how internal they really are...). This is a public repo so I won't link to other internal stuff but basically this seems like it's replacing a pretty innocuous script that comes down to having one step like

# Install Julia package registries, instantiate the Julia project environment, and
# precompile Julia dependencies. We need to add the Julia registries in the same Docker
# layer which copies the Project.toml/Manifest.toml to ensure this step works in
# environments where we use Docker layer caching with a fresh mount cache.
COPY *Project.toml *Manifest.toml /tmp/
RUN --mount=type=cache,target=${JULIA_USER_DEPOT_CACHE},sharing=locked \
    --mount=type=secret,id=github-token \
    # Create the required project source file
    mkdir -p "${JULIA_PROJECT}/src" && \
    touch "${JULIA_PROJECT}/src/$(basename "${JULIA_PROJECT}").jl" && \
    # Move the Project.toml and Manifest.toml potentially into the depot cache
    mv /tmp/*Project.toml /tmp/*Manifest.toml "${JULIA_PROJECT}" && \
    julia --project="${JULIA_PROJECT}" -e ' \
        using Pkg; \
        # Only use `Pkg.Registry.add` when necessary has this always downloads the entire registry. \
        if isempty(Pkg.Registry.reachable_registries()) \
            Pkg.Registry.add([RegistrySpec(name="Beacon", url="https://github.com/beacon-biosignals/BeaconRegistry.git"), \
                              RegistrySpec(name="General")]); \
        else \
            Pkg.Registry.update(); \
        end; \
        Pkg.instantiate(); \
        Pkg.build(); \
        Pkg.precompile(strict=true, timing=true)'

we're not really taking advantage of the relocatable precompilation (since this still works on 1.10), and it seems like moving the precompilation and final stage into a single stage is the only significant simplification this makes for our dockerfiles. but maybe there's something I'm missing??

# julia-container-scripts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is clear enough as far as I'm concerned.

Comment on lines +26 to +27
# instantiating a named Julia project.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.10.8-1.10 require this for precompilation too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I determined the versions which required this from using the test suite. We are testing for this in with the GEN_PKG_SRC Docker ARG which is only enabled on these Julia versions where it is required.

for ext in keys(manifest[pkg.uuid].exts)
# The extension UUID deterministic and based upon the parent UUID and the
# extension name. e.g. https://github.com/JuliaLang/julia/blob/2fd6db2e2b96057dbfa15ee651958e03ca5ce0d9/base/loading.jl#L1561
# Note: the `Base.uuid5` implementation differs from `UUIDs.uuid5`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:AHHHHH:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is downright dumb. We have tests for extensions so if this were to change the tests should pick up on that.

omus and others added 2 commits April 9, 2025 08:10
Co-authored-by: Dave Kleinschmidt <[email protected]>
Co-authored-by: Dave Kleinschmidt <[email protected]>
@omus
Copy link
Member Author

omus commented Apr 9, 2025

My main question is I guess a high level one: what does this give us over what we have been doing? pkg-precompile.jl is a pretty complex piece of code and hooks into what seem like internals of julia (although I haven't looked carefully at how internal they really are...). This is a public repo so I won't link to other internal stuff but basically this seems like it's replacing a pretty innocuous script that comes down to having one step like...

The Docker snippet you posted is the new revised instantiate step from the pattern we proposed in https://discourse.julialang.org/t/proposed-julia-docker-workflow-to-use-a-persistent-depot/97963. From the discussion we had with @ericphanson the issue with our Dockerfile pattern is that it requires multiple steps to function correctly and is complex enough that it can easily be broken by users extending the pattern to fit their own needs.

The primary advantage the new pkg-precompile.jl script is that it provides almost all of the original functionality but uses a single script to do so meaning that it's much harder for Dockerfile developers to modify a statement in their Dockerfile and break the caching behavior. Additionally, we now have a test suite for this which should provide us more confidence that this script is functioning correctly even though it does touch some Julia internals.

Some of the more recent changes like using fixed modification timestamps and gen-pkg-src.jl break the "single statement" approach. My response to that is that:

  1. gen-pkg-src.jl is needed before instantiate so this has to be a separate script. Forgetting to use this however results in an error on Julia versions that require it so there is almost a self-correcting behavior here.
  2. The fixed modification timestamp one-liner can be rolled into pkg-precompile.jl. I'm hesitant to always run the script with this enabled but we could make it opt-in with a flag to clean things up further.

we're not really taking advantage of the relocatable precompilation (since this still works on 1.10),

As we don't mainly move our Julia depot the only advantage the relocatable changes provide us in Julia 1.11 is that .ji files are no longer invalidated when the modification times of the package source files change. This scenario comes up for us each time the Project.toml/Manifest.toml changes and is the reason we need to used the fixed modification timestamp workaround.

it seems like moving the precompilation and final stage into a single stage is the only significant simplification this makes for our dockerfiles. but maybe there's something I'm missing??

The advantages brought by the pkg-precompile.jl script over the Dockerfile pattern we used before are:

  • Simplifies the approach to writing Julia Dockerfiles and requires less background knowledge to use successfully. By using a single Docker RUN statement we no longer require mysterious --mount=type=cache,sharing=locked,target=/tmp/julia-cache to be used on statements which look like they don't require them.
  • Modifying the active project to be distinct reduces .ji collisions. Having this in place makes it possible to use sharing=shared (concurrent read/write access multiple builds) and improves our cache hits.
  • Having the pkg-precompile.jl trigger the first initialization of the packages can remove an extra step from our Dockerfiles.
  • Checking that packages are precompiled now avoids common scenarios where packages where precompilation would accidentally occur at container runtime instead of build time.
  • The approach in pkg-precompile.jl copies only the precompilation files which are required into the image rather than relying on Pkg.gc which doesn't remove .ji files and leads to image size bloat.
  • The usage of pkg-precompile.jl is now inline with the typical usages of --mount=type=cache in that it's an optional performance enhancement.
  • The new script has a test suite which should ensure reliable behavior which was hard to do with the pattern.

The disadvantages are:

  • Uses Julia internals. I think we can address this by upstreaming some of the ideas here into Julia itself.
  • No long attempts to store the entire Julia depot in a cache. The main loss here is no longer caching the "packages" directory. We can look into reintroducing this later as possibly another script. I think letting users opt-in to what they want to cache is important so more thought is needed on how we would approach that now.

@omus
Copy link
Member Author

omus commented Apr 9, 2025

I've been looking at setting the fixed modification timestamp inside the Julia script to make things more convenient for users. At first I was worried that a Julia walkdir approach would be slower than find -exec but that turned out to not be true. However, in looking at image sizes there is a disadvantage to modifying the timestamps from within Julia when the instantiate and precompile steps are separate.

For example if we instantiate and use find -exec in the same RUN statement (layer) we produce a layer with 508MB and the precompilation layer is 159MB:

<missing>      26 seconds ago      RUN /bin/sh -c curl -fsSLO https://raw.githu…   159MB     buildkit.dockerfile.v0
<missing>      35 seconds ago      RUN /bin/sh -c julia --color=yes -e 'using P…   508MB     buildkit.dockerfile.v0

If we instead just instantiate in one layer and then set the fixed modification times and precompile in another layer we see the instantiate layer remain the same size but have precompile increase to 216MB.

<missing>      28 seconds ago      RUN /bin/sh -c curl -fsSLO https://raw.githu…   216MB     buildkit.dockerfile.v0
<missing>      38 seconds ago      RUN /bin/sh -c julia --color=yes -e 'using P…   508MB     buildkit.dockerfile.v0

The difference here is that the first layer records one set of timestamps while the later changes the timestamps in another layer.

@omus
Copy link
Member Author

omus commented Apr 9, 2025

Maybe we should provide a third script to do the fixed modification times? I'm considering making an all-in-one installer to simplify things further. Maybe we should just provide an instantiate.jl?

I'll keep things simple for now.

@omus omus requested a review from kleinschmidt April 9, 2025 16:20
Copy link
Member

@kleinschmidt kleinschmidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think having a one-stop instantiate.jl script would be useful here. Otherwise we're back in a similar boat of needing to do some careful juggling of various non-trivial steps. But I'm okay with trying this out and seeing if the juice is worth the squeeze.

@omus omus merged commit d58f136 into main Apr 9, 2025
6 checks passed
@omus omus deleted the cv/initial branch April 9, 2025 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants