Using Australia's Gadi HPC #4584
navidcy started this conversation in High performance computing
This is a great resource @navidcy! Even though I don't have access to these particular servers, this walkthrough will help me in setting things up on other servers. I may think about doing something similar for a cluster we have in Canada that a few research groups are using to run Oceananigans, and I suspect more will follow. One question. Your …
Overview
Australia's Gadi supercomputer is housed at the National Computational Infrastructure (NCI) on the Australian National University's campus.
Gadi has 160 GPU nodes, each containing four NVIDIA V100 GPUs and two 24-core Intel Xeon Scalable 'Cascade Lake' processors. It also has 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node.
Gadi uses the Portable Batch System (otherwise simply known as PBS) for job queuing.
[Note: this post is subject to change. Let's try to keep it up to date; please comment below if something does not work.]
Scope
This discussion can cover anything to do with getting results from running Oceananigans on Gadi, including installing Julia, setting up CUDA and MPI, configuring PBS scripts, and using other Julia packages in conjunction with Oceananigans.
Links
Getting started on Gadi
It's assumed as a prerequisite that you have access to Gadi and an NCI username.
The first task is to install Julia. We suggest using juliaup to install Julia in one of your project's directories.
Note: Avoid installing in your home directory (`$HOME` or `~/`), since there is a 10 GB limit on each user's home directory and it can fill up quickly!

Thus, to install julia 1.10.9 using juliaup, first create a directory in which to install juliaup and julia. For example, if the NCI project you are part of is `xy12` and your NCI username is `ab1234`, then:

```bash
cd /g/data/xy12/ab1234
mkdir .julia
```
This will also be the directory where Julia installs all its packages. It can grow quite a bit in size, which is why it's best to keep it outside your `$HOME` directory.

Then we install juliaup. We provide the `--path` argument to ensure the installation happens in the directory we just created:

```bash
curl -fsSL https://install.julialang.org | sh -s -- --path /g/data/xy12/ab1234/.julia/juliaup --default-channel 1.10
```
The installation should have modified your shell profile files. You might need to start a new session or source the startup scripts (e.g., `.bashrc`, `.bash_profile`) that juliaup modified. After doing so, Julia can be launched by typing `julia`.
We then need to tell Julia that its depot path is the `.julia` directory we just created. (The depot path is where Julia installs packages, saves compiled versions of packages, etc.; by default the depot resides in `$HOME/.julia`, which creates issues due to the size limits on `$HOME`.) To do so, we add an environment variable in our `.bash_profile`:

```bash
export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
```
Moving the depot into `/g/data` also helps when software downloads big data sets into the depot (as ClimaOcean does).

Julia is now installed! 🎉
An example script
Next let's test that things work by creating a test project; a sketch of the commands is below.
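The original commands aren't reproduced here; one way to do this (the directory name `test-project` is an arbitrary choice) is:

```bash
# Create a directory to hold the test project; the Project.toml and Manifest.toml
# files appear once we add the first package with --project below.
mkdir test-project
cd test-project
```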
We created an empty project.
Let's use Julia's package manager to add Oceananigans to this project and instantiate it. We can do that within Julia's REPL or via a one-line shell command:
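A sketch of the shell one-liner, assuming we are inside the project directory:

```bash
julia --project -e 'using Pkg; Pkg.add("Oceananigans"); Pkg.instantiate()'
```

From the REPL, the equivalent is to press `]` to enter the package manager and run `add Oceananigans` followed by `instantiate`.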
Note: installing Julia packages requires internet access, and on Gadi only the login nodes have internet access.
Now let's create a script that uses Oceananigans and run it. Let's call this script `hello-oceananigans.jl`; a sketch of what it might contain is below.
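The original script isn't reproduced here; a minimal sketch that builds a grid and a field (the grid size and field choice are illustrative assumptions) might look like:

```julia
# hello-oceananigans.jl -- a minimal "hello world" for Oceananigans (illustrative sketch)
using Oceananigans

# Build a small rectilinear grid on the CPU
grid = RectilinearGrid(CPU(), size=(16, 16, 16), extent=(1, 1, 1))

# Create a tracer-like field on that grid and fill it with random values
c = CenterField(grid)
set!(c, (x, y, z) -> rand())

@show grid
@show c
```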
From the login node, you should be able to run this via `julia --project hello-oceananigans.jl`, and you should see the grid and the field printed. You just ran your first Julia script on Gadi! 🎉
Submit a job via PBS
Next let's submit the same script to run via PBS.
We create a submission script, e.g. named `submit-hello-oceananigans.sh`, containing something like the sketch below.
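The original submission script wasn't captured here; a sketch under illustrative assumptions (project code, resource requests, and the julia path all need adapting, and the julia binary location depends on where juliaup installed it) might look like:

```bash
#!/bin/bash
#PBS -P xy12                    # NCI project to charge
#PBS -q normal                  # CPU queue
#PBS -l ncpus=1
#PBS -l mem=8GB
#PBS -l walltime=00:10:00
#PBS -l storage=gdata/xy12      # Julia lives under /g/data/xy12
#PBS -l wd                      # start the job in the submission directory
#PBS -o output.stdout
#PBS -j oe

module purge                    # avoid conflicting modules loaded by startup files

export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia

# Adjust the path to wherever juliaup placed the julia binary
/g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-oceananigans.jl
```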
The storage flag `gdata/xy12` is needed because Julia is installed there; add more storage flags as required. The `module purge` command ensures that no other (possibly conflicting) modules loaded by the user's startup files get in the way. Then we submit the PBS job with `qsub submit-hello-oceananigans.sh`.
After the job runs you should have an `output.stdout` file containing the same output as before. Success! 🎉
Run on GPU
To run the same script on a GPU, you only need to modify the grid in `hello-oceananigans.jl` so that it is constructed with the `GPU()` argument, and then also modify the `submit-hello-oceananigans.sh` script to use the `gpuvolta` queue and ask for at least one GPU. Both changes are sketched below.
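The modified lines weren't captured here; under the same illustrative assumptions as the earlier sketches, the grid change might be:

```julia
# in hello-oceananigans.jl: construct the grid on the GPU instead of the CPU
grid = RectilinearGrid(GPU(), size=(16, 16, 16), extent=(1, 1, 1))
```

and the changed PBS directives in `submit-hello-oceananigans.sh` might be:

```bash
#PBS -q gpuvolta      # GPU queue
#PBS -l ngpus=1       # at least one GPU...
#PBS -l ncpus=12      # ...and 12 CPUs per GPU, as gpuvolta requires
```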
The 12 CPUs requested in the sketch are not a coincidence; Gadi's `gpuvolta` queue requires that you request 12 CPUs per GPU; see the Gadi queue limits docs. After the above modifications, submitting the GPU job will now give an `output.stdout` containing the same output, computed on the GPU. Success again! Woooo! 🎉
Note the difference! The grid (and consequently also the field) you created now lives on `CUDAGPU`!

Run on many GPUs
We are now ready to configure Oceananigans to use multiple GPUs via CUDA-aware MPI communication. This is a bit harder to set up... But we'll do it together.
The instructions below for setting up CUDA-aware MPI on Gadi are heavily inspired by the discussion at taimoorsohail/ocean-ensembles#74 and the heroic efforts of @taimoorsohail.
We first unload all modules (just to ensure that we all start from the same page) and then load the modules required for the CUDA-aware MPI configuration; a sketch of these commands follows.
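The exact module names and versions weren't captured here and are assumptions; check `module avail` on Gadi for the versions you want:

```bash
module purge          # start from a clean slate
module load openmpi   # a CUDA-aware Open MPI build
module load cuda      # the matching CUDA toolkit
```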
We then want to ensure that the MPI library being called is the system default. To do that, we use the MPIPreferences.jl package, which identifies the MPI implementation on the machine and creates a small TOML file with preferences that MPI.jl will use.
Now we run:

```bash
julia -e 'using Pkg; Pkg.add("MPIPreferences"); using MPIPreferences; MPIPreferences.use_system_binary()'
```
The above should have generated a file called `LocalPreferences.toml` in the `/g/data/xy12/ab1234/.julia/environments/v1.10/` directory that looks something like the sketch below.
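The generated file isn't reproduced here; its contents depend on the MPI build that `use_system_binary()` finds, so the values below are illustrative placeholders only:

```toml
[MPIPreferences]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
libmpi = "/apps/openmpi/4.1.5/lib/libmpi.so"   # placeholder path
mpiexec = "mpiexec"
```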
Note 1: You don't need to run this step every time; it only needs to be done once, after which the `LocalPreferences.toml` lives in your general Julia environment and is available to any other project you want to run on Gadi. You might need to rerun this step if the MPI installation on Gadi changes or is upgraded.

Note 2: With the `LocalPreferences.toml` created, you might start getting warnings or errors if you don't load the corresponding MPI modules on Gadi. See the updated PBS bash script below for the required modifications. You might need to load the `openmpi` and `cuda` modules, as well as define `LD_LIBRARY_PATH`,
even if you only want to use a single GPU or CPU.

Now let's install the other packages we'll need, like MPI.jl. We can install them either from the command line or from the Julia REPL via the package manager (which we enter by pressing `]` at the REPL); a sketch is below.
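The original commands weren't captured here; assuming the package being added is MPI.jl, the command-line route might look like:

```bash
julia --project -e 'using Pkg; Pkg.add("MPI")'
```

From the package-manager prompt inside the REPL, the equivalent is `add MPI`.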
Next, we ensure some more relevant environment variables are set (consider adding them to your `.bash_profile`); a sketch of these follows.
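The original list of variables wasn't captured here, so the lines below are illustrative assumptions; the exact variables and paths depend on the modules you load:

```bash
# Make the module-provided MPI libraries visible to Julia (placeholder path):
export LD_LIBRARY_PATH=/apps/openmpi/4.1.5/lib:$LD_LIBRARY_PATH

# Disabling CUDA.jl's memory pool is commonly recommended when using CUDA-aware MPI:
export JULIA_CUDA_MEMORY_POOL=none
```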
We are ready to run a script that will exercise CUDA-aware MPI communication; let's call this `hello-cuda-mpi.jl`. A sketch of what it might contain is below.
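The original script isn't reproduced here; a sketch that says hello from each rank and reports per-rank field means (the grid size, field choices, and the exact distributed-architecture API, which can differ between Oceananigans versions, are assumptions):

```julia
# hello-cuda-mpi.jl -- illustrative sketch of a distributed, GPU-backed Oceananigans script
using MPI
using Statistics
using Oceananigans
using Oceananigans.DistributedComputations

MPI.Init()

# One GPU per MPI rank
arch = Distributed(GPU())
rank = arch.local_rank

@info "Hello from rank $rank!"

grid = RectilinearGrid(arch, size=(32, 32, 32), extent=(1, 1, 1))

# Two fields filled with random values, so each rank ends up with a different local mean
c = CenterField(grid)
u = XFaceField(grid)
set!(c, (x, y, z) -> rand())
set!(u, (x, y, z) -> rand())

@info "rank $rank: mean(c) = $(mean(interior(c))), mean(u) = $(mean(interior(u)))"
```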
We also write a bash script to submit this through the queue with multiple GPUs; again, a sketch follows.
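The original submission script wasn't captured here; a sketch under the same assumptions as before (project code, module names, resource requests, and the julia path are placeholders) might be:

```bash
#!/bin/bash
#PBS -P xy12
#PBS -q gpuvolta
#PBS -l ngpus=4
#PBS -l ncpus=48                # 12 CPUs per GPU
#PBS -l mem=128GB
#PBS -l walltime=00:15:00
#PBS -l storage=gdata/xy12
#PBS -l wd
#PBS -o output.stdout
#PBS -j oe

module purge
module load openmpi             # module names/versions are placeholders
module load cuda

export JULIA_DEPOT_PATH=/g/data/xy12/ab1234/.julia
export JULIA_CUDA_MEMORY_POOL=none

# One MPI rank per GPU
mpirun -np 4 /g/data/xy12/ab1234/.julia/juliaup/bin/julia --project hello-cuda-mpi.jl
```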
When this job runs, `output.stdout` should contain a few hellos from the various ranks and also the mean values of the fields. It's essential to notice that both the `c` and `u` fields have different mean values on each rank.

There you go! You now have a CUDA-aware MPI Oceananigans configuration!! 🎉