ci: use all available CUDA devices for parallel tests #767
cscs-ci run default
Pull Request Overview
This PR updates the CI and pytest configurations to utilize all available CUDA devices for parallel test runs and standardizes worker configuration by replacing a custom environment variable with the standard PYTEST_XDIST_AUTO_NUM_WORKERS.
- In noxfile.py, the NUM_PROCESSES env variable is replaced with a hard-coded "auto" setting.
- In pytest_config.py, logic is added to split the CUDA_VISIBLE_DEVICES among pytest-xdist workers.
- In ci/base.yml, the custom NUM_PROCESSES variable is removed and replaced with PYTEST_XDIST_AUTO_NUM_WORKERS along with the new PYTEST_XDIST_SPLIT_CUDA_VISIBLE_DEVICES setup.
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
noxfile.py | Replaced custom NUM_PROCESSES with standard auto configuration. |
model/testing/src/icon4py/model/testing/pytest_config.py | Added logic to distribute CUDA devices across pytest workers. |
ci/base.yml | Updated environment variables to support the new configuration. |
Pull Request Overview
This PR enhances parallel testing configurations by allocating distinct CUDA devices to each pytest worker and standardizing the worker count through PYTEST_XDIST_AUTO_NUM_WORKERS.
- Replaces the custom NUM_PROCESSES environment variable with the standard PYTEST_XDIST_AUTO_NUM_WORKERS in noxfile.py and ci/base.yml.
- Adds a pytest_configure hook to assign CUDA devices based on worker IDs in model/testing/src/icon4py/model/testing/pytest_config.py.
- Updates CI configurations to echo CUDA-related environment variables for diagnostic purposes.
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
noxfile.py | Removed usage of NUM_PROCESSES and hardcoded 'auto' for pytest worker count. |
model/testing/src/icon4py/model/testing/pytest_config.py | Added logic to split CUDA devices among pytest workers based on environment configuration. |
ci/default.yml | Introduced echo commands to display CUDA environment variables for debugging. |
ci/base.yml | Replaced NUM_PROCESSES with PYTEST_XDIST_AUTO_NUM_WORKERS and added split CUDA devices configuration. |
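The configuration change summarized in the table above can be sketched roughly as follows. This is a minimal illustration, not the contents of noxfile.py or ci/base.yml: the helper names `pytest_command` and `ci_environment` and the `tests/` path are hypothetical. It relies only on two things pytest-xdist provides: the `-n auto` option, and the `PYTEST_XDIST_AUTO_NUM_WORKERS` environment variable that determines what `auto` resolves to.

```python
import os


def pytest_command(test_path, extra_args=()):
    """Build a pytest invocation with a hard-coded "-n auto" (hypothetical helper)."""
    return ["pytest", "-n", "auto", test_path, *extra_args]


def ci_environment(num_workers, cuda_devices):
    """Return a copy of the environment with the variables a CI job would set.

    PYTEST_XDIST_AUTO_NUM_WORKERS is read by pytest-xdist when "-n auto" is used.
    PYTEST_XDIST_SPLIT_CUDA_VISIBLE_DEVICES is the custom variable from this PR.
    """
    env = dict(os.environ)
    env["PYTEST_XDIST_AUTO_NUM_WORKERS"] = str(num_workers)
    env["PYTEST_XDIST_SPLIT_CUDA_VISIBLE_DEVICES"] = ",".join(cuda_devices)
    return env


# Example: four workers sharing four devices ("tests/" is a placeholder path).
command = pytest_command("tests/")
environment = ci_environment(4, ["0", "1", "2", "3"])
```

Keeping the worker count in the environment rather than in the command line means the same command can run unchanged on machines with different numbers of devices.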
Mandatory Tests
Please make sure you run these tests via comment before you merge!
Optional Tests
To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe.
Enhance the pytest and CSCS-CI configuration to use all CUDA devices during parallel test runs.
It works by adding code to the pytest_configure() hook, which every pytest worker executes, that sets the CUDA_VISIBLE_DEVICES environment variable to a different device for each worker id. The list of available devices must be defined explicitly, as a comma-separated list, in the custom PYTEST_XDIST_SPLIT_CUDA_VISIBLE_DEVICES environment variable.
Additionally, replace the custom NUM_PROCESSES env variable with the standard PYTEST_XDIST_AUTO_NUM_WORKERS to control the number of pytest-xdist workers.
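The mechanism described above could look roughly like the sketch below. It is not the code from pytest_config.py. It assumes the worker id is read from the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker (values such as `gw0`, `gw1`), and it assumes a round-robin mapping when there are more workers than devices. The helper name `select_cuda_device` is hypothetical.

```python
import os


def select_cuda_device(worker_id, devices_spec):
    """Pick one device for a worker id like "gw3" from a comma-separated list.

    Assumption: workers are mapped to devices round-robin, so worker N gets
    device N modulo the number of listed devices.
    """
    devices = [d.strip() for d in devices_spec.split(",") if d.strip()]
    if not devices:
        raise ValueError("no CUDA devices listed")
    worker_index = int(worker_id.removeprefix("gw"))  # requires Python 3.9+
    return devices[worker_index % len(devices)]


def pytest_configure(config):
    """pytest hook, run in every worker: restrict the worker to a single device."""
    devices_spec = os.environ.get("PYTEST_XDIST_SPLIT_CUDA_VISIBLE_DEVICES", "")
    worker_id = os.environ.get("PYTEST_XDIST_WORKER")  # unset in the controller process
    if devices_spec and worker_id:
        os.environ["CUDA_VISIBLE_DEVICES"] = select_cuda_device(worker_id, devices_spec)
```

Setting CUDA_VISIBLE_DEVICES inside the hook only has an effect if it happens before the CUDA runtime is initialized in that worker process, which is why the assignment belongs in early configuration rather than in a test fixture.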