
Slurm prolog #50

Merged
merged 18 commits into from
Jun 13, 2025
heerener
Collaborator

@heerener heerener commented Jun 4, 2025

No description provided.

@WeinaJi
Contributor

WeinaJi commented Jun 5, 2025

I need to move the config template file from /sbo/data/scratch/weji/CWAgent_config_tpl.json to /sbo/data/scratch/CWAgent_config_tpl.json. Can I do that without starting a cluster?

@heerener heerener temporarily deployed to aws-sandbox-hpc June 5, 2025 09:39 — with GitHub Actions Inactive
@heerener heerener temporarily deployed to aws-sandbox-hpc June 5, 2025 11:02 — with GitHub Actions Inactive
@heerener heerener temporarily deployed to aws-sandbox-hpc June 5, 2025 11:12 — with GitHub Actions Inactive
@heerener heerener temporarily deployed to aws-sandbox-hpc June 5, 2025 12:36 — with GitHub Actions Inactive
@heerener heerener temporarily deployed to aws-sandbox-hpc June 6, 2025 11:45 — with GitHub Actions Inactive
@heerener heerener force-pushed the slurm_prolog branch 2 times, most recently from 96592db to 98b5a30 Compare June 10, 2025 12:28
@heerener heerener marked this pull request as ready for review June 10, 2025 12:32
@heerener heerener requested a review from jplanasc June 10, 2025 12:32
cat << _EOF_ >> /opt/slurm/etc/scripts/prolog.d/80_cloudwatch_agent_config_prolog.sh
#!/bin/bash

CWAGENT_CONFIG=/sbo/data/scratch/CWAgent_config_\$SLURM_CLUSTER_NAME.json
Contributor

This assumes that Lustre is mounted on the cluster. I believe it will fail if we enable benchmark but disable Lustre.
Can we keep that file inside /scripts, for example, or in some other folder that we know for sure will be present?

Collaborator Author

I think I had a similar thought but forgot to update this variable!

@WeinaJi does this have to live in the scratch folder, or can it live in /opt/slurm so it gets removed together with the cluster?

Contributor

The CWAgent_config*.json is only used by the prolog script. It can live anywhere as long as it can be read by the prolog script.
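
One way to avoid depending on the Lustre mount, as discussed in this thread, is to let the prolog pick the first directory that actually exists. This is only a minimal sketch: `first_readable_dir` is a hypothetical helper that is not part of this PR, and the candidate order (/opt/slurm first, the scratch path as a fallback) is an assumption.

```shell
#!/bin/bash

# Hypothetical helper (not part of this PR): print the first candidate
# directory that exists and is readable; return non-zero if none is.
first_readable_dir() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -r "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Assumed candidate order: /opt/slurm first, the Lustre scratch path second.
if config_dir=$(first_readable_dir /opt/slurm /sbo/data/scratch); then
    echo "Using config directory: $config_dir"
else
    echo "No readable config directory found" >&2
fi
```

With a check like this, the prolog would report a missing directory explicitly instead of failing later when it writes the config.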


if [ ! -f \$CWAGENT_CONFIG ]; then
echo "Create CWAGENT_CONFIG " \$CWAGENT_CONFIG
sed "s/\\$CLUSTER_NAME/\$SLURM_CLUSTER_NAME/g" /opt/slurm/CWAgent_config_tpl.json > \$CWAGENT_CONFIG
Contributor

I think we could probably get rid of this, as we should already know the cluster name at the time this script is run?
@WeinaJi, @heerener

Contributor

I agree with @jplanasc: it would be easier to have just one JSON file, i.e. CWAgent_config.json, in which the "ClusterName" field is set to a different name for each cluster.
@heerener, is the ClusterName env var already available at the time we run this script?
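
The single-file idea above could look roughly like the following sketch, which renders CWAgent_config.json once when the cluster name is known, rather than inside the prolog. `render_cwagent_config` is a hypothetical function name, and the one-field template used in the demo is made up for illustration; only the `$CLUSTER_NAME` placeholder and the "ClusterName" field come from this discussion.

```shell
#!/bin/bash

# Hypothetical helper (not part of this PR): replace the literal
# $CLUSTER_NAME placeholder in a template with the given cluster name.
render_cwagent_config() {
    local template=$1 cluster_name=$2 output=$3
    local escaped
    # Escape characters that are special in a sed replacement (\, / and &).
    escaped=$(printf '%s' "$cluster_name" | sed 's|[\\/&]|\\&|g')
    # The single-quoted parts keep $CLUSTER_NAME literal for sed.
    sed 's/\$CLUSTER_NAME/'"$escaped"'/g' "$template" > "$output"
}

# Demo with a made-up one-field template; the real template has more fields.
workdir=$(mktemp -d)
printf '%s\n' '{"ClusterName": "$CLUSTER_NAME"}' > "$workdir/CWAgent_config_tpl.json"
render_cwagent_config "$workdir/CWAgent_config_tpl.json" "demo-cluster" "$workdir/CWAgent_config.json"
cat "$workdir/CWAgent_config.json"
rm -rf "$workdir"
```

Rendering the file once would leave the prolog with nothing to do but read it, which is what removing the `if [ ! -f ... ]` branch would amount to.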

@heerener heerener temporarily deployed to aws-sandbox-hpc June 11, 2025 07:44 — with GitHub Actions Inactive
Code Coverage

| Package | Line Rate | Complexity | Health |
| --- | --- | --- | --- |
| hpc_provisioner | 93% | 0 | |
| Summary | 93% (427 / 461) | 0 | |

Minimum allowed line rate is 80%

@heerener heerener temporarily deployed to aws-sandbox-hpc June 11, 2025 08:17 — with GitHub Actions Inactive
@heerener
Collaborator Author

heerener commented Jun 11, 2025

Failing tests can be ignored on this branch; the fixes will be taken care of in #54.

@heerener heerener mentioned this pull request Jun 11, 2025
@heerener heerener temporarily deployed to aws-sandbox-hpc June 11, 2025 12:00 — with GitHub Actions Inactive
@heerener heerener temporarily deployed to aws-sandbox-hpc June 11, 2025 14:11 — with GitHub Actions Inactive
@heerener heerener merged commit ea90ec6 into main Jun 13, 2025
1 of 2 checks passed
@heerener heerener deleted the slurm_prolog branch June 13, 2025 08:04
3 participants