
Cleanup of node, process and thread count. #625


Closed
jmaassen opened this issue Sep 17, 2018 · 7 comments · Fixed by #644

@jmaassen (Member) commented Sep 17, 2018

There is some recurring confusion about the semantics of the node, process and thread count in the JobDescription. See for example xenon-middleware/xenon-cli#63, xenon-middleware/xenon-cli#57 and #206

Currently we have:

private int nodeCount = 1;
private int processesPerNode = 1;
private int threadsPerProcess = -1;
private boolean startSingleProcess = false;

This filters through to xenon-cli, which has command-line options to set these values.

After some discussion we came to the following command-line options for the CLI:

--nodes X  (default 1)
--cores-per-node Y (default 1)

and, for starting the processes, one of the following options:

--start-per-job (default)
--start-per-node
--start-per-core

All options are optional. If no values are set, the default is used. This leads to the following behavior:

  • If you provide no options, you will get 1 node with 1 core and 1 executable being started.
  • If you provide --cores-per-node 2 you will get 1 node, 2 cores, 1 executable started.
  • If you provide --cores-per-node 2 --start-per-core you will get 1 node, 2 cores, 2 executables started.
  • If you provide --nodes 2 you will get 2 nodes, 1 core each, 1 executable started on the first node.
  • If you provide --nodes 2 --cores-per-node 2 you will get 2 nodes, 2 cores each, 1 executable started on the first node.
  • If you provide --nodes 2 --cores-per-node 2 --start-per-node you will get 2 nodes, 2 cores each, 1 executable started on each node (2 in total).
  • If you provide --nodes 2 --cores-per-node 2 --start-per-core you will get 2 nodes, 2 cores each, 1 executable started on each core (4 in total).

This approach is slightly less flexible than the previous one, as it is not possible to directly express starting a job on 4 nodes with 4 processes per node and 4 threads per process (for running a mixed MPI/OpenMP job, for example). However, just starting 4 nodes with 16 cores each will probably give you the same result.

For the JobDescription this would result in processesPerNode being renamed to coresPerNode,
threadsPerProcess disappearing, and startSingleProcess turning into an enum.
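
A rough sketch of what that could look like (the enum name and values are just placeholders, nothing is settled yet):

public enum StartMode { PER_JOB, PER_NODE, PER_CORE }   // placeholder name/values

private int nodeCount = 1;
private int coresPerNode = 1;                           // renamed from processesPerNode
private StartMode startMode = StartMode.PER_JOB;        // replaces startSingleProcess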

Any comments?

@sverhoeven (Member)

The SchedulerAdaptorDescription should include methods to determine which counts can be used

@sverhoeven (Member)

We could add a boolean SchedulerAdaptorDescription.supportsMultiNode()

This would flag the local and ssh adaptors as unable to run jobs across multiple machines, i.e. nodeCount > 1.
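
Roughly something like this (shown as an interface purely for illustration; the exact shape of SchedulerAdaptorDescription may differ):

public interface SchedulerAdaptorDescription {
    /** Can this adaptor run a single job across more than one node? */
    boolean supportsMultiNode();
}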

We could also flag GridEngine as not supporting multi-node, making the whole parallel environment mapping much easier. @arnikz do you need GridEngine multi-node support?

@jmaassen (Member, Author)

Makes sense for local, ssh and at, but the GridEngine case is a bit shaky, as it does support multi-node runs; it just doesn't allow you to ask for nodes....

@jmaassen (Member, Author)

It does seem that SGE supports -l excl=true to allow you to reserve an entire node for yourself.

Not sure if this completely solves the issue though. It would allow correct behavior when you specify "--nodes 10", but I'm not sure what would happen if you say "--nodes 10 --cores-per-node 16" on a cluster with 4 cores per machine....

@jmaassen (Member, Author)

After giving this some further thought, another option would be to go for a concept based on tasks instead, somewhat similar to what SLURM is doing.

The idea is that each task basically represents an "executable" being started somewhere (this can also be a script, of course). A task may need 1 or more cores. In addition, you may wish to start several of these tasks instead of just one. This is straightforward to specify using:

--tasks T (default=1)
--coresPerTask C (default=1)

When you want more than one task (T > 1), these need to be distributed over one or more nodes. You could either fill up each node with as many tasks as will fit (taking coresPerTask into account, as well as other constraints such as memory), or choose to assign fewer tasks per node (and thereby need more nodes). This can simply be specified using:

--tasksPerNode N (default is unset; let scheduler decide). 

With this approach, running sequential (single task, single core) and multi-threaded (single task, multiple cores) jobs is still simple. In addition, it allows schedulers to decide on the best task-to-node assignment (by simply not specifying tasksPerNode), which is useful for SLURM and SGE in some cases. If needed, the values of T, C and N can be used to compute the node count for SLURM and TORQUE, and the slot count for SGE.
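
To illustrate that last computation: deriving the node count from T, C and N could be as simple as the sketch below (the coresPerNode argument is just for illustration, it is not part of the proposal):

// Sketch only: derive the number of nodes to request from the task-based values.
// tasksPerNode <= 0 means "unset": pack as many tasks per node as the cores allow.
static int nodesNeeded(int tasks, int coresPerTask, int tasksPerNode, int coresPerNode) {
    int perNode = tasksPerNode > 0
            ? tasksPerNode
            : Math.max(1, coresPerNode / coresPerTask);
    return (tasks + perNode - 1) / perNode;          // ceiling division
}
// e.g. 8 tasks of 2 cores each on 16-core nodes -> 1 node; with tasksPerNode = 2 -> 4 nodes
// the SGE slot count would simply be tasks * coresPerTask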

When the job is started, you can either start the executable once per job, or once for each task. The first seems to be the default on all schedulers:

--start-per-job (default)
--start-per-task

To start once per task, the adaptors can use the nodefile (TORQUE and SGE) or srun (SLURM). This approach will also make it easy to start MPI jobs, by simply using mpirun. I think this approach is easy to understand and the most flexible. I'll try to implement it to see if I run into any issues.
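
For SLURM that could be as simple as prefixing the command in the generated submit script (just a sketch, the real adaptor code will differ):

// Sketch: with start-per-task on SLURM, srun starts the executable once for
// every task in the allocation; with start-per-job it is started as-is.
String command = executable + " " + String.join(" ", arguments);
if (startPerTask) {
    command = "srun " + command;
}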

In the JobDescription this would translate to:

private int tasks = 1;
private int coresPerTask = 1;
private int tasksPerNode = -1;
private boolean startPerTask = false;

The rest follows from there....
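
For example, a job of 8 two-core tasks, started once per task, would then look something like this (assuming the obvious setters for the fields above):

JobDescription description = new JobDescription();
description.setExecutable("myapp");
description.setTasks(8);              // T
description.setCoresPerTask(2);       // C
// tasksPerNode left unset: let the scheduler decide the task-to-node mapping
description.setStartPerTask(true);    // start the executable once per task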

@jmaassen (Member, Author)

Implemented in a27b63f, which passes all unit and integration tests.

Not entirely sure about the mapping in SGE yet. I need a multi-node, multi-core cluster setup to test this.
