In some machines, tests take too much time to complete when using oversubscribe #902
I have no clue what the problem could be... We run the same test on the CI (example) and it takes <1m. Actually, that specific test (dbcsr_unittest1) does not even use MPI... So, I would say the computation itself is slow... From the full log, I can see that it is actually running some operations up to:
I can see 2 actions here:
In any case, this doesn't explain why it is so slow in your run... |
Could you test with a single rank and thread? (i.e. by changing the defaults?) |
Sure, if you tell me what I should do (some change in tests/CMakeLists.txt I guess). (I forgot to say: I'm not familiar with the package, I am just routinely rebuilding all 37000 Debian source packages for QA purposes). |
I don't know what over-subscription in OpenMPI means exactly, specifically whether CPU affinity is relaxed. Generally, MPI is not meant for "tasking", i.e., abusing ranks like ordinary processes. My experience with over-subscription in OpenMPI is that people use it because mpirun gave a warning about an unknown number of "slots" on the target system. This always turned out to be bad for performance, likely because CPU affinity was not fully relaxed (i.e., no affinity). |
Using over-subscription "to move processes around as needed" without also saying "forget about affinity" is likely causing problems. MPI implementations try really hard to pin processes according to the "hardware geometry", which contradicts over-subscription. |
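A minimal sketch of that point, assuming OpenMPI's mpirun (the --oversubscribe and --bind-to flags are standard mpirun options, not something proposed in this thread; the binary path is illustrative): if over-subscription is allowed, the binding should also be relaxed explicitly so ranks are not pinned to cores.

    # allow more ranks than cores AND drop CPU pinning (OpenMPI mpirun sketch)
    mpirun --oversubscribe --bind-to none -np 2 ./tests/dbcsr_unittest1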
Yes, this is one way. You can change the lines after: Line 84 in 966e81e |
Ok, I tried reducing both TEST_MPI_RANKS and TEST_OMP_THREADS from 2 to 1, and now the package takes the usual 4 minutes to build again (not several hours). On machines with 1 CPU it also takes 4 minutes. What does this little experiment tell us about a potential fix? (Hopefully one that works for everybody.) I don't know much about this. Is it really ok to use 2 ranks and 2 threads on a machine with only 2 vCPUs? (Is the product, 2x2 = 4, what matters?) Thanks. |
For the moment, I would suggest setting, during the cmake phase:
You can also try 2 threads (but definitely a single MPI rank). |
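A minimal sketch of such a configure-time override, assuming the TEST_MPI_RANKS / TEST_OMP_THREADS cache variables named elsewhere in this thread (the -DTEST_MPI_RANKS=1 flag is confirmed in the next comment; the source path is illustrative):

    # configure so that each test uses a single MPI rank and, optionally, 2 OpenMP threads
    cmake -DTEST_MPI_RANKS=1 -DTEST_OMP_THREADS=2 /path/to/dbcsr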
Ok, I'll probably try -DTEST_MPI_RANKS=1 as it seems to be the minimal change which makes it work here. Thanks. |
Hello. I reported this to Debian here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1101363
On AWS instances of types c7a.large, m7a.large, r7a.large, which incidentally have 2 vCPUs, the Debian package for dbcsr used to take less than 4 minutes to build.
After I added PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe, so that it also builds ok on systems with a single CPU, the build on systems with 2 CPUs now fails with a timeout, like this:
I tried increasing the timeout, like this:
but 3600 was not enough, and 7200 was not enough either (still timeouts), which makes me think that maybe the proper fix should be somewhere else.
Thanks.
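For reference, a minimal sketch of the two knobs described in this report, assuming the test suite is driven by ctest (the PRTE variable is taken verbatim from the report; --timeout and --output-on-failure are standard ctest options; how the Debian packaging actually wires these in is not shown in the thread):

    # allow over-subscription via the PRRTE/OpenMPI runtime, as added for single-CPU builders
    export PRTE_MCA_rmaps_default_mapping_policy=:oversubscribe
    # raise the per-test timeout (in seconds); 3600 and 7200 were both tried without success
    ctest --output-on-failure --timeout 7200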