make MKL thread setter/getter work for both BLAS and LAPACK #151

Open
VinceNeede opened this issue Apr 28, 2025 · 10 comments
Comments

@VinceNeede

As of now, BLAS.set_num_threads works differently for OpenBLAS and MKL: with the former it sets the thread count for both BLAS and LAPACK routines, while with the latter it only affects BLAS routines. The community seems to expect MKL to behave like OpenBLAS (JuliaLinearAlgebra/MKL.jl#174, https://discourse.julialang.org/t/correct-way-to-set-mkl-threads-at-runtime/128470), and the discrepancy is causing confusion.

Would it be possible to use mkl_ instead of mkl_domain as the default way, and then wrap the domain-specific functions directly in MKL.jl? This would make things easier for end users while still allowing experts finer-grained control.
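
For reference, here is a minimal sketch of the call under discussion, using only the LinearAlgebra standard library. The comment about MKL restates the behavior reported in this issue; the snippet itself does not verify it.

```julia
using LinearAlgebra

# Remember the current setting so it can be restored afterwards.
old = BLAS.get_num_threads()

# Request a single thread. With the default OpenBLAS backend this applies to
# BLAS and LAPACK routines alike; with MKL loaded, this issue reports that
# only BLAS routines are affected.
BLAS.set_num_threads(1)
@show BLAS.get_num_threads()

# Restore the previous setting.
BLAS.set_num_threads(old)
```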

@danielwe

Would it be possible to use mkl_ instead of mkl_domain as the default way

I think the right thing to do would be to keep using mkl_domain_, but apply it to both the BLAS and LAPACK domains, since those are the domains forwarded by libblastrampoline. The FFT, VML, and PARDISO domains should probably remain untouched.

@VinceNeede
Author

but apply it to both the BLAS and LAPACK domains

Yes, you're right. I thought it was not possible because LAPACK is not listed here: https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-1/threading-control.html. Either it is missing from that page or I'm looking at the wrong thing, because it is listed here: https://www.intel.com/content/www/us/en/docs/onemkl/developer-guide-linux/2025-1/mkl-domain-num-threads.html

@danielwe

Looks like just an oversight on the part of the documentation authors in your first link. I don't have an MKL-compatible machine at hand at the moment, but the way to confirm would be to launch MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_BLAS=1, MKL_DOMAIN_LAPACK=1" julia and verify that it works as expected.
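
The same launch can also be scripted from Julia. This is a rough sketch: it only builds the command for a fresh Julia process with the variable set, so that the variable is in place before MKL is loaded. The names `setting` and `cmd` are illustrative.

```julia
# Value suggested above for the environment variable.
setting = "MKL_DOMAIN_BLAS=1, MKL_DOMAIN_LAPACK=1"

# Build a command for a new Julia process that inherits the current
# environment plus MKL_DOMAIN_NUM_THREADS.
cmd = addenv(`$(Base.julia_cmd()) --startup-file=no`,
             "MKL_DOMAIN_NUM_THREADS" => setting)

# Inspect the command; call run(cmd) to actually start the session.
println(cmd)
```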

@VinceNeede
Author

VinceNeede commented Apr 28, 2025

It didn't work, so I tried something more elaborate:

Code wrapping domain functions
using MKL, MKL.MKL_jll

# Domain identifiers as defined in mkl_types.h
const MKL_DOMAIN_BLAS = 1
const MKL_DOMAIN_LAPACK = 5

# Query the maximum number of threads MKL reports for the BLAS domain
mkl_get_blas_threads() = ccall((:mkl_domain_get_max_threads, libmkl_rt), Int32, (Ptr{Int32},), Ref(Int32(MKL_DOMAIN_BLAS)))
function mkl_set_blas_threads(n)
    # mkl_domain_set_num_threads returns 1 on success and 0 on failure
    e = ccall((:mkl_domain_set_num_threads, libmkl_rt), Int32, (Ptr{Int32}, Ptr{Int32}), Ref(Int32(n)), Ref(Int32(MKL_DOMAIN_BLAS)))
    e != 1 && error("error in setting blas threads")
    nothing
end

# Same pair of wrappers for the LAPACK domain
mkl_get_lapack_threads() = ccall((:mkl_domain_get_max_threads, libmkl_rt), Int32, (Ptr{Int32},), Ref(Int32(MKL_DOMAIN_LAPACK)))
function mkl_set_lapack_threads(n)
    e = ccall((:mkl_domain_set_num_threads, libmkl_rt), Int32, (Ptr{Int32}, Ptr{Int32}), Ref(Int32(n)), Ref(Int32(MKL_DOMAIN_LAPACK)))
    e != 1 && error("error in setting lapack threads")
    nothing
end

It seems to work with BLAS:

julia> mkl_get_blas_threads()
2

julia> mkl_set_blas_threads(1)

julia> mkl_get_blas_threads()
1

But setting the thread count for LAPACK fails with an error (when getting, the value for ALL is reported, as stated in the docs):

julia> mkl_get_lapack_threads()
2

julia> mkl_set_lapack_threads(1)
ERROR: error in setting lapack threads
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:44
 [2] mkl_set_lapack_threads(n::Int64)
   @ Main ~/mkl parallel/test.jl:16
 [3] top-level scope
   @ REPL[6]:1

This happens even though the MKL_DOMAIN_LAPACK macro should be defined: it is defined in the file mkl_types.h in the sources used here: https://github.com/JuliaPackaging/Yggdrasil/blob/master/M/MKL_Headers/build_tarballs.jl

At this point I'm wondering whether the error is in MKL itself. I don't have MKL installed right now, so I'll try to install it and run programs directly in C to see whether it works there.

@danielwe

It didn't work

Now that I had a chance to log in to an Intel server, I can confirm that it doesn't work with MKL_jll.jl version 2025.0.1+1 (I tested with MKL.jl version 0.8.0 on both Julia 1.10 and 1.11, though there's no reason any of those version numbers should matter). I tried various permutations of the env var, including

MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_BLAS=1, MKL_DOMAIN_LAPACK=1"
MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_BLAS=1 : MKL_DOMAIN_LAPACK=1"
MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_LAPACK=1 MKL_DOMAIN_BLAS=1"

All of these work for BLAS, and none work for LAPACK.

That's concerning and does indeed smell like a bug in MKL.

Test script
using MKL
using LinearAlgebra

A = rand(5000, 5000)
A .+= A'  # make A symmetric

function blas(A, n)
    for _ in 1:n
        A * A
    end
end

function lapack(A, n)
    for _ in 1:n
        eigen(A)
    end
end

@info "Doing some BLAS"
blas(A, 5)

@info "Doing some LAPACK"
lapack(A, 2)

This runs each part long enough that it's easy to monitor CPU usage per item in htop.
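
As a rough complement to watching htop, one could also time a LAPACK-backed call at different thread settings. The sketch below is only an approximation of that check, not part of the original test: `time_eigen` is a hypothetical helper, the matrix is kept small so it runs quickly, and timings are noisy, so equal timings alone do not prove that the setting was ignored.

```julia
using LinearAlgebra

# Time one eigendecomposition at a given BLAS.set_num_threads setting.
function time_eigen(A, nthreads)
    old = BLAS.get_num_threads()
    BLAS.set_num_threads(nthreads)
    try
        eigen(A)                     # warm-up run, so compilation is not timed
        return @elapsed eigen(A)
    finally
        BLAS.set_num_threads(old)    # always restore the previous setting
    end
end

A = rand(500, 500)
A .+= A'   # make A symmetric, as in the test script

for n in (1, BLAS.get_num_threads())
    println("threads = ", n, ": ", round(time_eigen(A, n); digits = 3), " s")
end
```

If the timings do not change with the thread setting on a large enough matrix, that is consistent with the LAPACK routines ignoring it.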

@danielwe

I suspect, though, that the bug is the mention of MKL_DOMAIN_LAPACK in the developer guide and mkl_types.h. The developer reference, the Fortran include files, and the implementation all seem consistent in not mentioning or heeding MKL_DOMAIN_LAPACK.

@VinceNeede
Author

I think so too, but it is still not clear how to set LAPACK threads. In desperation I tried setting the other domains too, but that didn't work either. The only way to actually limit LAPACK threads seems to be setting ALL.
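
A minimal sketch of what setting ALL could look like through MKL's global C entry point `MKL_Set_Num_Threads`. The helper name `set_mkl_threads_all` is hypothetical, and looking the library up by name through Libdl is an assumption made so the snippet stays self-contained; with MKL.jl one would more likely use `libmkl_rt` from MKL_jll, as in the wrapper code earlier in this thread.

```julia
using Libdl

# Try to set the thread count for ALL MKL domains.
# Returns true if the call was made, false if libmkl_rt or the symbol
# could not be found.
function set_mkl_threads_all(n::Integer)
    path = Libdl.find_library(["libmkl_rt", "mkl_rt"])
    isempty(path) && return false
    handle = Libdl.dlopen(path; throw_error = false)
    handle === nothing && return false
    sym = Libdl.dlsym(handle, :MKL_Set_Num_Threads; throw_error = false)
    sym === nothing && return false
    ccall(sym, Cvoid, (Cint,), n)   # C prototype: void MKL_Set_Num_Threads(int nth)
    return true
end

println("MKL global thread count set: ", set_mkl_threads_all(1))
```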

@VinceNeede
Author

I was about to file an issue with MKL when I found this: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-set-number-of-LAPACK-threads-during-runtime/m-p/1638274#M36554

It looks like they haven't solved this yet.

@danielwe

Good find! So at the moment, the only options for libblastrampoline are the status quo or setting #threads for all MKL domains. Even though the latter might interfere with FFTW.jl and Pardiso.jl when they are set to use MKL as backend, I still think it would be the preferred solution, for the following reasons:

  1. LAPACK threads not being affected by BLAS.set_num_threads is surprising and inconsistent with the default (OpenBLAS) behavior
  2. Not having any API to set the number of LAPACK threads is an unfortunate situation
  3. Packages that forward functions from other domains, such as FFT and Pardiso, will still be able to configure the threading within their domains, since domain-specific configurations take precedence over the global configuration

libblastrampoline could even go the extra mile and make compensatory changes to the FFT, Pardiso, and VML domains when it changes the global configuration, unless they have already been configured independently.
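
The precedence rule and the compensation idea can be illustrated with a toy model. This is not MKL or libblastrampoline code: the `ThreadConfig` type, the domain symbols, and both functions are hypothetical and only mirror the reasoning above.

```julia
# Toy model: a global thread count plus optional per-domain overrides.
mutable struct ThreadConfig
    global_threads::Int
    overrides::Dict{Symbol,Int}
end

# A domain-specific setting wins over the global one.
effective(cfg::ThreadConfig, domain::Symbol) =
    get(cfg.overrides, domain, cfg.global_threads)

# Change the global count, but first pin the listed domains to the current
# global value unless they were already configured independently.
function set_global!(cfg::ThreadConfig, n::Int; preserve = (:fft, :vml, :pardiso))
    for d in preserve
        get!(cfg.overrides, d, cfg.global_threads)
    end
    cfg.global_threads = n
    return cfg
end

cfg = ThreadConfig(8, Dict(:pardiso => 4))
set_global!(cfg, 1)
@show effective(cfg, :blas) effective(cfg, :lapack) effective(cfg, :fft) effective(cfg, :pardiso)
```

In this model BLAS and LAPACK follow the new global value of 1, FFT keeps the previous global value of 8, and Pardiso keeps its own earlier setting of 4.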

@VinceNeede
Author

I agree with you, but it looks like this was changed in the past in #119 because of issue #74.
