Over in bytecodealliance/wasmtime#4327 we found a ~20% performance regression on a workload when a non-power-of-two number of threads was specified, both on my 12-core system (Ryzen 3900X) and on other systems with 3, 6, ... threads manually specified. On my own system, manually specifying a power-of-two number of threads (e.g., 8) makes the regression disappear.
@alexcrichton produced a diff that isolates the issue in that PR; it appears that it may have something to do with scopes. I am not familiar enough with rayon's design or terminology to describe in more detail what these changes mean.
Does this issue seem familiar at all? Could there be some dependence on a power-of-two number of threads for, e.g., even work distribution?