Skip to content

Parallelization of the PC loop #537

Open
@devinamatthews

Description

@devinamatthews

Many applications (e.g. in machine learning) end up with matrix products that have small(ish) m and n dimensions, but large K. For these cases, one would need to parallelize the pc loop to take advantage of threading in an efficient way. An example application is #536. The downside of parallelizing the pc loop is that extra workspace is required to store the partial C matrices, and then these matrices have to be reduced (in parallel probably).

To support parallelizing this loop, I propose the following plan:

  • Determine the maximum size max_size_c of the temporary C matrices that we are willing to store. This could be a constant (e.g. 8MiB) or it could depend on the cache blocking parameters in some way (e.g. not larger than the sum of the A block and B panel sizes).
  • Modify the automatic partitioning code to set the parallelism of the PC loop to the maximum value nt_pc (which is a factor of nt?) for which (nt_pc-1)*m*n <= max_size_c, and then assign the remaining ways of parallelism based on nt -> nt/nt_pc. Alternatively, the current algortihm could be extended to handle nt_ic, nt_jc, and nt_pc simultaneously and also obey the (nt_pc-1)*m*n <= max_size_c restriction. If the user sets BLIS_PC_NT then that can be used as-is (without obeying the size restriction?).
  • In bli_gemm_blk_var3.c:
    • The main thread in each thread group (after splitting for parallelism in the pc direction), except the "main" thread group, should check out a block from the existing pool for C, which is of size m*n.
    • The main thread group gets the "real" C and keeps beta as-is, all other threads get a chunk of the C buffer and set beta = 0.
    • Parallelize the pc loop as in var1 and var2.
    • After the loop, reduce (sum) the additional blocks back into the main block. This can be parallelized over m and n, and use e.g. axpym.
    • If the blocks of C should be of most size m_c along the m dimension, then the reduction step must happen in bli_gemm_blk_var1.c.

Metadata

Metadata

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions