MAINT: Reuse common routines in covariance #3227

Open: wants to merge 5 commits into base: main

david-cortes-intel
Contributor

Description

This PR simplifies some parts of the covariance implementation on CPU to reuse common routines from either MKL or from the rest of daal. This should slightly simplify future work on compiler pragmas.


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.

@icfaust
Contributor

icfaust commented May 26, 2025

@david-cortes-intel can you explain why this won't need performance benchmarks? It looks like the vectorization hints are switched to an omp pragma.

@icfaust
Contributor

icfaust commented May 26, 2025

/intelci: run

@david-cortes-intel
Contributor Author

> @david-cortes-intel can you explain why this won't need performance benchmarks? It looks like the vectorization hints are switched to an omp pragma.

That happens only in the 'ref' headers. Everywhere else, the loops are replaced with calls to MKL. Builds with the reference implementation use GCC, which doesn't support those pragmas either way.

@icfaust
Contributor

icfaust commented May 26, 2025

If you want to enable this only for GCC, I recommend leaving comments in the code saying so. Wouldn't these changes then apply to any build not using MKL, regardless of the compiler?

@david-cortes-intel
Contributor Author

> If you want to enable this only for GCC, I recommend leaving comments in the code saying so. Wouldn't these changes then apply to any build not using MKL, regardless of the compiler?

I don't follow that part. The change replaces compiler hints with functions from the 'service' headers. When building with MKL, those functions come from MKL regardless of the compiler. When building without MKL, they are generated by the compiler using omp simd pragmas. That is an open standard supported by all the major C++ compilers (gcc, mingw, clang, icx, IBM's, Oracle's, and msvc with experimental features), unlike the previous pragmas, which were specific to icc, with only some of them supported by icx.
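
For readers unfamiliar with the pragma being discussed, here is a minimal sketch of a loop annotated with `#pragma omp simd`. The helper name `sqrt_all` is hypothetical and is not taken from the oneDAL sources; it only illustrates the portable form of the vectorization hint. A compiler that is not asked to honor OpenMP SIMD directives ignores the pragma, so the loop stays correct either way.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper for illustration only (not the PR's code).
// '#pragma omp simd' asks the compiler to vectorize the loop; it is
// part of the OpenMP standard rather than a vendor-specific hint.
template <typename T>
void sqrt_all(std::size_t n, const T * in, T * out)
{
    assert(n == 0 || (in != nullptr && out != nullptr));
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) out[i] = std::sqrt(in[i]);
}
```

With GCC or Clang, the directive is typically enabled with `-fopenmp-simd` (or `-fopenmp`); without such a flag the code compiles as an ordinary scalar loop that the optimizer may still auto-vectorize.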

@david-cortes-intel
Contributor Author

@icfaust I've now reverted the changes on the sections that had the pragma for aligned data, although I'm pretty sure MKL would be able to make the same optimization.

@david-cortes-intel
Contributor Author

/intelci: run

@@ -214,6 +232,12 @@ struct RefMath<float, cpu>
for (SizeType i = 0; i < n; ++i) out[i] = sqrt(in[i]);
}

static void vInvSqrtI(SizeType n, const float * a, const SizeType inca, float * b, const SizeType incb)

Are inca and incb usually 1 here? If so, there might be a better way to write this so that the compiler can optimize the code.

Contributor Author

MKL has different variants of these functions. There is a variant in which both increments are fixed at 1, but this PR uses the strided variant for a case where one of them is not equal to 1.
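
A rough sketch of what a strided inverse square root fallback could look like, with a contiguous fast path as the reviewer suggests. The function name `inv_sqrt_strided` and its body are assumptions for illustration; only the parameter layout (n, a, inca, b, incb) follows the diff context above. As one example of a non-unit stride, reading the diagonal of a row-major p x p matrix uses an input stride of p + 1; whether the PR uses it exactly this way is not stated in the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference-style fallback (not the PR's implementation).
// Computes b[i * incb] = 1 / sqrt(a[i * inca]) for i in [0, n).
template <typename T>
void inv_sqrt_strided(std::size_t n, const T * a, std::size_t inca, T * b, std::size_t incb)
{
    assert(inca > 0 && incb > 0);
    if (inca == 1 && incb == 1)
    {
        // Contiguous case: unit strides are the easiest for a compiler to vectorize.
        #pragma omp simd
        for (std::size_t i = 0; i < n; ++i) b[i] = T(1) / std::sqrt(a[i]);
        return;
    }
    // General strided case.
    for (std::size_t i = 0; i < n; ++i) b[i * incb] = T(1) / std::sqrt(a[i * inca]);
}
```

Calling `inv_sqrt_strided(p, m, p + 1, out, 1)` on a row-major p x p matrix `m` would write the inverse square roots of its diagonal into a contiguous `out` buffer.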

@Vika-F
Contributor

Vika-F commented Jun 5, 2025

I agree with @icfaust and think that these changes require a benchmark run, because replacing a compiler-generated loop with a call (even to a highly optimized library) can have an unpredictable performance impact.

We can make assumptions, but they need to be backed by actual performance data.
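
As a rough illustration of the kind of measurement being requested, the sketch below times an inline loop against an out-of-line routine call on the same data. Everything here is hypothetical: `inv_sqrt_routine` is a stand-in for a library function, and `median_ms`, `Timings`, and `compare` are made-up names. It is not a substitute for the project's benchmark suite, and it says nothing about which variant is faster.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an out-of-line library routine.
void inv_sqrt_routine(std::size_t n, const float * a, float * b)
{
    for (std::size_t i = 0; i < n; ++i) b[i] = 1.0f / std::sqrt(a[i]);
}

// Runs 'run' reps times and returns the middle timing in milliseconds
// (the upper median when reps is even).
template <typename F>
double median_ms(F && run, int reps)
{
    assert(reps > 0);
    std::vector<double> t;
    for (int r = 0; r < reps; ++r)
    {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        t.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(t.begin(), t.end());
    return t[t.size() / 2];
}

struct Timings
{
    double inline_ms;
    double call_ms;
};

Timings compare(std::size_t n, int reps)
{
    std::vector<float> a(n, 4.0f), b1(n), b2(n);
    Timings t;
    // Variant 1: loop written at the call site.
    t.inline_ms = median_ms([&] {
        for (std::size_t i = 0; i < n; ++i) b1[i] = 1.0f / std::sqrt(a[i]);
    }, reps);
    // Variant 2: call through a volatile pointer to discourage inlining.
    void (*volatile fn)(std::size_t, const float *, float *) = inv_sqrt_routine;
    t.call_ms = median_ms([&] { fn(n, a.data(), b2.data()); }, reps);
    return t;
}
```

A real comparison would also need realistic problem sizes, warm-up runs, and the same compiler flags as the production build.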
