[BN] Serial run large tensors test cases #3553


Merged
merged 5 commits into develop from bg/run_single_bn_gtest_case
Feb 28, 2025

Conversation

@bghimireamd (Contributor) commented on Feb 26, 2025

Follow-up PR for #3545, since that solution ended up taking a huge amount of CI time. In this PR we separate the large-tensor test cases into their own file, effectively creating a separate binary. For that binary, which runs only the large-tensor tests, we serialize the run via CMake's gtest_discover_tests.
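
For reference, a minimal sketch of that CMake pattern, assuming the RUN_SERIAL test property is used for the serialization (target and file names below are hypothetical, not the actual MIOpen ones):

include(GoogleTest)

# Hypothetical target/file names: the large-tensor BN tests are compiled into
# their own executable, separate from the regular gtest binary.
add_executable(test_bn_serial_run bn_bwd_serial_run.cpp)
target_link_libraries(test_bn_serial_run PRIVATE GTest::gtest_main)

# Discover the gtest cases and mark each one RUN_SERIAL so CTest schedules
# them one at a time instead of alongside other tests.
gtest_discover_tests(test_bn_serial_run PROPERTIES RUN_SERIAL TRUE)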

@bghimireamd changed the title from "Bg/run single bn gtest case" to "[BN] Serial run large tensors test cases" on Feb 26, 2025
Comment on lines +123 to +213
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNCKBWDSerialRun2D_FP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun2D_FP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun3D_FP16,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());

// bfp16
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNCKBWDSerialRun2D_BFP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun2D_BFP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun3D_BFP16,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());

// fp32
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun2D_FP32,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV1})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun3D_FP32,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());
// fp64
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun2D_FP64,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV1})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun3D_FP64,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());
Contributor commented:

Can we move these to full if they are going to run serially?

It seems like long-running tests shouldn't be in the smoke stage.

Contributor commented:

I agree that it would be better to move them, but if a solution is needed right here and now, in my opinion it's not critical.
The biggest consumer is (2, 2048, 16, 128, 128).

template <>
inline std::vector<BN3DTestCase> Network3DSerialCase()
{
    return {{2, 2048, 16, 128, 128, miopen::batchnorm::Direction::Backward, 0, 1}};
}
@shurale-nkn (Contributor) commented on Feb 28, 2025:

2 * 2048 * 16 * 128 * 128 = 1073741824 ≈ 1.07 * 10^9

// edge cases
{69328, 1, 22, 22, miopen::batchnorm::Direction::ForwardTraining, 1, 1},
{69328, 1, 13, 79, miopen::batchnorm::Direction::ForwardTraining, 1, 1},
{128, 256, 14, 14, miopen::batchnorm::Direction::Backward, 0, 1},
{128, 256, 16, 16, miopen::batchnorm::Direction::Backward, 0, 1},
@shurale-nkn (Contributor) commented on Feb 28, 2025:

69328 * 1 * 22 * 22 = 33554752 ≈ 3.4 * 10^7
128 * 256 * 14 * 14 = 6422528 ≈ 6.4 * 10^6
These are all quite small compared to 10^9.
Recommendation: remove them from the serial binary in the next PR.

Contributor commented:

There are also some cases in the large network list above that are bigger, I think.
We should decide on a cut-off size and split consistently.

Contributor (PR author) commented:

When I listed all the shapes by size in descending order:

  1. Shape: (2, 2048, 16, 128, 128) --> Size: 1.07e+09
  2. Shape: (69328, 1, 13, 79) --> Size: 7.12e+07
  3. Shape: (64, 1024, 1024) --> Size: 6.71e+07
  4. Shape: (64, 256, 56, 56) --> Size: 5.14e+07

It seems like I can use a cut-off of 1e+9, since (2, 2048, 16, 128, 128) was the tensor size that caused the error in CI.
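
As an illustration only (not code from this PR; the GetSize() accessor and the exact threshold handling are assumptions), such a cut-off could be applied when partitioning cases between the normal and serial-only binaries:

#include <cstddef>
#include <vector>

// Hypothetical helper, assuming each test case can report its element count.
constexpr std::size_t kSerialCutoff = 1000000000; // 1e+9 elements

template <typename TestCase>
std::vector<TestCase> FilterSerialCases(const std::vector<TestCase>& cases)
{
    std::vector<TestCase> serial;
    for(const auto& c : cases)
    {
        // GetSize() is assumed to return the product of the tensor dimensions,
        // e.g. 2 * 2048 * 16 * 128 * 128 ≈ 1.07e+9 for the failing CI case.
        if(c.GetSize() >= kSerialCutoff)
            serial.push_back(c);
    }
    return serial;
}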

@BrianHarrisonAMD merged commit d90b07c into develop on Feb 28, 2025
20 of 66 checks passed
@BrianHarrisonAMD deleted the bg/run_single_bn_gtest_case branch on February 28, 2025 at 18:38
BrianHarrisonAMD pushed a commit that referenced this pull request Mar 27, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially
sohbodas pushed a commit that referenced this pull request Apr 1, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially
sohbodas pushed a commit that referenced this pull request Apr 29, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially