[BN] Serial run large tensors test cases #3553


Merged
merged 5 commits into develop from bg/run_single_bn_gtest_case
Feb 28, 2025

Conversation

@bghimireamd (Contributor) commented on Feb 26, 2025

Follow-up PR for #3545, since that solution ended up taking a huge amount of CI time. In this PR we separate the large-tensor test cases into their own file, effectively creating a separate binary. For that binary, which runs only the large-tensor tests, we serialize the run via CMake's gtest_discover_tests.
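
For reference, a minimal sketch of that CMake pattern, assuming the RUN_SERIAL test property is used for the serialization (target and file names below are hypothetical, not the actual MIOpen ones):

include(GoogleTest)

# Hypothetical target/file names: the large-tensor BN tests are compiled into
# their own executable, separate from the regular gtest binary.
add_executable(test_bn_serial_run bn_bwd_serial_run.cpp)
target_link_libraries(test_bn_serial_run PRIVATE GTest::gtest_main)

# Discover the gtest cases and mark each one RUN_SERIAL so CTest schedules
# them one at a time instead of alongside other tests.
gtest_discover_tests(test_bn_serial_run PROPERTIES RUN_SERIAL TRUE)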

@bghimireamd changed the title from "Bg/run single bn gtest case" to "[BN] Serial run large tensors test cases" on Feb 26, 2025
Comment on lines +123 to +213
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNCKBWDSerialRun2D_FP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun2D_FP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun3D_FP16,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());

// bfp16
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNCKBWDSerialRun2D_BFP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun2D_BFP16,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNOCLBWDSerialRun3D_BFP16,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());

// fp32
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun2D_FP32,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV1})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun3D_FP32,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCDHW, miopenTensorNDHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());
// fp64
INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun2D_FP64,
testing::Combine(testing::ValuesIn(Network2DSerialCase<BN2DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV1})),
TestNameGenerator<BN2DTestCase>());

INSTANTIATE_TEST_SUITE_P(Smoke,
GPU_BNBWDSerialRun3D_FP64,
testing::Combine(testing::ValuesIn(Network3DSerialCase<BN3DTestCase>()),
testing::ValuesIn({miopenTensorNCHW, miopenTensorNHWC}),
testing::ValuesIn({miopenBNSpatial,
miopenBNPerActivation}),
testing::ValuesIn({testBNAPIV2})),
TestNameGenerator<BN3DTestCase>());
Contributor commented:

Can we move these to full if they are going to run serially?

It seems like long-running tests shouldn't be in the smoke stage.

Contributor commented:

I agree that it would be better to move them, but if a solution is needed right here and now, in my opinion it's not critical.
The biggest consumer is (2, 2048, 16, 128, 128).

template <>
inline std::vector<BN3DTestCase> Network3DSerialCase()
{
    return {{2, 2048, 16, 128, 128, miopen::batchnorm::Direction::Backward, 0, 1}};
}
@shurale-nkn (Contributor) commented on Feb 28, 2025:

2 * 2048 * 16 * 128 * 128 = 1073741824 ≈ 1.07 * 10^9

// edge cases
{69328, 1, 22, 22, miopen::batchnorm::Direction::ForwardTraining, 1, 1},
{69328, 1, 13, 79, miopen::batchnorm::Direction::ForwardTraining, 1, 1},
{128, 256, 14, 14, miopen::batchnorm::Direction::Backward, 0, 1},
{128, 256, 16, 16, miopen::batchnorm::Direction::Backward, 0, 1},
@shurale-nkn (Contributor) commented on Feb 28, 2025:

69328 * 1 * 22 * 22 = 33554752 ≈ 3.4 * 10^7
128 * 256 * 14 * 14 = 6422528 ≈ 6.4 * 10^6
These are all quite small compared to 10^9.
Recommendation: remove them from the serial binary in the next PR.

Contributor commented:

There are also some cases in the large network list above that are bigger, I think.
We should decide on a cut-off size and split consistently.

Contributor (PR author) commented:

When I listed all the shapes by size in descending order:

  1. Shape: (2, 2048, 16, 128, 128) --> Size: 1.07e+09
  2. Shape: (69328, 1, 13, 79) --> Size: 7.12e+07
  3. Shape: (64, 1024, 1024) --> Size: 6.71e+07
  4. Shape: (64, 256, 56, 56) --> Size: 5.14e+07

It seems like I can use a cut-off of 1e+9, since (2, 2048, 16, 128, 128) was the tensor size that caused the error in CI.
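
As an illustration only (not code from this PR; the GetSize() accessor and the exact threshold handling are assumptions), such a cut-off could be applied when partitioning cases between the normal and serial-only binaries:

#include <cstddef>
#include <vector>

// Hypothetical helper, assuming each test case can report its element count.
constexpr std::size_t kSerialCutoff = 1000000000; // 1e+9 elements

template <typename TestCase>
std::vector<TestCase> FilterSerialCases(const std::vector<TestCase>& cases)
{
    std::vector<TestCase> serial;
    for(const auto& c : cases)
    {
        // GetSize() is assumed to return the product of the tensor dimensions,
        // e.g. 2 * 2048 * 16 * 128 * 128 ≈ 1.07e+9 for the failing CI case.
        if(c.GetSize() >= kSerialCutoff)
            serial.push_back(c);
    }
    return serial;
}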

@BrianHarrisonAMD merged commit d90b07c into develop on Feb 28, 2025
20 of 66 checks passed
@BrianHarrisonAMD deleted the bg/run_single_bn_gtest_case branch on February 28, 2025 at 18:38
BrianHarrisonAMD pushed a commit that referenced this pull request Mar 27, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially
sohbodas pushed a commit that referenced this pull request Apr 1, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially
sohbodas pushed a commit that referenced this pull request Apr 29, 2025
* undo code change and fix issue from cmake

* separate large tensor tests in batch norm to run serially