Enhanced rocm-triton-prof.py: regex, config-yaml file #793

Open, wants to merge 1 commit into base: main_perf

Conversation

ravil-mobile

No description provided.

ravil-mobile requested a review from guacamoleo on May 7, 2025, 15:57
@guacamoleo

Looks like this still has the data type being printed.

Also, if I run it without any args, the output is unintelligible: the script hits a TypeError and still goes on to print statistics. It would be better if all command-line errors printed the help menu.

python3 rocm-triton-prof.py
running original program...
Traceback (most recent call last):
  File "/home/dtanner/repos/rocm_triton/python/perf-kernels/fa_perf_4/rocm-triton-prof.py", line 74, in run_external_binary
    print(f"CURR.CMD: {' '.join(cmd)}")
TypeError: can only join an iterable
Timing info in `nsec`:
count       163.000000
mean     613441.061350
std       16550.151525
min      581941.000000
25%      595713.500000
50%      616461.000000
75%      625823.000000
max      650620.000000
dtype: float64


NON-FLOP related data:
   Counter Name         Max        Min          Mean     Median
0    GRBM_COUNT  10047431.0  7028708.0  7.173176e+06  7158677.0
1   TCC_HIT_sum   5252411.0  5094965.0  5.131720e+06  5129603.0
2  TCC_MISS_sum   6001565.0  5873660.0  5.941301e+06  5942057.0

FLOP related data:
                    Counter Name     Raw Data          FLOP  Relative FLOP, %
0          SQ_INSTS_VALU_ADD_F16          0.0  0.000000e+00          0.000000
1          SQ_INSTS_VALU_MUL_F16          0.0  0.000000e+00          0.000000
2          SQ_INSTS_VALU_FMA_F16     262144.0  3.355443e+07          0.023327
3        SQ_INSTS_VALU_TRANS_F16          0.0  0.000000e+00          0.000000
4          SQ_INSTS_VALU_ADD_F32    9003008.0  5.761925e+08          0.400571
5          SQ_INSTS_VALU_MUL_F32    8691712.0  5.562696e+08          0.386721
6          SQ_INSTS_VALU_FMA_F32    4536448.0  5.806653e+08          0.403681
7        SQ_INSTS_VALU_TRANS_F32    4608128.0  2.949202e+08          0.205030
8          SQ_INSTS_VALU_ADD_F64          0.0  0.000000e+00          0.000000
9          SQ_INSTS_VALU_MUL_F64          0.0  0.000000e+00          0.000000
10         SQ_INSTS_VALU_FMA_F64          0.0  0.000000e+00          0.000000
11       SQ_INSTS_VALU_TRANS_F64          0.0  0.000000e+00          0.000000
12   SQ_INSTS_VALU_MFMA_MOPS_F16  276955136.0  1.418010e+11         98.580670
13  SQ_INSTS_VALU_MFMA_MOPS_BF16          0.0  0.000000e+00          0.000000
14   SQ_INSTS_VALU_MFMA_MOPS_F32          0.0  0.000000e+00          0.000000
15   SQ_INSTS_VALU_MFMA_MOPS_F64          0.0  0.000000e+00          0.000000

Performance info in TFLOP/s:
count    163.000000
mean     234.655881
std        6.381366
min      221.085475
25%      229.845550
50%      233.336142
75%      241.462803
max      247.177346
dtype: float64
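
One way to get the requested behavior is to override `ArgumentParser.error` so that every command-line error prints the full help menu before exiting. The sketch below is a minimal illustration, not the script's actual code: `HelpOnErrorParser`, `build_parser`, `parse_args`, and the positional `cmd` argument are hypothetical names, and it assumes the traceback above comes from the profiled command being absent when `' '.join(cmd)` is reached.

```python
import argparse
import sys


class HelpOnErrorParser(argparse.ArgumentParser):
    """ArgumentParser that prints the full help menu on any usage error."""

    def error(self, message):
        # argparse routes every command-line problem through error().
        # The default prints only the short usage line; print the whole help.
        self.print_help(sys.stderr)
        self.exit(2, f"\nerror: {message}\n")


def build_parser():
    # Hypothetical interface: the real script's options may differ.
    parser = HelpOnErrorParser(prog="rocm-triton-prof.py")
    parser.add_argument(
        "cmd",
        nargs=argparse.REMAINDER,
        help="program to profile, followed by its own arguments",
    )
    return parser


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Fail early with the help menu instead of reaching ' '.join(cmd)
    # with nothing to run.
    if not args.cmd:
        parser.error("no program to profile was given")
    return args


if __name__ == "__main__":
    parsed = parse_args(sys.argv[1:])
    print(f"CURR.CMD: {' '.join(parsed.cmd)}")
```

With this in place, running the script without arguments would print the help menu and exit with status 2, rather than continuing into the profiling and reporting steps.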
