|
| 1 | +.. meta:: |
| 2 | +   :description: Omnitrace documentation and reference |
| 3 | +   :keywords: Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD |
| 4 | + |
| 5 | +********************** |
| 6 | +Data collection modes |
| 7 | +********************** |
| 8 | + |
| 9 | +Omnitrace supports several modes of recording trace and profiling data for your application. |
| 10 | + |
| 11 | +.. note:: |
| 12 | + |
| 13 | +   For an explanation of the terms used in this topic, see |
| 14 | +   the :doc:`Omnitrace glossary <../reference/omnitrace-glossary>`. |
| 15 | + |
| 16 | ++-----------------------------+---------------------------------------------------------+ |
| 17 | +| Mode                        | Description                                             | |
| 18 | ++=============================+=========================================================+ |
| 19 | +| Binary Instrumentation      | Locates functions (and loops, if desired) in the binary | |
| 20 | +|                             | and inserts snippets at the entry and exit              | |
| 21 | ++-----------------------------+---------------------------------------------------------+ |
| 22 | +| Statistical Sampling        | Periodically pauses application at specified intervals  | |
| 23 | +|                             | and records various metrics for the given call stack    | |
| 24 | ++-----------------------------+---------------------------------------------------------+ |
| 25 | +| Callback APIs               | Parallelism frameworks such as ROCm, OpenMP, and Kokkos | |
| 26 | +|                             | make callbacks into Omnitrace to provide information    | |
| 27 | +|                             | about the work the API is performing                    | |
| 28 | ++-----------------------------+---------------------------------------------------------+ |
| 29 | +| Dynamic Symbol Interception | Wrap function symbols defined in a position independent | |
| 30 | +|                             | dynamic library/executable, like ``pthread_mutex_lock`` | |
| 31 | +|                             | in ``libpthread.so`` or ``MPI_Init`` in the MPI library | |
| 32 | ++-----------------------------+---------------------------------------------------------+ |
| 33 | +| User API                    | User-defined regions and controls for Omnitrace         | |
| 34 | ++-----------------------------+---------------------------------------------------------+ |
| 35 | + |
| 36 | +The two most general and important modes are binary instrumentation and statistical sampling, |
| 37 | +so it is important to understand the advantages and disadvantages of each. |
| 38 | +Both can be performed with the ``omnitrace-instrument`` executable. |
| 39 | +For statistical sampling alone, it's highly recommended to use the |
| 40 | +``omnitrace-sample`` executable instead when binary instrumentation isn't required. |
| 41 | +Callback APIs and dynamic symbol interception can be used with either tool. |
| 42 | + |
| 43 | +Binary instrumentation |
| 44 | +----------------------------------- |
| 45 | + |
| 46 | +Binary instrumentation lets you record deterministic measurements for |
| 47 | +every single invocation of a given function. |
| 48 | +Binary instrumentation effectively adds instructions to the target application to |
| 49 | +collect the required information. It therefore has the potential to cause performance |
| 50 | +changes which might, in some cases, lead to inaccurate results. The effect depends on |
| 51 | +the information being collected and which features are activated in Omnitrace. |
| 52 | +For example, collecting only the wall-clock timing data |
| 53 | +has less of an effect than collecting the wall-clock timing, CPU-clock timing, |
| 54 | +memory usage, cache-misses, and number of instructions that were run. Similarly, |
| 55 | +collecting a flat profile has less overhead than collecting a hierarchical profile, |
| 56 | +and collecting either a trace or a profile has less overhead than collecting |
| 57 | +both a trace and a profile. |
| 58 | + |
| 59 | +In Omnitrace, the primary heuristic for controlling the overhead of binary |
| 60 | +instrumentation is the minimum number of instructions that a function must |
| 61 | +contain to be selected for instrumentation. |
| 62 | + |
| 63 | +Statistical sampling |
| 64 | +----------------------------------- |
| 65 | + |
| 66 | +Statistical call-stack sampling pauses the application at |
| 67 | +regular intervals using operating system interrupts. |
| 68 | +Sampling is typically less numerically accurate and specific, but the |
| 69 | +target program runs at nearly full speed. |
| 70 | +In contrast to the data derived from binary instrumentation, the resulting |
| 71 | +data is not exact but is instead a statistical approximation. |
| 72 | +However, sampling often provides a more accurate picture of the application's |
| 73 | +execution because it is less intrusive to the target application and has fewer |
| 74 | +side effects on memory caches or instruction decoding pipelines. Furthermore, |
| 75 | +because sampling does not affect the execution speed as much, it is |
| 76 | +relatively immune to overestimating the cost of small, frequently called |
| 77 | +functions or "tight" loops. |
| 78 | + |
| 79 | +In Omnitrace, the overhead for statistical sampling depends on the |
| 80 | +sampling rate and whether the samples are taken with respect to the CPU time |
| 81 | +and/or real time. |
| 82 | + |
| 83 | +Binary instrumentation vs. statistical sampling example |
| 84 | +------------------------------------------------------- |
| 85 | + |
| 86 | +Consider the following code: |
| 87 | + |
| 88 | +.. code-block:: c++ |
| 89 | + |
| 90 | +    #include <cstdio> |
| 91 | +    #include <cstdlib> |
| 92 | + |
| 93 | +    long fib(long n) |
| 94 | +    { |
| 95 | +        if(n < 2) return n; |
| 96 | +        return fib(n - 1) + fib(n - 2); |
| 97 | +    } |
| 98 | + |
| 99 | +    void run(long i, long n)  // i: iteration index, n: Fibonacci index |
| 100 | +    { |
| 101 | +        printf("[%li] fibonacci(%li) = %li\n", i, n, fib(n)); |
| 102 | +    } |
| 103 | + |
| 104 | +    int main(int argc, char** argv) |
| 105 | +    { |
| 106 | +        long nfib = 30;  // Fibonacci index to compute |
| 107 | +        long nitr = 10;  // number of times to repeat the computation |
| 108 | +        if(argc > 1) nfib = atol(argv[1]); |
| 109 | +        if(argc > 2) nitr = atol(argv[2]); |
| 110 | + |
| 111 | +        for(long i = 0; i < nitr; ++i) |
| 112 | +            run(i, nfib); |
| 113 | +    } |
| 114 | + |
| 115 | +Binary instrumentation of the ``fib`` function records **every single invocation** |
| 116 | +of the function. For a very small function |
| 117 | +such as ``fib``, this results in **significant** overhead because this simple function |
| 118 | +takes about 20 instructions, whereas the entry and |
| 119 | +exit snippets are ~1024 instructions. Therefore, you generally want to avoid |
| 120 | +instrumenting functions that have significantly fewer |
| 121 | +instructions than the entry and exit instrumentation. (Note that many of the |
| 122 | +instructions in the entry and exit functions either belong to logging functions or |
| 123 | +depend on the runtime settings and thus might never run.) Nevertheless, |
| 124 | +because of the number of potential instructions in the entry and exit snippets, |
| 125 | +the default behavior of ``omnitrace-instrument`` is to only instrument functions |
| 126 | +that contain at least 1024 instructions. |
| 127 | + |
| 128 | +However, recording every single invocation of the function can be extremely |
| 129 | +useful for investigating anomalies, such as a profile that shows minimum or maximum values far from |
| 130 | +the average, or a high standard deviation. In these cases, the traces help you |
| 131 | +identify exactly when and where those instances deviated from the norm. |
| 132 | +Compare the level of detail in the following traces. In the first image, |
| 133 | +every instance of the ``fib`` function is instrumented, while in the second image, |
| 134 | +the ``fib`` call stack is derived via sampling. |
| 135 | + |
| 136 | +Binary instrumentation of the Fibonacci function |
| 137 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 138 | + |
| 139 | +.. image:: ../data/fibonacci-instrumented.png |
| 140 | +   :alt: Visualization of the output of binary instrumentation of the Fibonacci function |
| 141 | + |
| 142 | +Statistical sampling of the Fibonacci function |
| 143 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 144 | + |
| 145 | +.. image:: ../data/fibonacci-sampling.png |
| 146 | +   :alt: Visualization of the output of statistical sampling of the Fibonacci function |