docs/conceptual/how-omnitrace-works.rst
66 additions & 54 deletions
@@ -11,89 +11,101 @@ some basic tips to help you get started. It also explains the main data
 collection modes, including a comparison between binary instrumentation
 and statistical sampling.
 
-Omnitrace Nomenclature
+Omnitrace nomenclature
 ========================================
 
 The list provided below is intended to provide a basic glossary for those who
 are not familiar with binary instrumentation. It also clarifies ambiguities
 when certain terms have different
-contextual meanings, for example, Omnitrace's definition of the term "module"
+contextual meanings, for example, the Omnitrace meaning of the term "module"
 when instrumenting Python.
 
 **Binary**
-A file written in the Executable and Linkable Format (ELF). This is the standard file format for executable files, shared libraries, etc.
+A file written in the Executable and Linkable Format (ELF). This is the standard file
+format for executable files, shared libraries, etc.
 
-**Binary Instrumentation**
-Inserting callbacks to instrumentation into an existing binary. This can be performed statically or dynamically.
+**Binary instrumentation**
+Inserting callbacks to instrumentation into an existing binary. This can be performed
+statically or dynamically.
 
-**Static Binary Instrumentation**
-Loads an existing binary, determines instrumentation points, and generates a new binary with instrumentation directly embedded. It is applicable to executables and libraries but limited to only the functions defined in the binary. This is also known as **Binary Rewrite**.
+**Static binary instrumentation**
+Loads an existing binary, determines instrumentation points, and generates a new binary
+with instrumentation directly embedded. It is applicable to executables and libraries but
+limited to only the functions defined in the binary. This is also known as **Binary rewrite**.
 
-**Dynamic Binary Instrumentation**
-Loads an existing binary into memory, inserts instrumentation, and executes the binary. It is limited to executables but capable of instrumenting linked libraries. This is also known as: **Runtime Instrumentation**
+**Dynamic binary instrumentation**
+Loads an existing binary into memory, inserts instrumentation, and executes the binary.
+It is limited to executables but capable of instrumenting linked libraries.
+This is also known as: **Runtime instrumentation**.
 
-**Statistical Sampling**
-At periodic intervals, the application is paused and the current call-stack of the CPU is recorded alongside with various other metrics. It uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread and the CPU time expended on behalf of the thread by the system. This is also known as just **sampling**.
+**Statistical sampling**
+At periodic intervals, the application is paused and the current call-stack of the CPU
+is recorded alongside with various other metrics. It uses timers that measure either (A) real clock time or (B) the CPU time used by the current thread and the CPU time expended on behalf of the thread by the system. This is also known as just **sampling**.
 
-**Sampling Rate**
+**Sampling rate**
 * The period at which (A) or (B) are triggered (in units of ``# interrupts / second``)
 * Higher values increase the number of samples
 
-**Sampling Delay**
+**Sampling delay**
 * How long to wait before (A) and (B) begin triggering at their designated rate
 
-**Sampling Duration**
-* The time (in realtime) after the start of the application to record samples. Once this time limit has been reached, no more samples will be recorded.
+**Sampling duration**
+* The time (in real-time) after the start of the application to record samples.
+* Once this time limit has been reached, no more samples will be recorded.
 
-**Process Sampling**
-At periodic (realtime) intervals, a background thread records global metrics without
+**Process sampling**
+At periodic (real-time) intervals, a background thread records global metrics without
 interrupting the current process. These metrics include, but are not limited to:
 CPU frequency, CPU memory high-water mark (i.e. peak memory usage), GPU Temperature,
 and GPU Power usage.
 
-**Sampling Rate**
-* The realtime period for recording metrics (in units of ``# measurements / second``)
+**Sampling rate**
+* The real-time period for recording metrics (in units of ``# measurements / second``)
 * Higher values increase the number of samples
 
-**Sampling Delay**
-* How long to wait (in realtime) before recording samples
+**Sampling delay**
+* How long to wait (in real-time) before recording samples
 
-**Sampling Duration**
-* The time (in realtime) after the start of the application to record samples. Once this time limit has been reached, no more samples will be recorded.
+**Sampling duration**
+* The time (in real-time) after the start of the application to record samples.
+* Once this time limit has been reached, no more samples will be recorded.
 
 **Module**
 With respect to binary instrumentation, a module is defined as either the filename
 (such as ``foo.c``) or library name (``libfoo.so``) which contains the definition
 of one or more functions.
 
-With respect to Python instrumentation, a module is defined as the **file** which contains the definition of one or more functions. The full path to this file typically contains the name of the "Python module".
+With respect to Python instrumentation, a module is defined as the **file** which contains
+the definition of one or more functions. The full path to this file typically contains the
+name of the "Python module".
 
-**Basic Block**
+**Basic block**
 Straight-line code sequence with no branches in (except for the entry) and
 no branches out (except for the exit).
 
-**Address Range**
+**Address range**
 The instructions for a function in a binary start at certain address with the ELF file and end at a certain address. The range is ``end - start``.
 
 The address range is a decent approximation for the "cost" of a function.
 For example, a larger address range approximately equates to more instructions.
 
-**Instrumentation Traps**
+**Instrumentation traps**
 On the x86 architecture, because instructions are of variable size, the instruction
 at a point may be too small for Dyninst to replace it with the normal code sequence
 used to call instrumentation. When instrumentation is placed at points other
 than subroutine entry, exit, or call points, traps may be used to ensure
-the instrumentation fits. (By default, omnitrace-instrument avoids instrumentation
+the instrumentation fits. (By default, ``omnitrace-instrument`` avoids instrumentation
 which requires using a trap.)
 
 **Overlapping functions**
 Due to language constructs or compiler optimizations, it may be possible for
 multiple functions to overlap (that is, share part of the same function body)
 or for a single function to have multiple entry points. In practice, it is
 impossible to determine the difference between multiple overlapping functions
-and a single function with multiple entry points. (By default, omnitrace-instrument avoids instrumenting overlapping functions.)
+and a single function with multiple entry points. (By default, ``omnitrace-instrument``
+avoids instrumenting overlapping functions.)
 
-General Tips for Using Omnitrace
+General tips for using Omnitrace
 ========================================
 
 * Use ``omnitrace-avail`` to lookup configuration settings, hardware counters, and data collection components
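
The sampling rate, delay, and duration defined in the glossary hunk above can be related with a little arithmetic. The Python sketch below is illustrative only and is not part of Omnitrace: the ``SamplingConfig`` class and its field names are hypothetical, and it assumes that samples are taken at a fixed interval from the end of the delay until the duration limit is reached.

```python
from dataclasses import dataclass


@dataclass
class SamplingConfig:
    """Hypothetical container for the three glossary settings (not an Omnitrace API)."""

    rate_hz: float     # sampling rate, in interrupts per second
    delay_s: float     # seconds to wait before sampling starts
    duration_s: float  # seconds after application start at which sampling stops

    def interval_s(self) -> float:
        # The time between two consecutive samples is the inverse of the rate.
        return 1.0 / self.rate_hz

    def expected_samples(self) -> int:
        # Assumption: samples are only taken between the delay and the duration limit.
        window = max(0.0, self.duration_s - self.delay_s)
        return int(window * self.rate_hz)


config = SamplingConfig(rate_hz=100.0, delay_s=0.5, duration_s=10.0)
print(config.interval_s())        # seconds between consecutive samples
print(config.expected_samples())  # approximate number of samples in the window
```

Under these assumptions, doubling the rate doubles the expected number of samples, which matches the glossary note that higher values increase the number of samples.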
@@ -110,7 +122,7 @@ General Tips for Using Omnitrace
 * Use binary instrumentation for characterizing the performance of every invocation of specific functions
 * Use statistical sampling to characterize the performance of the entire application while minimizing overhead
 * Enable statistical sampling after binary instrumentation to help "fill in the gaps" between instrumented regions
-* Use the user API to create custom regions, enable/disable omnitrace to specific processes, threads, and/or regions
+* Use the user API to create custom regions, enable/disable Omnitrace to specific processes, threads, and/or regions
 * Dynamic symbol interception, callback APIs, and the user API are always available with binary instrumentation and sampling
 
 * Dynamic symbol interception and callback APIs are (generally) controlled through ``OMNITRACE_USE_<API>`` options, e.g. ``OMNITRACE_USE_KOKKOSP``, ``OMNITRACE_USE_OMPT`` enable Kokkos-Tools and OpenMP-Tools callbacks, respectively
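
As a rough illustration of the ``OMNITRACE_USE_<API>`` naming pattern mentioned in the hunk above, the following Python sketch builds an environment mapping for a child process. The helper ``omnitrace_env`` is hypothetical, and the value ``"ON"`` is an assumption about how these boolean options are enabled; check the configuration reference for the accepted values.

```python
import os


def omnitrace_env(*apis, base=None):
    """Return a copy of an environment with OMNITRACE_USE_<API> set for each API.

    Hypothetical helper, not part of Omnitrace.
    Assumption: the options accept "ON" as an enabling value.
    """
    env = dict(os.environ if base is None else base)
    for api in apis:
        env[f"OMNITRACE_USE_{api.upper()}"] = "ON"
    return env


# Start from an empty mapping so that only the generated options are shown.
env = omnitrace_env("kokkosp", "ompt", base={})
print(sorted(env))
```

A mapping built this way could be passed to ``subprocess.run(..., env=env)`` when launching the profiled application.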
@@ -122,7 +134,7 @@ General Tips for Using Omnitrace
 * When call-counts are high, improving the performance of this function or "inlining" the function can be quick and easy performance improvements
 * When the standard-deviation is high, collect a hierarchical profile and see if the high variation can be attributable to the calling context. In this scenario, consider creating a specialized version for the function for the longer running contexts
 * Collect a hierarchical profile and, keeping the flat-profiling data in mind, verify the functions noted in the flat profile are part of the "critical path" of your application
-* E.g. function(s) with high call counts, etc. which are part of a "setup" or "post-processing" phase which does not consume much time relative to the overall time is, generally, a lower priority for optimization
+* E.g. functions with high call counts, etc. which are part of a "setup" or "post-processing" phase which does not consume much time relative to the overall time is, generally, a lower priority for optimization
 
 * Use the information from the profiles when analyzing detailed traces
 * When using binary instrumentation in the "trace" mode, the binary rewrites are preferable to runtime instrumentation.
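
The flat-profile tips in the hunk above (high call counts, high standard deviation) can be sketched as a small triage script. This is a minimal illustration and not Omnitrace output or tooling: the function names, the timings, and both thresholds are made up for the example, and sensible thresholds depend on the application.

```python
from statistics import mean, pstdev

# Hypothetical per-call wall-clock timings in milliseconds, keyed by function name.
timings_ms = {
    "compute_flux": [2.0, 2.1, 1.9, 2.0, 2.0, 2.1, 1.9, 2.0],
    "exchange_halo": [1.0, 9.0, 1.0, 9.0],
}

# Illustrative thresholds only.
HIGH_CALL_COUNT = 6
HIGH_RELATIVE_STDDEV = 0.5


def triage(samples):
    """Summarize one flat-profile entry and suggest a follow-up."""
    count = len(samples)
    avg = mean(samples)
    rel_std = pstdev(samples) / avg if avg > 0 else 0.0
    notes = []
    if count >= HIGH_CALL_COUNT:
        notes.append("high call count: consider optimizing or inlining")
    if rel_std >= HIGH_RELATIVE_STDDEV:
        notes.append("high variation: check the calling context in a hierarchical profile")
    return count, avg, rel_std, notes


report = {name: triage(samples) for name, samples in timings_ms.items()}
for name, (count, avg, rel_std, notes) in report.items():
    print(f"{name}: calls={count} mean={avg:.2f}ms rel_std={rel_std:.2f} {notes}")
```

Here the first function is flagged only for its call count and the second only for its variation, which are the two cases the tips treat differently.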
@@ -134,10 +146,10 @@ General Tips for Using Omnitrace
 * Runtime instrumentation requires a fork + ptrace: which is generally incompatible with how MPI applications spawn their processes
 * Binary rewrite the executable using MPI (and, optionally, libraries used by the executable) and execute the generated instrumented executable via ``omnitrace-run`` instead of the original, e.g. ``mpirun -n 2 ./myexe`` should be ``mpirun -n 2 omnitrace-run -- ./myexe.inst`` where ``myexe.inst`` is the generated instrumented ``myexe`` executable.
 
-Data Collection Modes
+Data collection modes
 ========================================
 
-OmniTrace supports several modes of recording trace and profiling data for your application:
+Omnitrace supports several modes of recording trace and profiling data for your application: