[processor/probabilisticsampling] encoded sampling probability (support OTEP 235) (#31894)
**Description:** Creates new sampler modes named "equalizing" and
"proportional". Preserves existing functionality under the mode named
"hash_seed".
Fixes #31918
This is the final step in a sequence; the whole of this work was
factored into 3+ PRs, including the new `pkg/sampling` package and the previous
step, #31946. The two new sampler modes enable mixing OTel sampling SDKs
with Collectors in a consistent way.
The existing hash_seed mode is also a consistent sampling mode, which
makes it possible to have a 1:1 mapping between its decisions and the
OTEP 235 randomness and threshold values. Specifically, the 14-bit hash
value and sampling probability are mapped into 56-bit R-value and
T-value encodings, so that all sampling decisions in all modes include
threshold information.
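The 1:1 mapping between hash seed decisions and OTEP 235 values can be illustrated with a small Go sketch. It assumes the legacy hash_seed rule keeps an item when its 14-bit hash is below `sampling_percentage/100 × 2^14`. It then constructs one possible R-value/T-value pair that reproduces every such decision under the OTEP 235 rule (keep when R ≥ T). The helper name `hashSeedToOTEP` and the hash reversal are illustrative choices, not necessarily the mapping the processor uses.

```go
package main

import "fmt"

const (
	numHashBits    = 14
	numHashBuckets = uint64(1) << numHashBits // 16384 buckets
	// widen shifts a 14-bit quantity into the 56-bit OTEP 235 range.
	widen = 56 - numHashBits
)

// hashSeedToOTEP is a hypothetical helper, not the processor's API. It
// assumes the legacy rule "keep when hash < scaledRate" and returns an
// R-value and T-value for which "keep when R >= T" gives the same answer.
func hashSeedToOTEP(hash, scaledRate uint64) (r, t uint64) {
	// Reverse the hash so small hashes become large randomness values.
	r = (numHashBuckets - 1 - hash) << widen
	// Rejection threshold: the share of buckets that are dropped.
	t = (numHashBuckets - scaledRate) << widen
	return r, t
}

func main() {
	// 25% sampling keeps hashes 0..4095.
	scaledRate := uint64(0.25 * float64(numHashBuckets))
	for _, h := range []uint64{0, 4095, 4096, 16383} {
		r, t := hashSeedToOTEP(h, scaledRate)
		fmt.Printf("hash=%5d legacy=%-5v otep=%-5v R=%014x T=%014x\n",
			h, h < scaledRate, r >= t, r, t)
	}
}
```

With 25% sampling the threshold comes out as `c0000000000000`, which matches the `th:c` encoding of 25% used elsewhere in this change.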
This implements the semantic conventions of
open-telemetry/semantic-conventions#793, namely
the `sampling.randomness` and `sampling.threshold` attributes used for
logs where there is no tracestate.
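For reference, the T-value stored in `sampling.threshold` (and in the tracestate `th` key) is the rejection threshold `(1 - p) × 2^56`, written in hexadecimal with trailing zeros removed, so 25% sampling becomes `c`. The Go sketch below shows that encoding with a configurable number of hex digits. The helper name, its error handling, and the round-to-nearest rule are assumptions, not the `pkg/sampling` API.

```go
package main

import (
	"fmt"
	"strings"
)

// maxThreshold is 2^56, the size of the OTEP 235 randomness space.
const maxThreshold = uint64(1) << 56

// thresholdString is a hypothetical helper (not the pkg/sampling API). It
// encodes a sampling probability as a hex threshold rounded to `precision`
// significant hex digits, with trailing zeros removed.
func thresholdString(probability float64, precision int) (string, error) {
	if probability <= 0 || probability > 1 {
		return "", fmt.Errorf("probability %v not in (0, 1]", probability)
	}
	if precision < 1 || precision > 14 {
		return "", fmt.Errorf("precision %d not in 1..14", precision)
	}
	// Rejection threshold: items with randomness >= t are kept.
	t := uint64((1 - probability) * float64(maxThreshold))
	if t >= maxThreshold {
		t = maxThreshold - 1 // tiny probabilities: clamp to the largest threshold
	}
	if shift := uint(4 * (14 - precision)); shift > 0 {
		// Round to nearest at the last retained hex digit (assumed rounding rule).
		t = (t + (uint64(1) << (shift - 1))) >> shift
		if limit := uint64(1) << (56 - shift); t >= limit {
			t = limit - 1
		}
		t <<= shift
	}
	s := strings.TrimRight(fmt.Sprintf("%014x", t), "0")
	if s == "" {
		s = "0" // a zero threshold means 100% sampling
	}
	return s, nil
}

func main() {
	for _, p := range []float64{1, 0.5, 0.25, 0.1} {
		s, err := thresholdString(p, 4)
		if err != nil {
			panic(err)
		}
		fmt.Printf("probability %-4v -> sampling.threshold: %s\n", p, s)
	}
}
```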
The default sampling mode remains HashSeed. We consider a future change
of default to Proportional to be desirable, because:
1. Sampling probability is the same; only the hashing algorithm changes.
2. Proportional respects and preserves information about earlier
sampling decisions, which HashSeed can't do, so it has greater
interoperability with OTel SDKs which may also adopt OTEP 235 samplers.
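Point 2 can be made concrete. Under OTEP 235, an item arriving with threshold `T` was sampled with probability `1 - T/2^56`. A proportional sampler can therefore multiply that probability by its own ratio and write the combined threshold back, preserving the earlier decision; the description notes this is something HashSeed cannot do. A minimal Go sketch with hypothetical helper names:

```go
package main

import "fmt"

// maxThreshold is 2^56; a threshold t means probability (2^56 - t) / 2^56.
const maxThreshold = uint64(1) << 56

// probability converts a 56-bit rejection threshold to a sampling probability.
func probability(t uint64) float64 {
	return float64(maxThreshold-t) / float64(maxThreshold)
}

// proportionalThreshold is a hypothetical sketch: it scales the incoming
// probability by ratio, so the output threshold records the cumulative
// probability of all samplers so far. float64 is used here for brevity,
// so very small probabilities lose low-order bits.
func proportionalThreshold(incoming uint64, ratio float64) uint64 {
	kept := float64(maxThreshold-incoming) * ratio
	return maxThreshold - uint64(kept)
}

func main() {
	// An SDK already sampled at 50% (th:8); the collector applies 25%.
	in := uint64(0x80000000000000)
	out := proportionalThreshold(in, 0.25)
	fmt.Printf("in p=%v out p=%v th=%014x\n", probability(in), probability(out), out)
}
```

Here 50% followed by 25% yields an effective 12.5%, recorded as threshold `e0000000000000` (`th:e`).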
**Link to tracking Issue:**
Draft for open-telemetry/opentelemetry-specification#3602.
Previously #24811; see also open-telemetry/oteps#235.
Part of #29738
**Testing:** New testing has been added.
**Documentation:** ✅
---------
Co-authored-by: Juraci Paixão Kröhling <[email protected]>
processor/probabilisticsamplerprocessor/README.md (+154 -2)
@@ -1,3 +1,4 @@
+
 # Probabilistic Sampling Processor
 
 <!-- status autogenerated section -->
@@ -115,7 +116,9 @@ interpreted as a percentage, with values >= 100 equal to 100%
 sampling. The logs sampling priority attribute is configured via
 `sampling_priority`.
 
-## Sampling algorithm
+## Mode Selection
+
+There are three sampling modes available. All modes are consistent.
 
 ### Hash seed
 
@@ -135,7 +138,154 @@ In order for hashing to be consistent, all collectors for a given tier
 at different collector tiers to support additional sampling
 requirements.
 
-This mode uses 14 bits of sampling precision.
+This mode uses 14 bits of information in its sampling decision; the
+default `sampling_precision`, which is 4 hexadecimal digits, exactly
+encodes this information.
+
+This mode is selected by default.
+
+#### Hash seed: Use-cases
+
+The hash seed mode is most useful in logs sampling, because it can be
+applied to units of telemetry other than TraceID. For example, a
+deployment consisting of 100 pods can be sampled according to the
+`service.instance.id` resource attribute. In this case, 10% sampling
+implies collecting log records from an expected value of 10 pods.
+
+### Proportional
+
+OpenTelemetry specifies a consistent sampling mechanism using 56 bits
+of randomness, which may be obtained from the Trace ID according to
+the W3C Trace Context Level 2 specification. Randomness can also be
+explicitly encoded in the OpenTelemetry `tracestate` field, where it is
+known as the R-value.
+
+This mode is so named because it reduces the number of items transmitted
+proportionally, according to the sampling probability. In this mode,
+items are selected for sampling without considering how much they were
+already sampled by preceding samplers.
+
+This mode uses 56 bits of information in its calculations. The
+default `sampling_precision` (4) will cause thresholds to be rounded
+in some cases when they contain more than 16 significant bits.
+
+#### Proportional: Use-cases
+
+The proportional mode is generally applicable in trace sampling,
+because it is based on OpenTelemetry and W3C specifications. This
+mode is selected by default, because it enforces a predictable
+(probabilistic) ratio between incoming items and outgoing items of
+telemetry. No matter how SDKs and other sources of telemetry have
+been configured with respect to sampling, a collector configured with
+25% proportional sampling will output (an expected value of) 1 item
+for every 4 items input.
+
+### Equalizing
+
+This mode uses the same randomness mechanism as the proportional
+sampling mode, in this case considering how much each item was already
+sampled by preceding samplers. This mode can be used to lower
+sampling probability to a minimum value across a whole pipeline,
+making it possible to conditionally adjust sampling probabilities.
+
+This mode compares an item's 56-bit threshold against the configured
+sampling probability and raises the threshold when the configured one
+is larger (that is, when the configured probability is lower). The default
+`sampling_precision` (4) will cause updated thresholds to be rounded
+in some cases when they contain more than 16 significant bits.
+
+#### Equalizing: Use-cases
+
+The equalizing mode is useful in collector deployments where client
+SDKs have mixed sampling configuration and the user wants to apply a
+uniform sampling probability across the system. For example, a user's
+system consists mostly of components developed in-house, but also some
+third-party software. Seeking to lower the overall cost of tracing,
+the user configures 10% sampling in the samplers for all of their in-house
+components. This leaves third-party software components unsampled,
+making the savings less than desired. In this case, the user could
+configure a 10% equalizing probabilistic sampler. Already-sampled
+items of telemetry from the in-house components will pass through one
+for one in this scenario, while items of telemetry from third-party
+software will be sampled by the intended amount.
+
+## Sampling threshold information
+
+In all modes, information about the effective sampling probability is
+added into the item of telemetry. The random variable that was used
+may also be recorded, in case it was not derived from the TraceID
+using a standard algorithm.
+
+For traces, threshold and optional randomness information are encoded
+in the W3C Trace Context `tracestate` fields. The tracestate is
+divided into sections according to a two-character vendor code;
+OpenTelemetry uses "ot" as its section designator. Within the
+OpenTelemetry section, the sampling threshold is encoded using "th"
+and the optional random variable is encoded using "rv".
+
+For example, 25% sampling is encoded in a tracing Span as:
+
+```
+tracestate: ot=th:c
+```
+
+Users can set randomness values in this way, independently, making it
+possible to apply consistent sampling across traces, for example. If
+the Trace was initialized with pre-determined randomness value
+`9b8233f7e3a151` and 100% sampling, it would read:
+
+```
+tracestate: ot=th:0;rv:9b8233f7e3a151
+```
+
+This component, using either proportional or equalizing modes, could
+apply 50% sampling to the Span. This span with randomness value
+`9b8233f7e3a151` is consistently sampled at 50% because the threshold,
+when zero padded (i.e., `80000000000000`), is less than the randomness
+value. The resulting span will have the following tracestate:
+
+```
+tracestate: ot=th:8;rv:9b8233f7e3a151
+```
+
+For log records, threshold and randomness information are encoded in
+the log record itself, using attributes. For example, 25% sampling
+with an explicit randomness value is encoded as:
+
+```
+sampling.threshold: c
+sampling.randomness: e05a99c8df8d32
+```
+
+### Sampling precision
+
+When encoding sampling probability in the form of a threshold,
+variable precision is permitted, making it possible for the user to
+restrict sampling probabilities to rounded numbers of fixed width.
+
+Because the threshold is encoded using hexadecimal digits, each digit
+contributes 4 bits of information. One digit of sampling precision
+can express exact sampling probabilities 1/16, 2/16, ... through
+16/16. Two digits of sampling precision can express exact sampling
+probabilities 1/256, 2/256, ... through 256/256. With N digits of
+sampling precision, there are `16^N` exactly representable
+probabilities.
+
+Depending on the mode, there are different maximum reasonable settings
+for this parameter.
+
+- The `hash_seed` mode uses a 14-bit hash function, so
+precision 4 completely captures the available information.
+- The `equalizing` mode configures a sampling probability after
+parsing a `float32` value, which contains 20 bits of precision,
+so precision 5 completely captures the available information.
+- The `proportional` mode configures its ratio using a `float32`
+value; however, it carries out the arithmetic using 56 bits of
+precision. In this mode, increasing precision has the effect
+of preserving precision applied by preceding samplers.
+
+In cases where larger precision is configured than is actually
+available, the added precision has no effect because trailing zeros
+are eliminated by the encoding.
 
 ### Error handling
 
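The tracestate walkthrough in the README text above can be checked with a short Go sketch. It parses the `ot` entry, applies the equalizing rule (raise the item's threshold to the configured one, never lower it), and keeps the span when its randomness is at least the threshold, reproducing the `th:8` result. The helper names and parsing behavior are hypothetical, not the processor's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseOT is a hypothetical parser for the value of the "ot" tracestate
// entry, such as "th:8;rv:9b8233f7e3a151". It returns both fields as
// 56-bit integers; a missing "th" is treated as 0 (100%) for simplicity.
func parseOT(value string) (th, rv uint64, hasRV bool, err error) {
	for _, field := range strings.Split(value, ";") {
		key, val, ok := strings.Cut(field, ":")
		if !ok {
			continue
		}
		switch key {
		case "th":
			if len(val) < 1 || len(val) > 14 {
				return 0, 0, false, fmt.Errorf("invalid th %q", val)
			}
			// Zero-pad on the right: "8" means 0x80000000000000.
			padded := val + strings.Repeat("0", 14-len(val))
			if th, err = strconv.ParseUint(padded, 16, 64); err != nil {
				return 0, 0, false, err
			}
		case "rv":
			if len(val) != 14 {
				return 0, 0, false, fmt.Errorf("invalid rv %q", val)
			}
			if rv, err = strconv.ParseUint(val, 16, 64); err != nil {
				return 0, 0, false, err
			}
			hasRV = true
		}
	}
	return th, rv, hasRV, nil
}

// equalize sketches the equalizing rule: the item's threshold is raised
// to the configured threshold, but never lowered.
func equalize(itemTh, configuredTh uint64) uint64 {
	if configuredTh > itemTh {
		return configuredTh
	}
	return itemTh
}

// encodeTh writes a threshold back as hex with trailing zeros removed.
func encodeTh(th uint64) string {
	s := strings.TrimRight(fmt.Sprintf("%014x", th), "0")
	if s == "" {
		return "0"
	}
	return s
}

func main() {
	th, rv, hasRV, err := parseOT("th:0;rv:9b8233f7e3a151")
	if err != nil || !hasRV {
		panic("expected explicit randomness")
	}
	newTh := equalize(th, 0x80000000000000) // configured 50% sampling
	keep := rv >= newTh                     // OTEP 235: keep when R >= T
	fmt.Printf("keep=%v tracestate: ot=th:%s;rv:%014x\n", keep, encodeTh(newTh), rv)
}
```

Running it prints `keep=true tracestate: ot=th:8;rv:9b8233f7e3a151`, matching the resulting span shown in the README text.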
@@ -153,9 +303,11 @@ false, in which case erroneous data will pass through the processor.
 
 The following configuration options can be modified:
 
+- `mode` (string, optional): One of "proportional", "equalizing", or "hash_seed"; the default is "proportional" unless either `hash_seed` is configured or `attribute_source` is set to `record`.
 - `sampling_percentage` (32-bit floating point, required): Percentage at which items are sampled; >= 100 samples all items, 0 rejects all items.
 - `hash_seed` (32-bit unsigned integer, optional, default = 0): An integer used to compute the hash algorithm. Note that all collectors for a given tier (e.g. behind the same load balancer) should have the same hash_seed.
 - `fail_closed` (boolean, optional, default = true): Whether to reject items with sampling-related errors.
+- `sampling_precision` (integer, optional, default = 4): Determines the number of hexadecimal digits used to encode the sampling threshold. Permitted values are 1..14.
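The defaulting rule in the `mode` bullet, together with the `sampling_precision` range, can be expressed as a small Go function. This is a sketch under stated assumptions, following the rule as written in the bullet. The `Config` struct and its field names are hypothetical, and "`hash_seed` is configured" is approximated as a non-zero seed.

```go
package main

import "fmt"

// Config mirrors a few of the options listed above; the Go field names
// are hypothetical and not taken from the processor's source.
type Config struct {
	Mode              string // "", "proportional", "equalizing", or "hash_seed"
	HashSeed          uint32
	AttributeSource   string // only "record" is significant here
	SamplingPrecision int
}

// resolveMode applies the defaulting rule described for `mode`. It assumes
// a non-zero HashSeed means hash_seed "is configured"; the real processor
// may detect an explicitly configured zero differently.
func resolveMode(cfg Config) (string, error) {
	if cfg.SamplingPrecision < 1 || cfg.SamplingPrecision > 14 {
		return "", fmt.Errorf("sampling_precision %d not in 1..14", cfg.SamplingPrecision)
	}
	switch cfg.Mode {
	case "proportional", "equalizing", "hash_seed":
		return cfg.Mode, nil
	case "":
		if cfg.HashSeed != 0 || cfg.AttributeSource == "record" {
			return "hash_seed", nil
		}
		return "proportional", nil
	default:
		return "", fmt.Errorf("unknown mode %q", cfg.Mode)
	}
}

func main() {
	for _, cfg := range []Config{
		{SamplingPrecision: 4},
		{HashSeed: 22, SamplingPrecision: 4},
		{AttributeSource: "record", SamplingPrecision: 4},
		{Mode: "equalizing", SamplingPrecision: 4},
	} {
		mode, err := resolveMode(cfg)
		fmt.Println(mode, err)
	}
}
```

An explicit `mode` always wins; only an unset mode falls back to the `hash_seed`/`attribute_source` check.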