Skip to content

sdk/trace: optimize recordingSpan endTime with atomics #6730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jschaf
Copy link
Contributor

@jschaf jschaf commented May 2, 2025

Instead of using a mutex, use atomic compare-and-swap to set the endTime. Store unix nanoseconds as an int64 to enable atomics.

This has few nice properties:

  • Span.End() no longer needs the mutex. The first call to Span.End() does an atomic compare-and-swap. Later calls will return early.

  • We remove the isRecording method and use the public IsRecording method since we don't need the mutex to access endTime.

  • We reduce the recordingSpan size by 32 bytes. time.Time takes 24 bytes compared to 8 bytes for int64 nanoseconds. We save 16 bytes each for startTime and endTime.

Benchstat

benchstat private/base2.txt private/new2.txt
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                                              │ private/base2.txt │           private/new2.txt            │
                                              │      sec/op       │    sec/op      vs base                │
Truncate/Unlimited-12                               0.2089n ±  2%   0.2090n ±  4%        ~ (p=0.698 n=10)
Truncate/Zero-12                                    0.3052n ±  1%   0.3087n ± 20%        ~ (p=0.127 n=10)
Truncate/Short-12                                   0.2081n ±  0%   0.2089n ±  1%        ~ (p=0.362 n=10)
Truncate/ASCII-12                                   0.7140n ±  2%   0.6855n ±  0%   -4.00% (p=0.000 n=10)
Truncate/ValidUTF-8-12                               1.311n ±  2%    1.303n ±  1%   -0.61% (p=0.015 n=10)
Truncate/InvalidUTF-8-12                             9.437n ±  1%    9.315n ±  3%        ~ (p=0.143 n=10)
Truncate/MixedUTF-8-12                               17.32n ±  0%    17.16n ±  2%        ~ (p=0.159 n=10)
RecordingSpanSetAttributes/WithLimit/false-12        2.080µ ±  1%    2.010µ ±  1%   -3.39% (p=0.000 n=10)
RecordingSpanSetAttributes/WithLimit/true-12         4.358µ ±  0%    4.297µ ±  0%   -1.40% (p=0.000 n=10)
SpanEnd-12                                          107.90n ± 10%    94.44n ±  1%  -12.47% (p=0.001 n=10)
TraceStart/with_a_simple_span-12                     319.5n ±  6%    293.7n ±  3%   -8.08% (p=0.000 n=10)
TraceStart/with_several_links-12                     428.3n ±  2%    411.2n ±  1%   -3.99% (p=0.000 n=10)
TraceStart/with_attributes-12                        469.6n ±  3%    430.4n ±  1%   -8.35% (p=0.000 n=10)
SpanLimits/AttributeValueLengthLimit-12              4.379µ ±  1%    4.144µ ±  1%   -5.38% (p=0.000 n=10)
SpanLimits/AttributeCountLimit-12                    4.219µ ±  2%    3.909µ ±  1%   -7.36% (p=0.000 n=10)
SpanLimits/EventCountLimit-12                        3.994µ ±  6%    3.663µ ±  1%   -8.28% (p=0.000 n=10)
SpanLimits/LinkCountLimit-12                         3.835µ ±  2%    3.619µ ±  1%   -5.63% (p=0.000 n=10)
SpanLimits/AttributePerEventCountLimit-12            4.225µ ±  3%    3.961µ ±  1%   -6.25% (p=0.000 n=10)
SpanLimits/AttributePerLinkCountLimit-12             4.218µ ±  1%    3.951µ ±  1%   -6.32% (p=0.000 n=10)
SpanSetAttributesOverCapacity-12                     1.676µ ±  2%    1.582µ ±  0%   -5.58% (p=0.000 n=10)
StartEndSpan/AlwaysSample-12                         322.3n ±  0%    279.1n ±  0%  -13.40% (p=0.000 n=10)
StartEndSpan/NeverSample-12                          153.5n ±  4%    148.1n ±  0%   -3.55% (p=0.000 n=10)
SpanWithAttributes_4/AlwaysSample-12                 527.6n ±  1%    467.7n ±  0%  -11.36% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12                  245.1n ±  0%    227.6n ±  0%   -7.14% (p=0.000 n=10)
SpanWithAttributes_8/AlwaysSample-12                 702.7n ±  2%    638.2n ±  1%   -9.19% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12                  336.4n ±  1%    305.0n ±  1%   -9.32% (p=0.000 n=10)
SpanWithAttributes_all/AlwaysSample-12               593.9n ±  3%    518.8n ±  1%  -12.65% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12                267.0n ±  3%    251.6n ±  0%   -5.73% (p=0.000 n=10)
SpanWithAttributes_all_2x/AlwaysSample-12            850.2n ±  1%    749.9n ±  0%  -11.80% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12             398.4n ±  2%    350.8n ±  2%  -11.95% (p=0.000 n=10)
SpanWithEvents_4/AlwaysSample-12                     736.3n ±  9%    640.9n ±  0%  -12.96% (p=0.000 n=10)
SpanWithEvents_4/NeverSample-12                      158.1n ±  3%    149.8n ±  0%   -5.25% (p=0.000 n=10)
SpanWithEvents_8/AlwaysSample-12                    1111.5n ±  1%    995.0n ±  0%  -10.48% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                      159.9n ±  2%    153.1n ±  0%   -4.28% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/AlwaysSample-12        446.3n ±  1%    395.8n ±  0%  -11.30% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12         170.4n ±  1%    161.4n ±  0%   -5.25% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/AlwaysSample-12         441.9n ±  1%    379.1n ±  0%  -14.19% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12          196.3n ±  5%    185.2n ±  1%   -5.65% (p=0.000 n=10)
TraceID_DotString-12                                 43.07n ±  1%    40.19n ±  0%   -6.69% (p=0.000 n=10)
SpanID_DotString-12                                  31.61n ±  1%    32.63n ±  0%   +3.23% (p=0.001 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_10-12           163.3n ±  3%    162.3n ±  0%   -0.64% (p=0.010 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_100-12          1.645µ ±  1%    1.624µ ±  0%   -1.31% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_10-12          162.2n ±  1%    162.5n ±  0%        ~ (p=0.587 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_100-12         1.626µ ±  1%    1.623µ ±  0%        ~ (p=0.361 n=10)
SpanProcessorVerboseLogging-12                       6.795µ ±  2%    5.789µ ±  3%  -14.81% (p=0.000 n=10)
geomean                                              225.5n          211.6n         -6.19%

                                              │ private/base2.txt │           private/new2.txt            │
                                              │       B/op        │     B/op      vs base                 │
Truncate/Unlimited-12                                0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                    0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                    0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                               0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                             16.00 ± 0%       16.00 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                               32.00 ± 0%       32.00 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12      6.891Ki ± 0%     6.859Ki ± 0%  -0.45% (p=0.000 n=10)
RecordingSpanSetAttributes/WithLimit/true-12       7.023Ki ± 0%     6.992Ki ± 0%  -0.44% (p=0.000 n=10)
SpanEnd-12                                           0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                     528.0 ± 0%       496.0 ± 0%  -6.06% (p=0.000 n=10)
TraceStart/with_several_links-12                     704.0 ± 0%       672.0 ± 0%  -4.55% (p=0.000 n=10)
TraceStart/with_attributes-12                        784.0 ± 0%       752.0 ± 0%  -4.08% (p=0.000 n=10)
SpanLimits/AttributeValueLengthLimit-12            10.56Ki ± 0%     10.53Ki ± 0%  -0.30% (p=0.000 n=10)
SpanLimits/AttributeCountLimit-12                  9.844Ki ± 0%     9.812Ki ± 0%  -0.32% (p=0.000 n=10)
SpanLimits/EventCountLimit-12                      9.422Ki ± 0%     9.391Ki ± 0%  -0.33% (p=0.000 n=10)
SpanLimits/LinkCountLimit-12                       9.031Ki ± 0%     9.000Ki ± 0%  -0.35% (p=0.000 n=10)
SpanLimits/AttributePerEventCountLimit-12          10.47Ki ± 0%     10.44Ki ± 0%  -0.30% (p=0.000 n=10)
SpanLimits/AttributePerLinkCountLimit-12           10.47Ki ± 0%     10.44Ki ± 0%  -0.30% (p=0.000 n=10)
SpanSetAttributesOverCapacity-12                     592.0 ± 0%       560.0 ± 0%  -5.41% (p=0.000 n=10)
StartEndSpan/AlwaysSample-12                         528.0 ± 0%       496.0 ± 0%  -6.06% (p=0.000 n=10)
StartEndSpan/NeverSample-12                          144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                1040.0 ± 0%      1008.0 ± 0%  -3.08% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12                  400.0 ± 0%       400.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12               1.516Ki ± 0%     1.484Ki ± 0%  -2.06% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12                  656.0 ± 0%       656.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12             1.141Ki ± 0%     1.109Ki ± 0%  -2.74% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12                464.0 ± 0%       464.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12          1.891Ki ± 0%     1.859Ki ± 0%  -1.65% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12             848.0 ± 0%       848.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                    1040.0 ± 0%      1008.0 ± 0%  -3.08% (p=0.000 n=10)
SpanWithEvents_4/NeverSample-12                      144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                   1.641Ki ± 0%     1.609Ki ± 0%  -1.90% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                      144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12        624.0 ± 0%       592.0 ± 0%  -5.13% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12         160.0 ± 0%       160.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12         648.0 ± 0%       616.0 ± 0%  -4.94% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12          184.0 ± 0%       184.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12           0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12          0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12          0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12         0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                     9.547Ki ± 0%     9.234Ki ± 0%  -3.27% (p=0.000 n=10)
geomean                                                         ²                 -1.34%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                              │ private/base2.txt │          private/new2.txt           │
                                              │     allocs/op     │ allocs/op   vs base                 │
Truncate/Unlimited-12                                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                               0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                             1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12        3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12         10.00 ± 0%     10.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanEnd-12                                           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                     2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                     3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                        4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12              41.00 ± 0%     41.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                    38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                        35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                         35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12            38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12             38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                     3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                 4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12                  3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12                 4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12                  3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12               4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12                3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12            4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12             3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                     5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                      2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                     6.000 ± 0%     6.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                      2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12        4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12         3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12         5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12          4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                       35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                         ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

See #6721.

@jschaf jschaf force-pushed the joe/opt-start-end branch 4 times, most recently from d72343c to 8570f86 Compare May 2, 2025 23:42
if s == nil {
return
}

// Store the end time as soon as possible to avoid artificially increasing
// the span's duration in case some operation below takes a while.
et := monotonicEndTime(s.startTime)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is equivalent to time.Now since time.Now returns a monotonic time.

s.mu.Lock()
if !s.isRecording() {
s.mu.Unlock()
config := trace.NewSpanEndConfig(options...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think building the EndConfig before setting the time is okay since it's only setting some fields on a struct on the stack. We could move the config below, but it would make Span.EndTime racy. A caller could observe both the time.Now value and the config value.

@@ -1045,7 +1045,7 @@ func TestEndSpanTwice(t *testing.T) {
}

ro := span.(ReadOnlySpan)
if ro.EndTime() != et1 {
if !ro.EndTime().Equal(et1) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

et1 is monotonic but EndTime returns a non-monotonic timestamp since we create it via time.Unix.

@jschaf jschaf force-pushed the joe/opt-start-end branch from 8570f86 to 4801fc7 Compare May 6, 2025 00:29
Instead of using a mutex, use atomic compare-and-swap to set the endTime.
Store unix nanoseconds as an int64 to enable atomics.

This has few nice properties:

- Span.End() no longer needs the mutex. The first call to Span.End() does an
  atomic compare-and-swap. Later calls will return early.

- We remove the isRecording method and use the public IsRecording method since
  we don't need the mutex to access endTime.

- We reduce the recordingSpan size by 32 bytes. time.Time takes 24 bytes
  compared to 8 bytes for int64 nanoseconds. We save 16 bytes each for startTime
  and endTime.

Benchstat

```
benchstat private/base2.txt private/new2.txt
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/trace
cpu: Apple M2 Max
                                              │ private/base2.txt │           private/new2.txt            │
                                              │      sec/op       │    sec/op      vs base                │
Truncate/Unlimited-12                               0.2089n ±  2%   0.2090n ±  4%        ~ (p=0.698 n=10)
Truncate/Zero-12                                    0.3052n ±  1%   0.3087n ± 20%        ~ (p=0.127 n=10)
Truncate/Short-12                                   0.2081n ±  0%   0.2089n ±  1%        ~ (p=0.362 n=10)
Truncate/ASCII-12                                   0.7140n ±  2%   0.6855n ±  0%   -4.00% (p=0.000 n=10)
Truncate/ValidUTF-8-12                               1.311n ±  2%    1.303n ±  1%   -0.61% (p=0.015 n=10)
Truncate/InvalidUTF-8-12                             9.437n ±  1%    9.315n ±  3%        ~ (p=0.143 n=10)
Truncate/MixedUTF-8-12                               17.32n ±  0%    17.16n ±  2%        ~ (p=0.159 n=10)
RecordingSpanSetAttributes/WithLimit/false-12        2.080µ ±  1%    2.010µ ±  1%   -3.39% (p=0.000 n=10)
RecordingSpanSetAttributes/WithLimit/true-12         4.358µ ±  0%    4.297µ ±  0%   -1.40% (p=0.000 n=10)
SpanEnd-12                                          107.90n ± 10%    94.44n ±  1%  -12.47% (p=0.001 n=10)
TraceStart/with_a_simple_span-12                     319.5n ±  6%    293.7n ±  3%   -8.08% (p=0.000 n=10)
TraceStart/with_several_links-12                     428.3n ±  2%    411.2n ±  1%   -3.99% (p=0.000 n=10)
TraceStart/with_attributes-12                        469.6n ±  3%    430.4n ±  1%   -8.35% (p=0.000 n=10)
SpanLimits/AttributeValueLengthLimit-12              4.379µ ±  1%    4.144µ ±  1%   -5.38% (p=0.000 n=10)
SpanLimits/AttributeCountLimit-12                    4.219µ ±  2%    3.909µ ±  1%   -7.36% (p=0.000 n=10)
SpanLimits/EventCountLimit-12                        3.994µ ±  6%    3.663µ ±  1%   -8.28% (p=0.000 n=10)
SpanLimits/LinkCountLimit-12                         3.835µ ±  2%    3.619µ ±  1%   -5.63% (p=0.000 n=10)
SpanLimits/AttributePerEventCountLimit-12            4.225µ ±  3%    3.961µ ±  1%   -6.25% (p=0.000 n=10)
SpanLimits/AttributePerLinkCountLimit-12             4.218µ ±  1%    3.951µ ±  1%   -6.32% (p=0.000 n=10)
SpanSetAttributesOverCapacity-12                     1.676µ ±  2%    1.582µ ±  0%   -5.58% (p=0.000 n=10)
StartEndSpan/AlwaysSample-12                         322.3n ±  0%    279.1n ±  0%  -13.40% (p=0.000 n=10)
StartEndSpan/NeverSample-12                          153.5n ±  4%    148.1n ±  0%   -3.55% (p=0.000 n=10)
SpanWithAttributes_4/AlwaysSample-12                 527.6n ±  1%    467.7n ±  0%  -11.36% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12                  245.1n ±  0%    227.6n ±  0%   -7.14% (p=0.000 n=10)
SpanWithAttributes_8/AlwaysSample-12                 702.7n ±  2%    638.2n ±  1%   -9.19% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12                  336.4n ±  1%    305.0n ±  1%   -9.32% (p=0.000 n=10)
SpanWithAttributes_all/AlwaysSample-12               593.9n ±  3%    518.8n ±  1%  -12.65% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12                267.0n ±  3%    251.6n ±  0%   -5.73% (p=0.000 n=10)
SpanWithAttributes_all_2x/AlwaysSample-12            850.2n ±  1%    749.9n ±  0%  -11.80% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12             398.4n ±  2%    350.8n ±  2%  -11.95% (p=0.000 n=10)
SpanWithEvents_4/AlwaysSample-12                     736.3n ±  9%    640.9n ±  0%  -12.96% (p=0.000 n=10)
SpanWithEvents_4/NeverSample-12                      158.1n ±  3%    149.8n ±  0%   -5.25% (p=0.000 n=10)
SpanWithEvents_8/AlwaysSample-12                    1111.5n ±  1%    995.0n ±  0%  -10.48% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                      159.9n ±  2%    153.1n ±  0%   -4.28% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/AlwaysSample-12        446.3n ±  1%    395.8n ±  0%  -11.30% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12         170.4n ±  1%    161.4n ±  0%   -5.25% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/AlwaysSample-12         441.9n ±  1%    379.1n ±  0%  -14.19% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12          196.3n ±  5%    185.2n ±  1%   -5.65% (p=0.000 n=10)
TraceID_DotString-12                                 43.07n ±  1%    40.19n ±  0%   -6.69% (p=0.000 n=10)
SpanID_DotString-12                                  31.61n ±  1%    32.63n ±  0%   +3.23% (p=0.001 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_10-12           163.3n ±  3%    162.3n ±  0%   -0.64% (p=0.010 n=10)
SpanProcessorOnEnd/batch:_10,_spans:_100-12          1.645µ ±  1%    1.624µ ±  0%   -1.31% (p=0.000 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_10-12          162.2n ±  1%    162.5n ±  0%        ~ (p=0.587 n=10)
SpanProcessorOnEnd/batch:_100,_spans:_100-12         1.626µ ±  1%    1.623µ ±  0%        ~ (p=0.361 n=10)
SpanProcessorVerboseLogging-12                       6.795µ ±  2%    5.789µ ±  3%  -14.81% (p=0.000 n=10)
geomean                                              225.5n          211.6n         -6.19%

                                              │ private/base2.txt │           private/new2.txt            │
                                              │       B/op        │     B/op      vs base                 │
Truncate/Unlimited-12                                0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                     0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                    0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                    0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                               0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                             16.00 ± 0%       16.00 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                               32.00 ± 0%       32.00 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12      6.891Ki ± 0%     6.859Ki ± 0%  -0.45% (p=0.000 n=10)
RecordingSpanSetAttributes/WithLimit/true-12       7.023Ki ± 0%     6.992Ki ± 0%  -0.44% (p=0.000 n=10)
SpanEnd-12                                           0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                     528.0 ± 0%       496.0 ± 0%  -6.06% (p=0.000 n=10)
TraceStart/with_several_links-12                     704.0 ± 0%       672.0 ± 0%  -4.55% (p=0.000 n=10)
TraceStart/with_attributes-12                        784.0 ± 0%       752.0 ± 0%  -4.08% (p=0.000 n=10)
SpanLimits/AttributeValueLengthLimit-12            10.56Ki ± 0%     10.53Ki ± 0%  -0.30% (p=0.000 n=10)
SpanLimits/AttributeCountLimit-12                  9.844Ki ± 0%     9.812Ki ± 0%  -0.32% (p=0.000 n=10)
SpanLimits/EventCountLimit-12                      9.422Ki ± 0%     9.391Ki ± 0%  -0.33% (p=0.000 n=10)
SpanLimits/LinkCountLimit-12                       9.031Ki ± 0%     9.000Ki ± 0%  -0.35% (p=0.000 n=10)
SpanLimits/AttributePerEventCountLimit-12          10.47Ki ± 0%     10.44Ki ± 0%  -0.30% (p=0.000 n=10)
SpanLimits/AttributePerLinkCountLimit-12           10.47Ki ± 0%     10.44Ki ± 0%  -0.30% (p=0.000 n=10)
SpanSetAttributesOverCapacity-12                     592.0 ± 0%       560.0 ± 0%  -5.41% (p=0.000 n=10)
StartEndSpan/AlwaysSample-12                         528.0 ± 0%       496.0 ± 0%  -6.06% (p=0.000 n=10)
StartEndSpan/NeverSample-12                          144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                1040.0 ± 0%      1008.0 ± 0%  -3.08% (p=0.000 n=10)
SpanWithAttributes_4/NeverSample-12                  400.0 ± 0%       400.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12               1.516Ki ± 0%     1.484Ki ± 0%  -2.06% (p=0.000 n=10)
SpanWithAttributes_8/NeverSample-12                  656.0 ± 0%       656.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12             1.141Ki ± 0%     1.109Ki ± 0%  -2.74% (p=0.000 n=10)
SpanWithAttributes_all/NeverSample-12                464.0 ± 0%       464.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12          1.891Ki ± 0%     1.859Ki ± 0%  -1.65% (p=0.000 n=10)
SpanWithAttributes_all_2x/NeverSample-12             848.0 ± 0%       848.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                    1040.0 ± 0%      1008.0 ± 0%  -3.08% (p=0.000 n=10)
SpanWithEvents_4/NeverSample-12                      144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                   1.641Ki ± 0%     1.609Ki ± 0%  -1.90% (p=0.000 n=10)
SpanWithEvents_8/NeverSample-12                      144.0 ± 0%       144.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12        624.0 ± 0%       592.0 ± 0%  -5.13% (p=0.000 n=10)
SpanWithEvents_WithStackTrace/NeverSample-12         160.0 ± 0%       160.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12         648.0 ± 0%       616.0 ± 0%  -4.94% (p=0.000 n=10)
SpanWithEvents_WithTimestamp/NeverSample-12          184.0 ± 0%       184.0 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12           0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12          0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12          0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12         0.000 ± 0%       0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                     9.547Ki ± 0%     9.234Ki ± 0%  -3.27% (p=0.000 n=10)
geomean                                                         ²                 -1.34%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                              │ private/base2.txt │          private/new2.txt           │
                                              │     allocs/op     │ allocs/op   vs base                 │
Truncate/Unlimited-12                                0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Zero-12                                     0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/Short-12                                    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ASCII-12                                    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/ValidUTF-8-12                               0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/InvalidUTF-8-12                             1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Truncate/MixedUTF-8-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/false-12        3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
RecordingSpanSetAttributes/WithLimit/true-12         10.00 ± 0%     10.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanEnd-12                                           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_a_simple_span-12                     2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_several_links-12                     3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
TraceStart/with_attributes-12                        4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeValueLengthLimit-12              41.00 ± 0%     41.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributeCountLimit-12                    38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/EventCountLimit-12                        35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/LinkCountLimit-12                         35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerEventCountLimit-12            38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanLimits/AttributePerLinkCountLimit-12             38.00 ± 0%     38.00 ± 0%       ~ (p=1.000 n=10) ¹
SpanSetAttributesOverCapacity-12                     3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/AlwaysSample-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
StartEndSpan/NeverSample-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/AlwaysSample-12                 4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_4/NeverSample-12                  3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/AlwaysSample-12                 4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_8/NeverSample-12                  3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/AlwaysSample-12               4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all/NeverSample-12                3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/AlwaysSample-12            4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithAttributes_all_2x/NeverSample-12             3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/AlwaysSample-12                     5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_4/NeverSample-12                      2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/AlwaysSample-12                     6.000 ± 0%     6.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_8/NeverSample-12                      2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/AlwaysSample-12        4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithStackTrace/NeverSample-12         3.000 ± 0%     3.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/AlwaysSample-12         5.000 ± 0%     5.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanWithEvents_WithTimestamp/NeverSample-12          4.000 ± 0%     4.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_10-12           0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_10,_spans:_100-12          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_10-12          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorOnEnd/batch:_100,_spans:_100-12         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
SpanProcessorVerboseLogging-12                       35.00 ± 0%     35.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                         ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```

See open-telemetry#6721.
@jschaf jschaf force-pushed the joe/opt-start-end branch from 4801fc7 to c7351b5 Compare May 6, 2025 00:31

// Nanos is the nanoseconds since the Unix epoch.
//
//nolint:recvcheck
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually add a short comment around linter exceptions to provide context.


// NanosNowOrDefault returns nanoseconds from t. If t is the zero-value, returns
// the current, monotonic time in nanoseconds since the epoch.
func NanosNowOrDefault(t time.Time) Nanos {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests for those methods?

return
}

s.mu.Lock()
defer s.mu.Unlock()
if !s.isRecording() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the following:

  • A call for SetAttributes
  • A call for End() in a different thread

(or the other way around, both actions very close to each other)
Then this change can lead to a race condition where either the attributes are set when they shouldn't be settable (the span should have been ended), or they aren't set when they should have been (the span has already been ended).

This seems like an edge case, but it may make us in non-conformance with the spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice catch. Presumably, this part of the spec: OnEnding:

The SDK MUST guarantee that the span can no longer be modified by any other thread before invoking OnEnding of the first SpanProcessor. From that point on, modifications are only allowed synchronously from within the invoked OnEnding callbacks. All registered SpanProcessor OnEnding callbacks are executed before any SpanProcessor’s OnEnd callback is invoked.

There is an interesting callout in Span.End, but that's marked SHOULD:

Any locking used needs be minimized and SHOULD be removed entirely if possible.

As an aside, I don't think you can avoid locking in Go. In Rust, you can avoid most locking with thread-locals. Since Go doesn't expose thread locals directly, we can't use a similar trick (though there are several proposals: Shared Values, MLocal).

When do we want SetAttributes to no-op relative to End?

  • Currently, it's when we call mu.Lock in End.
  • In this PR, we have no guarantees. The closest is: snapshot acquires a mutex but passes the same backing array to the span processors, so SetAttributes can still modify the slice with unlucky timing.

There are a few ways to proceed with this PR.

  1. Accept the PR as-is. Likely not tenable since the implementation is racy, though unlikely, and not spec-compliant.

  2. Figure out a clever way to end the span with atomics—I'd like this to be possible, but I don't see how without reimplementing a mutex or modifying all attributes with atomics.

  3. Guard the span processor onEnd calls with a mutex. This helps, but since we re-use the attributes backing array in the snapshot, SetAttributes can still modify the slice.

  4. Abandon this PR. The mutex is simple and reasonably performant.

I'll think about how to be clever with atomics, but I'm leaning towards abandoning the PR.

Do you have any preference for how to proceed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this does have a performance improvement, the absolute values remain rather low. So unless you can figure out an elegant solution, I would just drop this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that dropping is probably the most pragmatic option.

@jschaf jschaf closed this May 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants