Skip to content

Rewrite computeSelfTime to improve performance on Trace Statistics page #2767

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 26, 2025

Conversation

DamianMaslanka5
Copy link
Contributor

Stop using drange

For 10k spans, 1s => 1ms, computeSelfTime is no longer visible in a profiler

To test:

  1. Generate data: go run cmd/tracegen/main.go -service abcd -traces 1 -spans 10000
  2. Go to Trace Statistics
  3. Change Group By to Operation Name
  4. Change Group By to Service Name

Which problem is this PR solving?

Related to #645

How was this change tested?

  • Existing tests

Before

image

After

image

@DamianMaslanka5 DamianMaslanka5 requested a review from a team as a code owner May 14, 2025 20:04
@DamianMaslanka5 DamianMaslanka5 requested review from mahadzaryab1 and removed request for a team May 14, 2025 20:04
Stop using drange

For 10k spans, 1s => 1ms, computeSelfTime is no longer visible in a profiler

To test:
1. Generate data: go run cmd/tracegen/main.go -service abcd -traces 1 -spans 10000
1. Go to Trace Statistics
1. Change Group By to Operation Name
1. Change Group By to Service Name

Signed-off-by: Damian Maslanka <[email protected]>
@DamianMaslanka5 DamianMaslanka5 force-pushed the drange-self-time-perf branch from dea82ab to 7eaec1e Compare May 14, 2025 20:06
Copy link

codecov bot commented May 14, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.32%. Comparing base (a80c328) to head (03532d7).
Report is 23 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2767      +/-   ##
==========================================
+ Coverage   96.86%   97.32%   +0.45%     
==========================================
  Files         256      256              
  Lines        7951     7961      +10     
  Branches     2076     2079       +3     
==========================================
+ Hits         7702     7748      +46     
+ Misses        248      213      -35     
+ Partials        1        0       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@yurishkuro yurishkuro added the changelog:bugfix-or-minor-feature 🐞 Bug fixes, Minor Improvements label May 17, 2025
Comment on lines 58 to 60
const spanEndTime = child.startTime + child.duration;
const spanStartsAfterParentEnded = child.startTime > parentSpanEndTime;
const spanEndsBeforePreviousSpan = spanEndTime < previousSpanEndTime;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the naming is confusing. Can we stick with parentXyz and childXyz to make it easier to follow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO these names are accurate.

parent |..................|
child    |.......|
child     |.....|                      - spanEndsBeforePreviousSpan is true, this span is skipped
child                         |......| - spanStartsAfterParentEnded is true, this span is skipped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current naming is just spanXyz, which is confusing because there are two spans involved in every step and it's not clear which one is referenced by spanXyz name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I misunderstood your initial comment, this is a general issue for the whole function, not just the lines you commented on? I see now what that I used span in the parameter of the function and span for children.

Can you confirm that these names are less confusing? Or if you have other suggestion it would be helpful if you could provide examples.

  • previousSpanEndTime => previousChildEndTime
  • spanEndTime => childEndTime
  • spanEndsBeforePreviousSpan => childEndsBeforePreviousChild
  • spanStartsAfterParentEnded => childStartsAfterParentEnded
  • spanStartsBeforePreviousSpanEnds => childStartBeforePreviousChildEnds
  • spanSelfTime => parentSelfTime?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I updated the names in 5935bd1

let nonOverlappingDuration = child.duration;

const spanStartsBeforePreviousSpanEnds = child.startTime < previousSpanEndTime;
if (spanStartsBeforePreviousSpanEnds) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add add diagrams to illustrate the scenarios? something like

// child(n-1):  |.......|
// child(n):         |......|

Copy link
Contributor Author

@DamianMaslanka5 DamianMaslanka5 May 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about using format like this? Also see my previous comment, looks like example for L59,L60 will be also useful.

parent |.....................|
child    |.......|
child        |.....|                  - spanStartsBeforePreviousSpanEnds is true
child                |.....|          - spanStartsBeforePreviousSpanEnds is false

And example for L75

parent |......................|
child                      |.....|        - last span included in self time calculation
child                       |.........|   - skipped

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in every if() there is an implied configuration which could be represented by a diagram like that to make the logic easier to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the comments in 7b7286d

Comment on lines 63 to 64
// child |.......|
// child |.....| - childEndsBeforePreviousChild is true, skipped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// child |.......|
// child |.....| - childEndsBeforePreviousChild is true, skipped
// child |.......| - previousChild
// child |.....| - childEndsBeforePreviousChild is true, skipped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included this change in 03532d7

// parent |.....................|
// child |.......|
// child |.....| - childStartsBeforePreviousChildEnds is true
// child |.....| - childStartsBeforePreviousChildEnds is false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why isn't this case handled (full subtraction of span's time from parent)? The else part below only covers when child extends past the parent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if is for special cases, when child doesn't have any overlap with previous span, then child.duration is subtracted from parentSpanSelfTime, see let nonOverlappingDuration = child.duration;

Code in the newest commit is a bit different.

These links are to the previous commit, without Math.min/max

  1. let nonOverlappingDuration = child.duration;
  2. parentSpanSelfTime -= nonOverlappingDuration;

let parentSpanSelfTime = parentSpan.duration;
let previousChildEndTime = parentSpan.startTime;

const children = getChildOfSpans(parentSpan.spanID, allSpans).sort((a, b) => a.startTime - b.startTime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more efficient if the full allSpans array was sorted once, and then direct children arrays picked from it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this actually will be faster. But I am almost sure that it is not worth spending any time on optimizing this.


const parentSpanEndTime = parentSpan.startTime + parentSpan.duration;

for (let index = 0; index < children.length; index++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the alternative algorithm I had in mind:

  • lastEnd=parent.start
  • selfTime = parent.duration
  • for each sorted child
    • if child.start > parent.end break
    • start = max(lastEnd, span.start)
    • end = min(parent.end, span.end)
    • selfTime = selfTime - (end-start)
    • lastEnd = end

I asked Gemini to implement it, it got this:

function calculateSelfTime(parent, children) {
  let lastEnd = parent.startTime;
  let selfTime = parent.duration;

  // Sort children by their start time
  children.sort((a, b) => a.startTime - b.startTime);

  const parentEndTime = parent.startTime + parent.duration;

  for (const child of children) {
    // If child starts after parent ends, no more overlap is possible
    if (child.startTime >= parentEndTime) {
      break;
    }

    const childEndTime = child.startTime + child.duration;

    // Calculate the effective start of the overlap
    const start = Math.max(lastEnd, child.startTime);

    // Calculate the effective end of the overlap
    const end = Math.min(parentEndTime, childEndTime);

    // Only subtract if there's actual overlap and it's within the parent's current "uncovered" time
    if (end > start) {
      selfTime -= (end - start);
    }

    // Update lastEnd to the maximum of its current value and the child's end time
    // This correctly handles overlapping children and ensures we don't double-count
    lastEnd = Math.max(lastEnd, end);
  }

  // Ensure selfTime doesn't go below zero due to floating point inaccuracies or edge cases
  return Math.max(0, selfTime);
}

Copy link
Contributor Author

@DamianMaslanka5 DamianMaslanka5 May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this is exactly the same algorithm, just implemented a bit differently.

Initially wanted to keep the code dumb, so I wanted to avoid using Math.max/min. But using Math.max/Math.min should be fine, because comments with examples are very helpful. Now I don't have any preference for which version is easier to understand.

Please let me know which version you prefer. I made the changes in 03532d7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the biggest cost to come from sorting the children repeatedly, making the whole thing an O(N^3) complexity, while I think it's possible to do it with O(N^2) if we sort first, the construct the tree from that sorted order.

As for the algorithm, min/max makes it a lot easier (for me at lest) to reason as to what's happening, since it makes the logic linear, vs. reasoning about which edge case in the diagrams represent and whether all edge cases are actually covered. In my algo I only need to think about lastEnd boundary moving up and each span measuring against it, not against any "previous" spans

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the biggest cost to come from sorting the children repeatedly, making the whole thing an O(N^3) complexity, while I think it's possible to do it with O(N^2) if we sort first, the construct the tree from that sorted order.

computeSelfTime is no longer visible in a profiler, so even if it can be faster, I just don't think it is worth spending time on this, because any change to this function will not be visible. Also see #2767 (comment)

Signed-off-by: Damian Maslanka <[email protected]>
@yurishkuro yurishkuro added this pull request to the merge queue May 26, 2025
Merged via the queue into jaegertracing:main with commit 89b7e79 May 26, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:bugfix-or-minor-feature 🐞 Bug fixes, Minor Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants