Skip to content

Defer resolution of StackTrace debug info until use #115019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

nike4613
Copy link

@nike4613 nike4613 commented Apr 24, 2025

In profiling an application I work on, I've found that a significant amount of time is spent capturing stack traces. The usage of these stack traces is somewhat unusual, however, in that I do not know whether any given trace is needed until much later, at which point, if I do need it, I want debug information.

Profiling this, I've found that with portable debug info, nearly half of the time spent capturing a stack trace is in fact spend resolving debug info (and more with Windows full PDBs). For my usecase, deferring that resolution until use-site can significantly improve capture performance, without loosing access to debug info.

Comparison benchmarks
BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5854/22H2/2022Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.300
  [Host]     : .NET 8.0.16 (8.0.1625.21506), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  PR         : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  Main       : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Method Job Depth Mean Error StdDev Ratio RatioSD
NoCapture PR 5 58.28 ns 1.208 ns 1.071 ns 1.02 0.02
CaptureNoFileInfo PR 5 5,858.99 ns 116.580 ns 167.196 ns 102.65 3.00
CaptureFileInfo PR 5 7,568.10 ns 41.593 ns 38.906 ns 132.59 1.27
CaptureNoFileInfoAndEnumerate PR 5 6,289.64 ns 82.059 ns 76.758 ns 110.19 1.58
CaptureFileInfoAndEnumerate PR 5 10,783.85 ns 94.295 ns 88.203 ns 188.93 2.15
NoCapture Main 5 57.08 ns 0.513 ns 0.480 ns 1.00 0.01
CaptureNoFileInfo Main 5 6,144.38 ns 78.935 ns 73.836 ns 107.65 1.53
CaptureFileInfo Main 5 10,279.17 ns 97.044 ns 90.775 ns 180.09 2.13
CaptureNoFileInfoAndEnumerate Main 5 6,426.74 ns 77.952 ns 69.102 ns 112.60 1.49
CaptureFileInfoAndEnumerate Main 5 10,797.31 ns 122.231 ns 114.335 ns 189.17 2.48
NoCapture PR 20 215.21 ns 1.034 ns 0.967 ns 1.01 0.01
CaptureNoFileInfo PR 20 8,736.13 ns 113.751 ns 106.403 ns 40.93 0.51
CaptureFileInfo PR 20 11,062.19 ns 110.596 ns 103.452 ns 51.83 0.51
CaptureNoFileInfoAndEnumerate PR 20 9,255.44 ns 109.534 ns 102.458 ns 43.36 0.49
CaptureFileInfoAndEnumerate PR 20 17,383.61 ns 189.096 ns 176.881 ns 81.45 0.86
NoCapture Main 20 213.44 ns 0.863 ns 0.807 ns 1.00 0.01
CaptureNoFileInfo Main 20 9,015.06 ns 123.752 ns 115.757 ns 42.24 0.55
CaptureFileInfo Main 20 16,823.56 ns 227.952 ns 213.226 ns 78.82 1.01
CaptureNoFileInfoAndEnumerate Main 20 9,240.67 ns 129.410 ns 121.050 ns 43.30 0.57
CaptureFileInfoAndEnumerate Main 20 17,066.16 ns 209.234 ns 195.718 ns 79.96 0.93
NoCapture PR 80 821.63 ns 3.361 ns 2.979 ns 1.00 0.01
CaptureNoFileInfo PR 80 19,670.00 ns 211.622 ns 197.952 ns 23.93 0.25
CaptureFileInfo PR 80 24,942.15 ns 314.061 ns 293.773 ns 30.34 0.37
CaptureNoFileInfoAndEnumerate PR 80 20,577.22 ns 243.861 ns 228.108 ns 25.03 0.29
CaptureFileInfoAndEnumerate PR 80 43,064.16 ns 608.658 ns 569.339 ns 52.38 0.71
NoCapture Main 80 822.13 ns 3.890 ns 3.638 ns 1.00 0.01
CaptureNoFileInfo Main 80 20,310.32 ns 172.358 ns 152.791 ns 24.70 0.21
CaptureFileInfo Main 80 42,229.00 ns 643.617 ns 602.039 ns 51.37 0.74
CaptureNoFileInfoAndEnumerate Main 80 20,525.69 ns 259.266 ns 242.518 ns 24.97 0.30
CaptureFileInfoAndEnumerate Main 80 42,955.51 ns 349.237 ns 326.676 ns 52.25 0.45
NoCapture PR 200 2,033.92 ns 17.419 ns 16.294 ns 1.00 0.01
CaptureNoFileInfo PR 200 41,819.99 ns 199.173 ns 176.561 ns 20.55 0.14
CaptureFileInfo PR 200 52,198.39 ns 680.470 ns 636.512 ns 25.65 0.33
CaptureNoFileInfoAndEnumerate PR 200 43,160.21 ns 596.295 ns 557.775 ns 21.21 0.29
CaptureFileInfoAndEnumerate PR 200 96,176.61 ns 395.542 ns 369.990 ns 47.27 0.30
NoCapture Main 200 2,034.87 ns 11.625 ns 10.874 ns 1.00 0.01
CaptureNoFileInfo Main 200 42,506.34 ns 385.829 ns 360.905 ns 20.89 0.20
CaptureFileInfo Main 200 94,704.01 ns 316.419 ns 280.497 ns 46.54 0.28
CaptureNoFileInfoAndEnumerate Main 200 41,783.50 ns 489.032 ns 457.440 ns 20.53 0.24
CaptureFileInfoAndEnumerate Main 200 92,045.32 ns 920.375 ns 815.888 ns 45.24 0.45

With this PR, capturing a trace with fNeedFileInfo: true is only slightly slower than capturing without, and the cost involved in resolving debug info is deferred until enumeration.

This PR in its current state is minimally invasive; there are several places I have not yet touched so that I could demonstrate the above difference.

Benchmark files

stacktrace-perf.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <OutputType>Exe</OutputType>
    <Nullable>enable</Nullable>
    <WarningsAsErrors>$(WarningsAsErrors);nullable</WarningsAsErrors>
  </PropertyGroup>
  <PropertyGroup>
    <PlatformTarget>AnyCPU</PlatformTarget>
    <DebugType>portable</DebugType>
    <DebugSymbols>true</DebugSymbols>
    <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
    <Optimize>true</Optimize>
    <Configuration>Release</Configuration>
    <IsPackable>false</IsPackable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
    <PackageReference Include="BenchmarkDotNet.Diagnostics.Windows" Version="0.14.0"/>
  </ItemGroup>
</Project>

Program.cs

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;

namespace stacktrace_perf;

public class Program
{
    public static void Main(string[] args)
    {
        var config = DefaultConfig.Instance;
        var summary = BenchmarkRunner.Run<Benchmarks>(config, args);
    }
}

Benchmarks.cs

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

namespace stacktrace_perf;

public class Benchmarks
{
    [Params(5, 20, 80, 200)]
    public int Depth { get; set; }

    [Benchmark(Baseline = true)]
    public StackTrace? NoCapture()
    {
        return StacktraceCapture.CaptureWithDepth(Depth, false, false);
    }

    [Benchmark]
    public StackTrace? CaptureNoFileInfo()
    {
        return StacktraceCapture.CaptureWithDepth(Depth, true, false);
    }

    [Benchmark]
    public StackTrace? CaptureFileInfo()
    {
        return StacktraceCapture.CaptureWithDepth(Depth, true, true);
    }

    [Benchmark]
    public object? CaptureNoFileInfoAndEnumerate()
    {
        return StacktraceCapture.CaptureWithDepth(Depth, true, false)?.GetFrames();
    }

    [Benchmark]
    public object? CaptureFileInfoAndEnumerate()
    {
        return StacktraceCapture.CaptureWithDepth(Depth, true, true)?.GetFrames();
    }
}


public static class StacktraceCapture
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static StackTrace? CaptureWithDepth(int depth, bool capture, bool fileInfo)
    {
        // allocate some stack slots
        object? a = null, b = null, c = null, d = null, e = null;

        KeepStackSlot(ref a);
        KeepStackSlot(ref b);
        KeepStackSlot(ref c);
        KeepStackSlot(ref d);
        KeepStackSlot(ref e);

        try
        {
            if (depth > 0) return CaptureWithDepth(depth - 1, capture, fileInfo);

        }
        catch (Exception ex)
        {
            // try/catch to keep the method on stack and prevent tail call optimization
            _ = ex;
        }

        if (capture)
        {
            return new StackTrace(fNeedFileInfo: fileInfo);
        }
        else
        {
            return null;
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void KeepStackSlot(ref object? slot)
    {
        GC.KeepAlive(slot);
    }
}

@ghost ghost added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Apr 24, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 24, 2025
@@ -86,8 +86,40 @@ public StackFrameHelper()
internal void InitializeSourceInfo(bool fNeedFileInfo, Exception? exception)
{
StackTrace.GetStackFramesInternal(this, fNeedFileInfo, exception);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to also defer Windows PDB data resolution, but that requires some nontrivial changes in the runtime.

@tommcdon
Copy link
Member

Hi @nike4613! Thank you for the community PR! I believe there is at least one comment that should be addressed to help move this PR forward. Do you plan on working on it in the near term? If not, would you be interested in moving this PR to Draft?

@jkotas jkotas added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 19, 2025
@nike4613
Copy link
Author

@tommcdon Yes, I should have some stuff today even.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 19, 2025
@nike4613 nike4613 force-pushed the stacktrace-defer-debuginfo branch from c593348 to f23f1fe Compare May 19, 2025 21:29
jkotas added a commit to jkotas/runtime that referenced this pull request May 19, 2025
vzeroupper is AVX instruction and so it cannot be executed unconditionally in static asm helpers

Fixes dotnet#115019
jkotas added a commit to jkotas/runtime that referenced this pull request May 19, 2025
vzeroupper is AVX instruction and so it cannot be executed unconditionally in static asm helpers

Fixes dotnet#115019
@steveisok steveisok requested a review from lateralusX May 19, 2025 22:36
@nike4613
Copy link
Author

The remaining test failures appear to be unrelated to this PR; at least the OSX failure seems spurious. Not sure about the WASM failure.


private StackFrame[] GetFramesCore()
{
return _stackFrames ?? [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should NativeAOT get the same optimization? We do not want NativeAOT performance to be lagging behind.

Copy link
Author

@nike4613 nike4613 May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if debug info resolution is as significant on NAOT; I haven't benched it. My driving scenario is JIT-only for entirely separate reasons anyway. It shouldn't be too hard to do there, though I'd want to profile it first.

@nike4613
Copy link
Author

What needs to happen to push this forward?

rgiILOffset![index], out rgFilename![index], out rgiLineNumber![index], out rgiColumnNumber![index]);

// Make sure we mark down that debug information for this frame was resolved
rgiMethodToken[index] = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be thread safe? Other parts of the change use Interlocked.CompareExchange that suggests this is expected to be thread safe.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely. I think it should be sufficient to outline the out parameters to be assigned after the call returns, and make the read/write to rgiMethodToken[index] volatile? The data race will only see multiple threads executing here, and both should result in the same values, so I don't think anything more is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, something like this should work

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be done now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Diagnostics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants