Skip to content

runtime/pprof: goroutine profile at debug=2 hides the start of the finalizer goroutine #73011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
gavraz opened this issue Mar 23, 2025 · 8 comments
Assignees
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@gavraz
Copy link

gavraz commented Mar 23, 2025

Go version

1.23

Output of go env in your module/workspace:

GO111MODULE='on'
GOARCH='arm64'
GOBIN='/usr/local/bin/'
GOCACHE='/Users/user/Library/Caches/go-build'
GOENV='/Users/user/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/user/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/user/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/user/go/pkg/mod/golang.org/[email protected]'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/user/go/pkg/mod/golang.org/[email protected]/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.7'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/user/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/user/go/projects/github/cyolosecurity/cyolo-core/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/sk/l0sxrtq14h9ck0kv43dns5_h0000gn/T/go-build346811353=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Following this discussion I run a program with a blocking finalizer and compared the output of the goroutine profile (debug=1, debug=2).

package main

import (
	"fmt"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"sync"
	"time"
)

func main() {
	go func() {
		addr := fmt.Sprintf("0.0.0.0:%d", 6060)
		fmt.Printf("Starting PPROF: http://%s/debug/pprof/\n", addr)
		fmt.Printf("PPROF stopped: %s\n", http.ListenAndServe(addr, nil).Error())
	}()

	fmt.Println("Blocking finalizer forever...")
	mu := sync.Mutex{}
	mu.Lock()
	x := new([16]byte)
	runtime.SetFinalizer(x, func(_ *[16]byte) {
		mu.Lock()
	})
	runtime.KeepAlive(x)
	x = nil
	runtime.GC()

	for {
		time.Sleep(time.Second)
	}
}

What did you see happen?

debug=1

1 @ 0x102ed7c58 0x102eb5728 0x102eb5705 0x102ed90b8 0x102ef4744 0x1030af354 0x1030af301 0x102e7ce94 0x102edffc4
#	0x102ed90b7	sync.runtime_SemacquireMutex+0x27	/Users/user/go/pkg/mod/golang.org/[email protected]/src/runtime/sema.go:95
#	0x102ef4743	sync.(*Mutex).lockSlow+0x173		/Users/user/go/pkg/mod/golang.org/[email protected]/src/sync/mutex.go:173
#	0x1030af353	sync.(*Mutex).Lock+0x73			/Users/user/go/pkg/mod/golang.org/[email protected]/src/sync/mutex.go:92
#	0x1030af300	main.main.func2+0x20			/Users/user/Library/Application Support/JetBrains/GoLand2024.3/scratches/scratch_71.go:24
#	0x102e7ce93	runtime.runfinq+0x3e3			/Users/user/go/pkg/mod/golang.org/[email protected]/src/runtime/mfinal.go:255

debug=2

goroutine 33 [sync.Mutex.Lock]:
sync.runtime_SemacquireMutex(0x0?, 0x0?, 0x0?)
	/Users/user/go/pkg/mod/golang.org/[email protected]/src/runtime/sema.go:95 +0x28
sync.(*Mutex).lockSlow(0x14000114108)
	/Users/user/go/pkg/mod/golang.org/[email protected]/src/sync/mutex.go:173 +0x174
sync.(*Mutex).Lock(...)
	/Users/user/go/pkg/mod/golang.org/[email protected]/src/sync/mutex.go:92
main.main.func2(0x0?)
	/Users/user/Library/Application Support/JetBrains/GoLand2024.3/scratches/scratch_71.go:24 +0x74

What did you expect to see?

We anticipated that the debug=2 goroutine trace would include the runtime.runfinq function, showing the path to the finalizer's routine, just as it does in debug=1. The absence of this trace in debug=2 during our blocked finalizer debugging made identifying the root cause significantly more challenging as we found no occurrences of mfinal.go in the trace.

@gavraz gavraz changed the title PPROF's goroutine profile with debug=2 is not showing the full trace PPROF's goroutine profile with debug=2 shows incomplete trace Mar 23, 2025
@seankhliao seankhliao changed the title PPROF's goroutine profile with debug=2 shows incomplete trace runtime/pprof: goroutine profile at debug=2 hides the start of the finalizer goroutine Mar 23, 2025
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Mar 23, 2025
@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Mar 23, 2025
@prattmic
Copy link
Member

Thanks, I agree this is misleading and I think it should be fixed.

Both pprof and debug=1 profiles get the stacks from runtime.pprof_goroutineProfileWithLabels, which does not post-processing of the stack at all.

debug=2 profiles use runtime.Stack(b, true), which formats traceback the same way as panics, when GOTRACEBACK=all. i.e., excluding runtime frames, generated wrapper functions, etc.

The runtime.Stack implementation is actually much less efficient than runtime.pprof_goroutineProfileWithLabels, as the former is a full STW while stacks are collected, while the latter is concurrent since https://go.dev/cl/387415. IMO, runtime.Stack should switch to a concurrent implementation as well.

But that is an aside. While that would fix the immediate problem, I think this problem affects normal panics as well and should be fixed there as well.

I think what we want to do avoid hiding a runtime frame if it is the first frame in the goroutine, as that can make the apparent first frame change across different traces of the same goroutine, which shouldn't be possible and is confusing.

cc @golang/runtime @rhysh

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 24, 2025
@prattmic prattmic self-assigned this Mar 24, 2025
@prattmic prattmic added this to the Go1.25 milestone Mar 24, 2025
@prattmic
Copy link
Member

I think this problem affects normal panics as well and should be fixed there as well.

Indeed.

package main

import (
        "runtime"
        "sync"
)

func main() {
        var mu sync.Mutex
        mu.Lock()
        b := new([16]byte)
        runtime.SetFinalizer(b, func(*[16]byte) {
                mu.Lock()
        })
        b = nil
        runtime.GC()

        panic("boo")
}
$ GOTRACEBACK=all go run .
panic: boo

goroutine 1 [running]:
main.main()
        /tmp/fintrace/main.go:18 +0xa5

goroutine 5 [sync.Mutex.Lock]:
internal/sync.runtime_SemacquireMutex(0x7ff5c99f0108?, 0x5?, 0x0?)
        /usr/lib/google-golang/src/runtime/sema.go:95 +0x25
internal/sync.(*Mutex).lockSlow(0xc0000100c0)
        /usr/lib/google-golang/src/internal/sync/mutex.go:149 +0x15d
internal/sync.(*Mutex).Lock(...)
        /usr/lib/google-golang/src/internal/sync/mutex.go:70
sync.(*Mutex).Lock(...)
        /usr/lib/google-golang/src/sync/mutex.go:46
main.main.func1(0x0?)
        /tmp/fintrace/main.go:13 +0x2d
exit status 2

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/660295 mentions this issue: runtime: always show runfinq in traceback

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/660296 mentions this issue: runtime: rename runfinq to runFinalizersAndCleanups

@rhysh
Copy link
Contributor

rhysh commented Mar 24, 2025

runtime.pprof_goroutineProfileWithLabels is both concurrent and parallel; the records are of fixed size, so it's not so hard to split the work across multiple Ps as they move goroutines across the profile's snapshot instant. It would be good for runtime.Stack to not have STW duration proportional to the number of goroutines, yes. It could be made concurrent, but I expect that the variable-size records (text in a []byte) would make parallelism tricky.

I think what we want to do avoid hiding a runtime frame if it is the first frame in the goroutine, as that can make the apparent first frame change across different traces of the same goroutine, which shouldn't be possible and is confusing.

I don't follow how the apparent first frame would change, can you say more about that?

@prattmic
Copy link
Member

I think what we want to do avoid hiding a runtime frame if it is the first frame in the goroutine, as that can make the apparent first frame change across different traces of the same goroutine, which shouldn't be possible and is confusing.

I don't follow how the apparent first frame would change, can you say more about that?

By "first frame", I meant outermost frame (which is not the terminology the traceback code itself uses, apologies).

So you may take a traceback and see

goroutine 5 [running]:
main.callee()
        /tmp/fintrace/main.go:33 +0x2d
main.finalizer1(0x0?)
        /tmp/fintrace/main.go:13 +0x2d

Then later take another traceback and see

goroutine 5 [running]:
main.callee()
        /tmp/fintrace/main.go:33 +0x2d
main.finalizer2(0x0?)
        /tmp/fintrace/main.go:23 +0x2d

The fact that main.finalizer1 is gone implies that it must have returned. But normally the outermost frame is the function started by the go keyword, so if that function returned the goroutine should have exited.

gopherbot pushed a commit that referenced this issue Mar 25, 2025
Users see this frame in tracebacks and it serves as a hint that what is
running here is a finalizer or cleanup. But runfinq is a rather dense
name. We can give it a more obvious name to help users realize what it
is.

For #73011.

Change-Id: I6a6a636ce9a493fd00d4b4c60c23f2b1c96d3568
Reviewed-on: https://go-review.googlesource.com/c/go/+/660296
Auto-Submit: Michael Pratt <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
@rhysh
Copy link
Contributor

rhysh commented Mar 26, 2025

I see, that's an argument in support of showing runfinq, not a description of a new problem/risk. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants