Skip to content

gccgo: seg fault if profiling signal arrives when collecting backtrace #29448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cherrymui opened this issue Dec 29, 2018 · 4 comments
Closed
Milestone

Comments

@cherrymui
Copy link
Member

What version of Go are you using (go version)?

gccgo (GCC) 9.0.0 20181224 (experimental)

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

const N = 10000

func main() {
	f, err := os.Create("/tmp/prof")
	if err != nil {
		log.Fatal(err)
	}

	pprof.StartCPUProfile(f)

	go func(){
		select {}
	}()

	buf := make([]byte, 99999)
	for i := 0; i < N; i++ {
		runtime.Stack(buf, true)
	}

	pprof.StopCPUProfile()
	f.Close()
}

This program, compiled with gccgo, crashes with high probability. Here is the stack trace at crash:

(gdb) bt
#0  0x0000000000544562 in runtime.sigtramp (sig=27, info=0x7ffff7e88bf0, context=0x7ffff7e88ac0) at /usr/local/google/home/cherryyz/src/gccgo/libgo/runtime/go-signal.c:94
#1  <signal handler called>
#2  0x00007ffff7342b3c in setcontext () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000000000040706b in runtime.gogo (newg=newg@entry=0xc0001a4a80) at /usr/local/google/home/cherryyz/src/gccgo/libgo/runtime/proc.c:292
#4  0x00000000004070f2 in runtime.getTraceback (me=me@entry=0xc000000a80, gp=gp@entry=0xc0001a4a80) at /usr/local/google/home/cherryyz/src/gccgo/libgo/runtime/proc.c:451
#5  0x000000000048feda in runtime.tracebackothers (me=me@entry=0xc000000a80) at /usr/local/google/home/cherryyz/src/gccgo/libgo/go/runtime/traceback_gccgo.go:229
#6  0x00000000004aee00 in runtime.Stack (buf=..., all=<optimized out>) at /usr/local/google/home/cherryyz/src/gccgo/libgo/go/runtime/mprof.go:812
#7  0x0000000000405bab in main.main () at p.go:26

(gdb) p gp
$1 = (G *) 0xc0001a4a80
(gdb) p gp->m
$2 = (struct m *) 0x0

The signal arrives half-way through a goroutine switch, where gp is not nil but gp->m is nil. Function getTraceback switches to gp without setting up an m. When setcontext finishes, it will be in mcall, which will call gtraceback, which will set up an m. But there is a gap and profiling signal could arrive in between. Maybe we could set up an m earlier, in getTraceback, before switching to gp.

Another possibility would be in sigtramp, treating "gp is not nil but gp->m is nil" the same as gp is nil.

Found this initially when investigating a similar seg fault with precise stack scan (with gollvm), where it uses a similar way to switch goroutine when scanning another goroutine's stack.

cc @ianlancetaylor @thanm

@gopherbot gopherbot added this to the Gccgo milestone Dec 29, 2018
@cherrymui cherrymui self-assigned this Dec 31, 2018
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/156037 mentions this issue: runtime: prevent deadlock when profiling signal arrives during traceback

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/156038 mentions this issue: runtime: in getTraceback, set gp->m before gogo

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/156018 mentions this issue: runtime/pprof: add a test for gccgo bug #29448

gopherbot pushed a commit to golang/gofrontend that referenced this issue Jan 5, 2019
Traceback routines, e.g. callers and funcentry, may call
__go_get_backtrace_state. If a profiling signal arrives while we
are in the critical section of __go_get_backtrace_state, it tries
to do a traceback, which also calls __go_get_backtrace_state,
which tries to enter the same critical section and will deadlock.
Prevent this deadlock by setting up runtime_in_callers before
calling __go_get_backtrace_state.

Found while investigating golang/go#29448. Will add a test in the
next CL.

Updates golang/go#29448.

Change-Id: Ifb50c50aedc6faf7259757b82e11b30b867bc70d
Reviewed-on: https://go-review.googlesource.com/c/156037
Reviewed-by: Ian Lance Taylor <[email protected]>
kraj pushed a commit to kraj/gcc that referenced this issue Jan 5, 2019
…back

    
    Traceback routines, e.g. callers and funcentry, may call
    __go_get_backtrace_state. If a profiling signal arrives while we
    are in the critical section of __go_get_backtrace_state, it tries
    to do a traceback, which also calls __go_get_backtrace_state,
    which tries to enter the same critical section and will deadlock.
    Prevent this deadlock by setting up runtime_in_callers before
    calling __go_get_backtrace_state.
    
    Found while investigating golang/go#29448. Will add a test in the
    next CL.
    
    Updates golang/go#29448.
    
    Reviewed-on: https://go-review.googlesource.com/c/156037


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267590 138bc75d-0d04-0410-961f-82ee72b054a4
gopherbot pushed a commit that referenced this issue Jan 7, 2019
With gccgo, if a profiling signal arrives in certain time during
traceback, it may crash or hang. The fix is CL 156037 and
CL 156038.  This CL adds a test.

Updates #29448.

Change-Id: Idb36af176b4865b8fb31a85cad185ed4c07ade0c
Reviewed-on: https://go-review.googlesource.com/c/156018
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Brad Fitzpatrick <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/156697 mentions this issue: runtime: in doscanstackswitch, set gp->m before gogo

gopherbot pushed a commit to golang/gofrontend that referenced this issue Jan 7, 2019
This is following CL 156038. doscanstackswitch uses the same
mechanism of switching goroutines as getTraceback, and so has
the same problem as described in issue golang/go#29448. This CL
applies the same fix.

Change-Id: I0da9996b0f73755f0171c7c10be6c77cb298b517
Reviewed-on: https://go-review.googlesource.com/c/156697
Reviewed-by: Ian Lance Taylor <[email protected]>
kraj pushed a commit to kraj/gcc that referenced this issue Jan 8, 2019
    
    Currently, when collecting a traceback for another goroutine,
    getTraceback calls gogo(gp) switching to gp, which will resume in
    mcall, which will call gtraceback, which will set up gp->m. There
    is a gap between setting the current running g to gp and setting
    gp->m. If a profiling signal arrives in between, sigtramp will
    see a non-nil gp with a nil m, and will seg fault. Fix this by
    setting up gp->m first.
    
    Fixes golang/go#29448.
    
    Reviewed-on: https://go-review.googlesource.com/c/156038


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267658 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this issue Jan 8, 2019
    
    This is following CL 156038. doscanstackswitch uses the same
    mechanism of switching goroutines as getTraceback, and so has
    the same problem as described in issue golang/go#29448. This CL
    applies the same fix.
    
    Reviewed-on: https://go-review.googlesource.com/c/156697


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267661 138bc75d-0d04-0410-961f-82ee72b054a4
@golang golang locked and limited conversation to collaborators Jan 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants