Commit 1ba96d8

cmd/compile: implement jump tables
Performance is kind of hard to exactly quantify.

One big difference between jump tables and the old binary search scheme is that there's only 1 branch statement instead of O(n) of them. That can be both a blessing and a curse, and can make evaluating jump tables very hard to do.

The single branch can become a choke point for the hardware branch predictor. A branch table jump must fit all of its state in a single branch predictor entry (technically, a branch target predictor entry). With binary search that predictor state can be spread among lots of entries. In cases where the case selection is repetitive and thus predictable, binary search can perform better.

The big win for a jump table is that it doesn't consume so much of the branch predictor's resources. But that benefit is essentially never observed in microbenchmarks, because the branch predictor can easily keep state for all the binary search branches in a microbenchmark. So that benefit is really hard to measure.

So predictable switch microbenchmarks are ~useless - they will almost always favor the binary search scheme. Fully unpredictable switch microbenchmarks are better, as they aren't lying to us quite so much. In a perfectly unpredictable situation, a jump table will expect to incur 1-1/N branch mispredicts, where a binary search would incur lg(N)/2 of them. That makes the crossover point at about N=4. But of course switches in real programs are seldom fully unpredictable, so we'll use a higher crossover point.

Beyond the branch predictor, jump tables tend to execute more instructions per switch but have no additional instructions per case, which also argues for a larger crossover.

As far as code size goes, with this CL cmd/go has a slightly smaller code segment and a slightly larger overall size (from the jump tables themselves which live in the data segment).

This is a case where some FDO (feedback-directed optimization) would be really nice to have. #28262

Some large-program benchmarks might help make the case for this CL. Especially if we can turn on branch mispredict counters so we can see how much using jump tables can free up branch prediction resources that can be gainfully used elsewhere in the program.

name                   old time/op  new time/op  delta
Switch8Predictable     1.89ns ± 2%  1.27ns ± 3%  -32.58%  (p=0.000 n=9+10)
Switch8Unpredictable   9.33ns ± 1%  7.50ns ± 1%  -19.60%  (p=0.000 n=10+9)
Switch32Predictable    2.20ns ± 2%  1.64ns ± 1%  -25.39%  (p=0.000 n=10+9)
Switch32Unpredictable  10.0ns ± 2%   7.6ns ± 2%  -24.04%  (p=0.000 n=10+10)

Fixes #5496
Update #34381

Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f
Reviewed-on: https://go-review.googlesource.com/c/go/+/357330
Run-TryBot: Keith Randall <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Keith Randall <[email protected]>
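The difference between the "Predictable" and "Unpredictable" benchmark variants comes down to the sequence of case values fed to the switch. The following is a minimal sketch of that idea, not the benchmark code from this CL: `classify`, `predictableInputs`, and `unpredictableInputs` are hypothetical names, and the 8-case switch is only an assumed stand-in for what `Switch8` exercises.

```go
package main

import (
	"fmt"
	"math/rand"
)

// classify is a dense 8-case integer switch, the shape of switch that a
// jump table lowering is aimed at. (Hypothetical example function.)
func classify(x int) int {
	switch x {
	case 0:
		return 10
	case 1:
		return 11
	case 2:
		return 12
	case 3:
		return 13
	case 4:
		return 14
	case 5:
		return 15
	case 6:
		return 16
	case 7:
		return 17
	default:
		return -1
	}
}

// predictableInputs cycles 0,1,...,7,0,1,... so that a branch predictor
// can learn the repeating pattern.
func predictableInputs(n int) []int {
	in := make([]int, n)
	for i := range in {
		in[i] = i % 8
	}
	return in
}

// unpredictableInputs draws each case value uniformly from a seeded
// pseudo-random source, so there is no short pattern to learn.
func unpredictableInputs(n int, seed int64) []int {
	r := rand.New(rand.NewSource(seed))
	in := make([]int, n)
	for i := range in {
		in[i] = r.Intn(8)
	}
	return in
}

func main() {
	for _, inputs := range [][]int{
		predictableInputs(1024),
		unpredictableInputs(1024, 1),
	} {
		sum := 0
		for _, x := range inputs {
			sum += classify(x)
		}
		fmt.Println("sum:", sum)
	}
}
```

Timing the two loops inside `testing.B` benchmarks, with the input slices built outside the timed region, would be one way to compare the two regimes.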
1 parent dd97871 commit 1ba96d8

23 files changed: +428 -40 lines changed

src/cmd/compile/internal/amd64/ssa.go

+10
@@ -1400,6 +1400,16 @@ func ssaGenBlock(s *ssagen.State, b, next *ssa.Block) {
 			}
 		}
 
+	case ssa.BlockAMD64JUMPTABLE:
+		// JMP *(TABLE)(INDEX*8)
+		p := s.Prog(obj.AJMP)
+		p.To.Type = obj.TYPE_MEM
+		p.To.Reg = b.Controls[1].Reg()
+		p.To.Index = b.Controls[0].Reg()
+		p.To.Scale = 8
+		// Save jump tables for later resolution of the target blocks.
+		s.JumpTables = append(s.JumpTables, b)
+
 	default:
 		b.Fatalf("branch not implemented: %s", b.LongString())
 	}
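The `JMP *(TABLE)(INDEX*8)` operand loads a code address from `TABLE + INDEX*8` and transfers control to it. A rough Go-level analogy is sketched below. It is an illustration only: `entryAddr` and `dispatch` are hypothetical helpers, and a slice of functions stands in for the table of code addresses.

```go
package main

import "fmt"

// entrySize mirrors the scale factor of 8 in the JMP operand
// (one pointer-sized table entry on amd64).
const entrySize = 8

// entryAddr computes the memory address the indexed jump reads its
// target from: tableBase + index*8.
func entryAddr(tableBase, index uintptr) uintptr {
	return tableBase + index*entrySize
}

// dispatch stands in for the indirect jump: one indexed load followed by
// one transfer of control, regardless of how many cases the table holds.
func dispatch(table []func() string, index int) string {
	return table[index]()
}

func main() {
	table := []func() string{
		func() string { return "case 0" },
		func() string { return "case 1" },
		func() string { return "case 2" },
	}
	fmt.Printf("%#x\n", entryAddr(0x1000, 3))
	fmt.Println(dispatch(table, 2))
}
```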

src/cmd/compile/internal/gc/obj.go

+3
@@ -271,6 +271,9 @@ func addGCLocals() {
 			objw.Global(x, int32(len(x.P)), obj.RODATA|obj.DUPOK)
 			x.Set(obj.AttrStatic, true)
 		}
+		for _, jt := range fn.JumpTables {
+			objw.Global(jt.Sym, int32(len(jt.Targets)*base.Ctxt.Arch.PtrSize), obj.RODATA)
+		}
 	}
 }

src/cmd/compile/internal/ir/node.go

+1
@@ -310,6 +310,7 @@ const (
 	ORESULT        // result of a function call; Xoffset is stack offset
 	OINLMARK       // start of an inlined body, with file/line of caller. Xoffset is an index into the inline tree.
 	OLINKSYMOFFSET // offset within a name
+	OJUMPTABLE     // A jump table structure for implementing dense expression switches
 
 	// opcodes for generics
 	ODYNAMICDOTTYPE // x = i.(T) where T is a type parameter (or derived from a type parameter)

src/cmd/compile/internal/ir/node_gen.go

+22
Some generated files are not rendered by default.

src/cmd/compile/internal/ir/op_string.go

+11-10
Some generated files are not rendered by default.

src/cmd/compile/internal/ir/stmt.go

+32
@@ -8,6 +8,7 @@ import (
 	"cmd/compile/internal/base"
 	"cmd/compile/internal/types"
 	"cmd/internal/src"
+	"go/constant"
 )
 
 // A Decl is a declaration of a const, type, or var. (A declared func is a Func.)
@@ -262,6 +263,37 @@ func NewIfStmt(pos src.XPos, cond Node, body, els []Node) *IfStmt {
 	return n
 }
 
+// A JumpTableStmt is used to implement switches. Its semantics are:
+//   tmp := jt.Idx
+//   if tmp == Cases[0] goto Targets[0]
+//   if tmp == Cases[1] goto Targets[1]
+//   ...
+//   if tmp == Cases[n] goto Targets[n]
+// Note that a JumpTableStmt is more like a multiway-goto than
+// a multiway-if. In particular, the case bodies are just
+// labels to jump to, not full Nodes lists.
+type JumpTableStmt struct {
+	miniStmt
+
+	// Value used to index the jump table.
+	// We support only integer types that
+	// are at most the size of a uintptr.
+	Idx Node
+
+	// If Idx is equal to Cases[i], jump to Targets[i].
+	// Cases entries must be distinct and in increasing order.
+	// The length of Cases and Targets must be equal.
+	Cases   []constant.Value
+	Targets []*types.Sym
+}
+
+func NewJumpTableStmt(pos src.XPos, idx Node) *JumpTableStmt {
+	n := &JumpTableStmt{Idx: idx}
+	n.pos = pos
+	n.op = OJUMPTABLE
+	return n
+}
+
 // An InlineMarkStmt is a marker placed just before an inlined body.
 type InlineMarkStmt struct {
 	miniStmt
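The semantics spelled out in the `JumpTableStmt` comment can be modeled outside the compiler. The sketch below is an illustration under simplifying assumptions, not compiler code: `jumpTable`, `lookup`, and `valid` are hypothetical, labels are plain strings instead of `*types.Sym`, and case values are `int64` instead of `constant.Value`.

```go
package main

import "fmt"

// jumpTable is a simplified stand-in for ir.JumpTableStmt.
type jumpTable struct {
	cases   []int64  // distinct and in increasing order
	targets []string // targets[i] is the label to jump to for cases[i]
}

// valid checks the invariants stated in the field comments: equal
// lengths, and strictly increasing (hence distinct) case values.
func (jt jumpTable) valid() bool {
	if len(jt.cases) != len(jt.targets) {
		return false
	}
	for i := 1; i < len(jt.cases); i++ {
		if jt.cases[i-1] >= jt.cases[i] {
			return false
		}
	}
	return true
}

// lookup follows the documented semantics: compare idx against each case
// in order and report the matching label. ok is false when no case
// matches, which in the if/goto description means no jump is taken.
func (jt jumpTable) lookup(idx int64) (label string, ok bool) {
	for i, c := range jt.cases {
		if idx == c {
			return jt.targets[i], true
		}
	}
	return "", false
}

func main() {
	jt := jumpTable{
		cases:   []int64{0, 1, 2, 3},
		targets: []string{"L0", "L1", "L2", "L3"},
	}
	fmt.Println(jt.valid())
	fmt.Println(jt.lookup(2))
	fmt.Println(jt.lookup(9))
}
```

The model returns a label rather than a body, which reflects the "multiway-goto" point in the comment: the statement only selects where control goes next.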

src/cmd/compile/internal/ssa/check.go

+4
@@ -100,6 +100,10 @@ func checkFunc(f *Func) {
 			if b.NumControls() != 0 {
 				f.Fatalf("plain/dead block %s has a control value", b)
 			}
+		case BlockJumpTable:
+			if b.NumControls() != 1 {
+				f.Fatalf("jumpTable block %s has no control value", b)
+			}
 		}
 		if len(b.Succs) != 2 && b.Likely != BranchUnknown {
 			f.Fatalf("likeliness prediction %d for block %s with %d successors", b.Likely, b, len(b.Succs))

src/cmd/compile/internal/ssa/config.go

+3
@@ -168,6 +168,9 @@ type Frontend interface {
 
 	// MyImportPath provides the import name (roughly, the package) for the function being compiled.
 	MyImportPath() string
+
+	// LSym returns the linker symbol of the function being compiled.
+	LSym() string
 }
 
 // NewConfig returns a new configuration object for the given architecture.

src/cmd/compile/internal/ssa/export_test.go

+3
@@ -102,6 +102,9 @@ func (d TestFrontend) Debug_checknil() bool { retu
 func (d TestFrontend) MyImportPath() string {
 	return "my/import/path"
 }
+func (d TestFrontend) LSym() string {
+	return "my/import/path.function"
+}
 
 var testTypes Types

src/cmd/compile/internal/ssa/gen/AMD64.rules

+2
@@ -517,6 +517,8 @@
 
 (If cond yes no) => (NE (TESTB cond cond) yes no)
 
+(JumpTable idx) => (JUMPTABLE {makeJumpTableSym(b)} idx (LEAQ <typ.Uintptr> {makeJumpTableSym(b)} (SB)))
+
 // Atomic loads. Other than preserving their ordering with respect to other loads, nothing special here.
 (AtomicLoad8 ptr mem) => (MOVBatomicload ptr mem)
 (AtomicLoad32 ptr mem) => (MOVLatomicload ptr mem)

src/cmd/compile/internal/ssa/gen/AMD64Ops.go

+6
@@ -1001,6 +1001,12 @@ func init() {
 		{name: "NEF", controls: 1},
 		{name: "ORD", controls: 1}, // FP, ordered comparison (parity zero)
 		{name: "NAN", controls: 1}, // FP, unordered comparison (parity one)
+
+		// JUMPTABLE implements jump tables.
+		// Aux is the symbol (an *obj.LSym) for the jump table.
+		// control[0] is the index into the jump table.
+		// control[1] is the address of the jump table (the address of the symbol stored in Aux).
+		{name: "JUMPTABLE", controls: 2, aux: "Sym"},
 	}
 
 	archs = append(archs, arch{

src/cmd/compile/internal/ssa/gen/genericOps.go

+7-6
@@ -639,12 +639,13 @@ var genericOps = []opData{
 // First [] [always, never]
 
 var genericBlocks = []blockData{
-	{name: "Plain"},               // a single successor
-	{name: "If", controls: 1},     // if Controls[0] goto Succs[0] else goto Succs[1]
-	{name: "Defer", controls: 1},  // Succs[0]=defer queued, Succs[1]=defer recovered. Controls[0] is call op (of memory type)
-	{name: "Ret", controls: 1},    // no successors, Controls[0] value is memory result
-	{name: "RetJmp", controls: 1}, // no successors, Controls[0] value is a tail call
-	{name: "Exit", controls: 1},   // no successors, Controls[0] value generates a panic
+	{name: "Plain"},                  // a single successor
+	{name: "If", controls: 1},        // if Controls[0] goto Succs[0] else goto Succs[1]
+	{name: "Defer", controls: 1},     // Succs[0]=defer queued, Succs[1]=defer recovered. Controls[0] is call op (of memory type)
+	{name: "Ret", controls: 1},       // no successors, Controls[0] value is memory result
+	{name: "RetJmp", controls: 1},    // no successors, Controls[0] value is a tail call
+	{name: "Exit", controls: 1},      // no successors, Controls[0] value generates a panic
+	{name: "JumpTable", controls: 1}, // multiple successors, the integer Controls[0] selects which one
 
 	// transient block state used for dead code removal
 	{name: "First"}, // 2 successors, always takes the first one (second is dead)

src/cmd/compile/internal/ssa/gen/rulegen.go

+2
@@ -1838,6 +1838,8 @@ func (op opData) auxIntType() string {
 // auxType returns the Go type that this block should store in its aux field.
 func (b blockData) auxType() string {
 	switch b.aux {
+	case "Sym":
+		return "Sym"
 	case "S390XCCMask", "S390XCCMaskInt8", "S390XCCMaskUint8":
 		return "s390x.CCMask"
 	case "S390XRotateParams":

src/cmd/compile/internal/ssa/opGen.go

+27-23
Some generated files are not rendered by default.

src/cmd/compile/internal/ssa/rewrite.go

+7
@@ -5,6 +5,7 @@
 package ssa
 
 import (
+	"cmd/compile/internal/base"
 	"cmd/compile/internal/logopt"
 	"cmd/compile/internal/types"
 	"cmd/internal/obj"
@@ -1954,3 +1955,9 @@ func logicFlags32(x int32) flagConstant {
 	fcb.N = x < 0
 	return fcb.encode()
 }
+
+func makeJumpTableSym(b *Block) *obj.LSym {
+	s := base.Ctxt.Lookup(fmt.Sprintf("%s.jump%d", b.Func.fe.LSym(), b.ID))
+	s.Set(obj.AttrDuplicateOK, true)
+	return s
+}
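The naming scheme in `makeJumpTableSym` appends `.jump` and the block ID to the function's linker symbol name. A standalone sketch of just the string formatting follows. `jumpTableSymName` is a hypothetical helper that takes plain values in place of the compiler's `*Block`, and the block ID 7 is an arbitrary example.

```go
package main

import "fmt"

// jumpTableSymName reproduces the format string used by makeJumpTableSym.
// funcSym plays the role of b.Func.fe.LSym() and blockID the role of b.ID.
func jumpTableSymName(funcSym string, blockID int) string {
	return fmt.Sprintf("%s.jump%d", funcSym, blockID)
}

func main() {
	// "my/import/path.function" is the value TestFrontend.LSym returns.
	fmt.Println(jumpTableSymName("my/import/path.function", 7))
}
```

Because the name includes the block ID, two jump table blocks in the same function get different symbols under this scheme.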

src/cmd/compile/internal/ssa/rewriteAMD64.go

+14
Some generated files are not rendered by default.
