
cmd/go: build dependent packages as soon as export data is ready #15734

@josharian

This is a trace of the activity on an 8 core machine running 'go build -a std':

[Image: trace_build_std]

For those who want to explore more, here is an HTML version. (Hint: use the a, d, w, and s keys to navigate.)

There are a few early bottlenecks (runtime, reflect, fmt) and a long, nearly linear section at the end (net, crypto/x509, crypto/tls, net/http). Critical path scheduling (#8893) could help somewhat with this, as could scheduling cgo invocations earlier (#15681). This issue is to discuss another proposal that complements those.

We currently wait until a package is finished building before building packages that depend on it. However, dependent packages only need export data, not machine code, to start building. I believe export data could be available once we're done with escape analysis and closure transformation, and before we run walk.
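The scheduling idea above can be sketched in-process with goroutines. This is only an illustration under assumptions: the `pkg` type, `newPkg`, `build`, and `buildAll` are hypothetical names, not cmd/go code, and the two "phases" are reduced to recording an event. Each package waits only for the export data of its dependencies, not for their machine code.

```go
package main

import (
	"fmt"
	"sync"
)

// pkg is a hypothetical build node; it is not a cmd/go type.
type pkg struct {
	name        string
	deps        []*pkg
	exportReady chan struct{} // closed once export data has been written
}

func newPkg(name string, deps ...*pkg) *pkg {
	return &pkg{name: name, deps: deps, exportReady: make(chan struct{})}
}

// build compiles p in two phases. It blocks only until the export data of
// every dependency is ready, then publishes its own export data before
// starting the (simulated) back end.
func build(p *pkg, record func(string)) {
	for _, d := range p.deps {
		<-d.exportReady
	}
	record(p.name + ": export data written")
	close(p.exportReady) // dependents may start from here on
	record(p.name + ": machine code written")
}

// buildAll starts one goroutine per package and returns the recorded events
// in the order they happened.
func buildAll(pkgs []*pkg) []string {
	var (
		mu     sync.Mutex
		events []string
		wg     sync.WaitGroup
	)
	record := func(s string) {
		mu.Lock()
		events = append(events, s)
		mu.Unlock()
	}
	for _, p := range pkgs {
		wg.Add(1)
		go func(p *pkg) {
			defer wg.Done()
			build(p, record)
		}(p)
	}
	wg.Wait()
	return events
}

func main() {
	// Illustrative chain: fmt imports reflect, which imports runtime.
	rt := newPkg("runtime")
	rf := newPkg("reflect", rt)
	f := newPkg("fmt", rf)
	for _, e := range buildAll([]*pkg{f, rf, rt}) {
		fmt.Println(e)
	}
}
```

The exact interleaving varies from run to run, but a package's "export data written" event always follows the same event for each of its dependencies, while its machine code phase is free to overlap with the dependents' work.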

For the bottlenecks listed above:

| package | time until export data available | total compilation time |
| --- | --- | --- |
| runtime | 226ms | 1300ms |
| reflect | 174ms | 960ms |
| fmt | 33ms | 229ms |
| net | 114ms | 846ms |
| crypto/x509 | 66ms | 253ms |
| crypto/tls | 82ms | 461ms |
| net/http | 168ms | 1310ms |

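These numbers are slightly optimistic, since writing export data isn't instantaneous. Still, export data would be ready after roughly 13% to 26% of each package's total compilation time, which suggests this would in general significantly reduce the time spent waiting for dependencies to compile.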
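To get a rough feel for the potential gain, here is a back-of-the-envelope calculation on the final chain from the table (net, crypto/x509, crypto/tls, net/http). It is a sketch under strong assumptions: each package waits only on the package listed before it, a free core is always available, and writing export data costs nothing. The `pkgTiming` type and both functions are made up for this illustration.

```go
package main

import "fmt"

// pkgTiming holds the two measurements from the table, in milliseconds.
type pkgTiming struct {
	name     string
	exportMs int // time until export data is available
	totalMs  int // total compilation time
}

// serialMs is the wall time when each package starts only after the
// previous one has finished compiling completely.
func serialMs(chain []pkgTiming) int {
	sum := 0
	for _, p := range chain {
		sum += p.totalMs
	}
	return sum
}

// pipelinedMs is the wall time when each package starts as soon as the
// previous package's export data is available.
func pipelinedMs(chain []pkgTiming) int {
	start, finish := 0, 0
	for _, p := range chain {
		if end := start + p.totalMs; end > finish {
			finish = end
		}
		start += p.exportMs // the next package can begin here
	}
	return finish
}

func main() {
	chain := []pkgTiming{
		{"net", 114, 846},
		{"crypto/x509", 66, 253},
		{"crypto/tls", 82, 461},
		{"net/http", 168, 1310},
	}
	for _, p := range chain {
		pct := 100 * float64(p.exportMs) / float64(p.totalMs)
		fmt.Printf("%-12s export data ready after %.0f%% of compile time\n", p.name, pct)
	}
	fmt.Printf("serial: %dms, pipelined: %dms\n", serialMs(chain), pipelinedMs(chain))
}
```

Under these assumptions the chain drops from 2870ms to 1572ms, and the remaining time is dominated by net/http itself (it starts at 262ms and takes 1310ms). Real builds would see less, since cores are shared and the dependency graph is not a pure chain.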

This pattern of large, slow, linear dependency chains also shows up in bigger projects, like juju.

@rsc implemented one enabling piece by adding a flag to emit export data separately from machine code.

Remaining work to implement this, and open questions:

  • Emitting export data before walk means that inlined functions would get walked and expanded at use rather than at initial package compilation. Does this matter? If so, an alternative is to change the compiler structure to walk all functions and then compile all functions. Would this raise the compiler's high-water mark for memory?
  • How would the compiler signal to cmd/go that it is done emitting export data? I don't know of a clean, simple, portable cross-process semaphore.
  • This would be a pretty major upheaval in how cmd/go schedules builds. Making this work more fine-grained would be useful anyway, but it would be a large, high-risk change.

Given the scope of the change, I'm marking this as a proposal. I'd love feedback.
