The stage0, mes-m2 and mescc-tools projects combined would make it possible to compile GCC starting from a minimal trusted binary less than 1 KB in size. This project represents a promising alternative approach based on Ben Lynn's compiler for a subset of Haskell. The entire process starts with a single C file which reads ION assembly, then a few trusted compilers (which could be compiled manually by a motivated person), until the first self-hosting version is reached.
Then a successive chain of compilers ends with a dialect of Haskell 98. We can take advantage of this to complete the bootstrap to GCC. The goals in mind are to complete the bootstrap in as little effort as possible, while still maintaining correctness and readability.
The entire build process has been formalized in Nix and continuously checked with GitHub Actions, showing that only stage0-posix is needed.
To build blynn-compiler with Nix without installing anything:
$ nix-build -I nixpkgs=https://github.com/OriansJ/blynn-compiler/archive/master.tar.gz '<nixpkgs>' -A blynn-compiler
To build it from the Git repository:
$ nix-build -A blynn-compiler # if using Nix
$ make # otherwise
Note that the Makefile uses the system C compiler instead of the minimal seed.
The bootstrapping project is alive and healthy, come chat with us over
at #bootstrappable
on libera.chat If you know any of C, assembly,
Scheme, or Haskell, or develop on any OS on any instruction set
(the more diverse the bootstrap is, the better), we'd love to hear
your ideas and criticism.
Since Ben Lynn's compiler already bootstraps a large subset of Haskell (layout parsing, monadic I/O, typeclasses, etc.), it would not be difficult to write a Scheme interpreter (see r5rs-denot for a semantics-conforming R5RS interpreter). This Scheme interpreter would accept Scheme code from stdin and interpret it.
- Heavy work of implementing many of Haskell's features has been done already.
- Can use GHC to aid development of the interpreter (Ben Lynn's Haskell dialect with a short preamble can be read by GHC), especially with testing, type checking and linting.
- Easier verification of interpreter since written in standard Haskell and should be based on denotational semantics.
- Bootstrapping is slower, more resource intensive intermediate phases, compared to an alternative approach below.
- Fully understanding and verifying all the Haskell passes requires knowledge of parser combinators, type inference, semantic bracket abstraction, among others. (minimizing the diff between stages helps here)
If performance becomes a problem, it may be necessary to compile rather than interpret Scheme.
- More precise bootstrapping path, i.e. implementing type inference, parsing fixity declarations are not needed.
- Scheme compiler could be more efficient than interpretation.
- More work to do, and since the intermediate languages are custom and typeless we cannot really reuse existing tooling elsewhere to aid us.
- Requires compilation of a call-by-value language into one that is call-by-need, effect on time and space usage unknown.
It is recommended to have good knowledge of functional languages, their theory and implementation. For more specialized topics the relevant resources are shown below.
- Applicative programming with
effects, Monadic
parser
combinators
- After singularity, every compiler uses applicative parser combinators.
- Typing Haskell in
Haskell
- The type checking and inference code is taken from this paper.
- Tackling the Awkward Squad: monadic input/output, concurrency,
exceptions, and foreign-language calls in
Haskell
- The effectively compiler adds IO and FFI. This paper gives relevant background on how it is implemented.
- Scott
encoding
- Turns out compiling algebraic data types and pattern matching isn't hard at all, if you know how to rewrite it to lambda calculus. Scott encoding is one such method.
- Lambda to SKI,
Semantically
- Without the results from this paper, compiling to combinators would be far more inefficient. The paper uses a semantic translation rather than the usual syntactic translation, thus making compositionality and correctness-preservation more self-evident.
This work would not have been possible without the collaboration of multiple people across different areas of expertise, we would like to especially thank Ben Lynn for writing blynn-compiler and the bootstrappable developers for writing the assemblers and compilers, and for making blynn-compiler M2-Planet compatible.