-
Notifications
You must be signed in to change notification settings - Fork 440
[DESIGN] rule streaming #5251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DESIGN] rule streaming #5251
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,241 @@ | ||||||||||||||||||||||||||
# Rule production | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This document describe a new design for the production of build rules | ||||||||||||||||||||||||||
in Dune. The new design aims to be more natural, easier to reason | ||||||||||||||||||||||||||
about and to make existing feature work well with newer ones such as | ||||||||||||||||||||||||||
directory targets. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
## How does rule production works? | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### `Dune_engine.Load_rules` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The production of rules is driven by the module `Load_rules` in the | ||||||||||||||||||||||||||
`dune_engine` library. This library is the build system core of | ||||||||||||||||||||||||||
Dune. It is meant as a general purpose library for writing build | ||||||||||||||||||||||||||
systems, and the Dune software is built on top of it. In theory, | ||||||||||||||||||||||||||
`dune_engine` shouldn't know about `dune` or `dune-project` | ||||||||||||||||||||||||||
files. However, for historical reason this is not the case yet and | ||||||||||||||||||||||||||
`dune_engine` still knows some things about them. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
As we work on Dune, we expect that `dune_engine` will become more and | ||||||||||||||||||||||||||
more agnostic. Even though it is not completely agnostic, we have | ||||||||||||||||||||||||||
successfully been using it to build Jane Street code base, using the | ||||||||||||||||||||||||||
Jane Street rules on top of this core. So it's already more general | ||||||||||||||||||||||||||
than Dune iself. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
For the purpose of this design doc, we will treat `dune_engine` as a | ||||||||||||||||||||||||||
completly general library that doesn't know about `dune` files. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The main feature of `Load_dir` is the `Load_dir.load_dir` function: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
```ocaml | ||||||||||||||||||||||||||
val load_dir : dir:Path.Build.t -> Loaded.t Memo.Build.t | ||||||||||||||||||||||||||
``` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
`Loaded.t` represents a "loaded" set of rules for a particular | ||||||||||||||||||||||||||
directory. It can also be thought as a "compiled" set of rules. A | ||||||||||||||||||||||||||
`Loaded.t` contains all the rules in existence that produce targets in | ||||||||||||||||||||||||||
`dir`. For instance, given a `Loaded.t` we can figure out all the | ||||||||||||||||||||||||||
files that would be produced in `dir` if we were building everything | ||||||||||||||||||||||||||
that could be built. While `dir` is a build directory, this also | ||||||||||||||||||||||||||
includes files present in the source tree. This is because | ||||||||||||||||||||||||||
`Load_dir.load_dir` implicitely adds copy rules for all source files | ||||||||||||||||||||||||||
present in the source directory that correspond to the build directory | ||||||||||||||||||||||||||
`dir`. For instance, if `dir` is `_build/default/src`, Dune will | ||||||||||||||||||||||||||
implicit add rules to copy files in `src` to `_build/default/src`. | ||||||||||||||||||||||||||
Except for rules that have the special `promote` or `fallback` modes. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This is in fact how Dune evaluates globs during the build. Indeed, | ||||||||||||||||||||||||||
when writing `dune` files we work in an imaginary world where both the | ||||||||||||||||||||||||||
source files and the generated files are present. So when we write | ||||||||||||||||||||||||||
`(deps (glob_files *.txt))`, this `*.txt` denotes both `.txt` files | ||||||||||||||||||||||||||
that are present on disk in the source tree but also as the ones that | ||||||||||||||||||||||||||
can be generated by the build. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
In practice, to evaluate `(glob_files *.txt)` in directory `d`, Dune | ||||||||||||||||||||||||||
calls `Load_dir.load_dir ~dir:d` and filter the list of files that can | ||||||||||||||||||||||||||
be built. Similarly, when Dune needs to build a file | ||||||||||||||||||||||||||
`_build/default/src/x`, it first calls `Load_dir.load_dir` with | ||||||||||||||||||||||||||
`_build/default/src` and then looks up a rule that has `x` has | ||||||||||||||||||||||||||
target in the returned `Loaded.t`. The `Load_dir.load_dir` is | ||||||||||||||||||||||||||
memoised, so it can be called multiple times during the build without | ||||||||||||||||||||||||||
guilt. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
While `Load_dir` is responsible for driving the production of rules, | ||||||||||||||||||||||||||
it is part of `dune_engine` which doesn't know about `dune` files and | ||||||||||||||||||||||||||
doesn't know about OCaml libraries or OCaml compilation in general. So | ||||||||||||||||||||||||||
it is not responsible for actually producing the build rules that | ||||||||||||||||||||||||||
allow to build Dune projects. Instead, `Load_dir` defers the actual | ||||||||||||||||||||||||||
production of rules to a callback that it obtains via | ||||||||||||||||||||||||||
`Build_config`. Inside Dune, this callback is implemented by the | ||||||||||||||||||||||||||
`Gen_rules` module inside the `dune_rules` library. `dune_rules` is | ||||||||||||||||||||||||||
the library that is responsible for parsing, interpreting and | ||||||||||||||||||||||||||
compiling `dune` files down to low-level build rules. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### `Dune_rules.Gen_rules` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The entry of `Dune_rules.Gen_rules` is the `gen_rules` function. It's | ||||||||||||||||||||||||||
API looks like: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
```ocaml | ||||||||||||||||||||||||||
val gen_rules : dir:Path.Build.t -> Dune_engine.Rules.t Memo.Build.t | ||||||||||||||||||||||||||
``` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Where `Dune_rules.Rules.t` is a "raw" set of rules. Raw in the sense | ||||||||||||||||||||||||||
that there is no overlap checks or any other checks. During the rule | ||||||||||||||||||||||||||
production phase, we merely accumulate a set of rules that is later | ||||||||||||||||||||||||||
processed. The API of `gen_rules` is in fact a bit more complex, but | ||||||||||||||||||||||||||
the above definition is enough for the purpose of this document. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The first thing `gen_rules` does is analyse the directory it is | ||||||||||||||||||||||||||
given. If the directory corresponds to a source directory with a `dune` | ||||||||||||||||||||||||||
file, `gen_rules` will dispatch the call to the part of `dune_rules` | ||||||||||||||||||||||||||
that parses and interprets the `dune` file. This is the simplest case, | ||||||||||||||||||||||||||
but even in this case there are some things worth mentioning. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
For instance, when compiling an OCaml library dune stores the | ||||||||||||||||||||||||||
artifacts for the library in generated dot-directories. For instance, | ||||||||||||||||||||||||||
the cmi files for library `foo` living in source directory `src` will | ||||||||||||||||||||||||||
end up in `_build/default/src/.foo.objs/byte`. We could produce these | ||||||||||||||||||||||||||
rules when `gen_rules` is called with direcotry | ||||||||||||||||||||||||||
`_build/default/src/.foo.objs/byte`, however that would spread out the | ||||||||||||||||||||||||||
logic for interpreting `library` stanzas. It is much simpler to | ||||||||||||||||||||||||||
produce all the build rules corresponding to a `library` stanza in one | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Reading about all the complexity that this leads to, I'm not convinced that it's the optimal design. Maybe splitting up rules for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried splitting the library rules this way a few times with poor results. This problem also prevents us from adding features like targets for inferred mli's. In general, it really is an annoying limitation that forces to "invert" the logic of how we want to write our rules. |
||||||||||||||||||||||||||
go. This is what is hapenning at the moment: when called with | ||||||||||||||||||||||||||
directory `_build/default/src`, `gen_rules` will not only produce | ||||||||||||||||||||||||||
rules for this directory but will also produce rules for | ||||||||||||||||||||||||||
`_build/default/src/.foo.objs/byte` and various other directories. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
`Load_rules` doesn't know anything about this. And in particular, it | ||||||||||||||||||||||||||
doesn't know that it is the `gen_rules` call for directory | ||||||||||||||||||||||||||
`_build/default/src/` that will produce the rules for the dot | ||||||||||||||||||||||||||
subdirectories. When `Load_dir` loads the rules for the | ||||||||||||||||||||||||||
`.../.foo.objs/byte` sub-directory, it simply calls `gen_rules` with | ||||||||||||||||||||||||||
this directory. It is `gen_rules` that "redirects" the call to the | ||||||||||||||||||||||||||
`_build/default/src` directory by calling | ||||||||||||||||||||||||||
`Load_rules.load_dir_and_produce_its_rules`. This function simply | ||||||||||||||||||||||||||
calls `Load_rules.load_dir` and re-emits all the raw rules that were | ||||||||||||||||||||||||||
returned by the corresponding `gen_rules` call. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This works because `Load_rules.load_dir` accepts the facts that | ||||||||||||||||||||||||||
`gen_rules` produces rules for many directory at once. It simply | ||||||||||||||||||||||||||
filters out the result. But for things to behave well, the unwritten | ||||||||||||||||||||||||||
following invariant must hold: `gen_rules ~dir:d` is allowed to | ||||||||||||||||||||||||||
generate rules for directory `d'` iff `gen_rules ~dir:d'` emits a call | ||||||||||||||||||||||||||
to `Load_rules.load_dir ~dir:d`. | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At this point of reading I thought that the current behaviour and invariants are pretty crazy in terms of complexity :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed! I've been tightening the the rule production API since last night and found a lot of odd stuff:
I've added checks and I'm almost done fixing the various places where we do these things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, just looked at the PRs -- nice clean up! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And just put the last one up: #5270 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This schenario happens in a number of cases. All these cases share a | ||||||||||||||||||||||||||
common pattern: the redirections are always to an ancestor | ||||||||||||||||||||||||||
directory. At the moment, there is one exception to this pattern in | ||||||||||||||||||||||||||
the odoc rules, however it is easy to remove. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Finally, the `copy_files` stanza creates another form of dependency | ||||||||||||||||||||||||||
between directory. In order to calculate the targets produced by | ||||||||||||||||||||||||||
`copy_files`, which needs to be known at rule production time, we need | ||||||||||||||||||||||||||
to evaluate the glob given to `copy_files`. Which requires doing a | ||||||||||||||||||||||||||
call to `Load_rules.load_dir` as previously described. Contrary to the | ||||||||||||||||||||||||||
other form of dependency we just describe, this ones can go from any | ||||||||||||||||||||||||||
directory to any other directory. For instance, the following stanza | ||||||||||||||||||||||||||
in `src/dune`: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
``` | ||||||||||||||||||||||||||
(copy_files foo/*.txt) | ||||||||||||||||||||||||||
``` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
would create a dependency from `_build/default/src` to | ||||||||||||||||||||||||||
`_build_default/src/foo`. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
So in the end, if we were looking at the internal computation graph of | ||||||||||||||||||||||||||
Dune and narrowing it to just the calls to `Load_rules.load_dir`, we | ||||||||||||||||||||||||||
would see a graph with many edges going from a directory to one of its | ||||||||||||||||||||||||||
ancestor. These would mostly be between generated dot-subdirectories | ||||||||||||||||||||||||||
and their first ancestor that has a corresponding directory in the | ||||||||||||||||||||||||||
source tree. Plus a few other arbitrary ones for each `copy_files` | ||||||||||||||||||||||||||
stanza. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
## Directory targets | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Before directory targets, answering the question "what rules produces | ||||||||||||||||||||||||||
file X?" was easy. Dune would just call `Load_rules.load_dir` and | ||||||||||||||||||||||||||
lookup `X` in the result. With directory targets, things are a bit | ||||||||||||||||||||||||||
more complicated. Indeed, `X` might also be produced by a directory | ||||||||||||||||||||||||||
target in an ancestor directory. This means that `Load_rules.load_dir` | ||||||||||||||||||||||||||
now need to look in parent directories as well, which introduce more | ||||||||||||||||||||||||||
dependencies from directories to their parents and can create cycles | ||||||||||||||||||||||||||
because of `copy_files` stanza that create dependencies in the other | ||||||||||||||||||||||||||
direction. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
At a result, some combinations of `copy_files` and directory targets | ||||||||||||||||||||||||||
don't produce the expected result. This is documented in the test | ||||||||||||||||||||||||||
suite. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
## Problem | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The above description exposes a concrete problem with directory | ||||||||||||||||||||||||||
targets, but there is also a general sense of messiness in the way | ||||||||||||||||||||||||||
things work. Generating rules for multiple directories at once is | ||||||||||||||||||||||||||
natural, but the current encoding is odd. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
## Proposal | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The proposal is to add the following rule: `gen_rules ~dir` is allowed | ||||||||||||||||||||||||||
to produced rules in `dir` or any of its descendant only. It is not | ||||||||||||||||||||||||||
allowed to produce rules anywhere else. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
`Load_rules.load_dir ~dir` will then always call itself recursively on | ||||||||||||||||||||||||||
the parent of `dir` and take the union of the rules produced by | ||||||||||||||||||||||||||
`gen_rules` for `dir` and the ones produced by the recursive | ||||||||||||||||||||||||||
call. `gen_rules` will no longer have to redirect a call via | ||||||||||||||||||||||||||
`Load_rules.load_dir_and_produce_its_rules`, which we would simply | ||||||||||||||||||||||||||
remove. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
This introduces a cycle with all `copy_files` stanza that copy files | ||||||||||||||||||||||||||
from a sub-directory. We propose the break this cycle by introducing | ||||||||||||||||||||||||||
laziness in the rule production code. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### Generating rules with a mask | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The idea is that when we produce rules, we will produce rules under | ||||||||||||||||||||||||||
a current active "mask" that tells us where we are allowed to generate | ||||||||||||||||||||||||||
files or directories. Trying to produce a rule with targets not | ||||||||||||||||||||||||||
matched by this mask will be a runtime error. | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't we already have schemes for a similar use case? Would it be possible to unify the two? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Absolutely. Scheme is more the implementation of such a thing, i.e. it is what we would use inside |
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
When entering `gen_rules ~dir`, the initial mask will be: "any files | ||||||||||||||||||||||||||
and directories that is a descendant of directory `dir`". | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
We can then narrow the mask to a sub-mask: | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
```ocaml | ||||||||||||||||||||||||||
val narrow : Target_mask.t -> unit Memo.Build.t -> unit Memo.Build.t | ||||||||||||||||||||||||||
``` | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
With `narrow mask m`, `m` would only be allowed to produce rules whose | ||||||||||||||||||||||||||
target are matched by the intersection of `mask` and the current | ||||||||||||||||||||||||||
mask. `m` wouldn't be evaluated eagerly. Instead, `gen_rules` would | ||||||||||||||||||||||||||
now return a set of direct rules as well as a list of | ||||||||||||||||||||||||||
`(Target_mask.t * unit Memo.Build.t)`. Let's call such a pair a | ||||||||||||||||||||||||||
suspension. A suspension can be forced by evaluation its second | ||||||||||||||||||||||||||
component. Doing so will yield a list of rules matched by the mask and | ||||||||||||||||||||||||||
a new list of suspension. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### Staged rules loading | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
The next step is to stage `Load_rules.load_dir`. In addition to taking | ||||||||||||||||||||||||||
a directory, `load_dir` will now also take a mask and will return the | ||||||||||||||||||||||||||
set of rules for this mask. To do that, it might need to force a bunch of | ||||||||||||||||||||||||||
suspensions recursively. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### How does that help? | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
We will put `copy_rules` under a `narrow <only file targets in current | ||||||||||||||||||||||||||
dir>`. In order to determine if a directory is part of a directory | ||||||||||||||||||||||||||
target in an ancestor directory, we wouldn't need to force this | ||||||||||||||||||||||||||
suspension. | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
### Difficulties | ||||||||||||||||||||||||||
|
||||||||||||||||||||||||||
Interpreting a `library` stanza requires knowing the set of `.ml` | ||||||||||||||||||||||||||
files in the current directrory. Knowing this requires interpreting | ||||||||||||||||||||||||||
`copy_files` in the current directory. So the interpretation of | ||||||||||||||||||||||||||
`library` stanzas will need to go under a `narrow` as well. | ||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you a lot for the write up; mask are very similar (on purpose I guess) with the summary of the
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That makes sense to me. However, do you think this should live in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would be nice for a V2 is for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this idea. But do you mind putting it in a separate file? I feel like the current doc is complicated enough and I still haven't covered some topics such as stale artifacts deletion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.