-
Notifications
You must be signed in to change notification settings - Fork 440
[DESIGN] rule streaming #5251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DESIGN] rule streaming #5251
Conversation
Signed-off-by: Jeremie Dimino <[email protected]>
The idea is that when we produce rules, we will producing rules under | ||
a current active "mask" that tells us where we are allowed to generate | ||
files or directories. Trying to produce a rule with targets not | ||
matched by this mask will be a runtime error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we already have schemes for a similar use case? Would it be possible to unify the two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Scheme is more the implementation of such a thing, i.e. it is what we would use inside Load_rules
. It is not the API that dune_rules
will use. So Scheme
would move to dune_engine
and we would stop using it in dune_rules
.
Interpreting a `library` stanza requires knowing the set of `.ml` | ||
files in the current directrory. Knowing this requires interpreting | ||
`copy_files` in the current directory. So the interpretation of | ||
`library` stanzas will need to go under a `narrow` as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you a lot for the write up; mask are very similar (on purpose I guess) with the summary of the scoped_dir
we discussed. Summary would given by the user in the scoped_dir stanza and would allow dune to tell when to go inside the scoped_dir
, so I propose to add a futur extension to the mask.
`library` stanzas will need to go under a `narrow` as well. | |
`library` stanzas will need to go under a `narrow` as well. | |
### Extended mask (V2) | |
Stanzas produces rules that produce targets, but they also produce public binaries, libraries and even extend aliases, and list of files to install in a package. The the computation loading generated dune files, could be suspended on the generated public binary names, library names public or not, aliases and package names using an extended mask. | |
So in the same way that the target database is populated lazily and in the right order using the mask and suspension, the public binaries database, libraries database, aliases database and package database would be populated lazily. | |
val narrow_library : Library_mask.t -> unit Memo.Build.t -> unit Memo.Build.t | |
val library_mask : ?sublib:bool -> Library_name.t -> Library_mask.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. However, do you think this should live in dune_engine
? Currently dune_engine
doesn't know anything about libraries or binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be nice for a V2 is for the Mask.t
to be extensible, and to have helpers to build lazy databases. Those would leave in the dune_engine
, and the specific masks in dune_rules
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. But do you mind putting it in a separate file? I feel like the current doc is complicated enough and I still haven't covered some topics such as stale artifacts deletion.
The production of rules definitely needs to be better defined and more strict. While doing some preliminary cleanup, I found:
|
|
||
### `Dune_rules.Gen_rules` | ||
|
||
The entry of `Dune_rules.Gen_rules` is the `gen_rules` function. It's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entry of `Dune_rules.Gen_rules` is the `gen_rules` function. It's | |
The entry point of `Dune_rules.Gen_rules` is the `gen_rules` function. It's |
filters out the result. But for things to behave well, the unwritten | ||
following invariant must hold: `gen_rules ~dir:d` is allowed to | ||
generate rules for directory `d'` iff `gen_rules ~dir:d'` emits a call | ||
to `Load_rules.load_dir ~dir:d`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point of reading I thought that the current behaviour and invariants are pretty crazy in terms of complexity :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! I've been tightening the the rule production API since last night and found a lot of odd stuff:
- in many places we just keep generating the same rules again and again
- in some places we produce rules that will never be looked at
I've added checks and I'm almost done fixing the various places where we do these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, just looked at the PRs -- nice clean up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just put the last one up: #5270
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
rules when `gen_rules` is called with direcotry | ||
`_build/default/src/.foo.objs/byte`, however that would spread out the | ||
logic for interpreting `library` stanzas. It is much simpler to | ||
produce all the build rules corresponding to a `library` stanza in one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is much simpler to produce all the build rules corresponding to a
library
stanza in one go.
Reading about all the complexity that this leads to, I'm not convinced that it's the optimal design. Maybe splitting up rules for lilbrary
is in fact going to be simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried splitting the library rules this way a few times with poor results. This problem also prevents us from adding features like targets for inferred mli's. In general, it really is an annoying limitation that forces to "invert" the logic of how we want to write our rules.
@jeremiedimino This is a very helpful document! I now understand much better what's going on and why. I'm not convinced that the proposed design is the right way to go in the long term but it does seem to solve our short-term problem. I feel like the proposed world is still too complex to survive for long without trouble, so I'd like us to find a simpler, more long-lived solution. But if we can't, then let's go with this idea. |
Co-authored-by: Andrey Mokhov <[email protected]>
Co-authored-by: Andrey Mokhov <[email protected]>
No problem, this write up was useful for me as well. I've done a series of clean up and after #5270 we will already be in a much better place. I'll update the document to reflect the new current design. |
I agree. We should keep things simple. There is no need to bake in multi-dir rule production in the system. Instead, we can do it via separate utilities. Here is much simpler proposal:
Then if we want to produce rules for a subtree somewhere, say for the obj dir of a library, we can generate all the rules at once in the directory where the library is defined, store the set of rules for the obj dir in a produced value, and load this value in the directories of the sub-tree. This way we are just composing simple features that are easy to reason about and there are no surprises. |
Although, it's still not that different from what I propposed before. Maybe we simply haven't pushed far enough the "pull-based" model. Thinking about it again, when we generate the obj dir for a library we have several sub-directories that each correspond to a different stage (bytecode, native, dep graph). So we could think of matching on the sub-directory as matching on the stage. i.e. it's like implementing the logic for building a library as: let build_lib stage =
match stage with
| Byte -> ...
| Native -> ...
| Dep_graph -> ... rather than: let build_lib =
build_byte ();
build_native ();
build_dep_graph () |
To push even further, the different stages seems like something that can be part of the mask.
|
If we go in the pull-based direction, then we would forget about masks and narrowing. To be more precise, let's formalise things a bit. Let's consider the internal computation DAG of Dune, together with an The design I proposed in this PR is what I would call "push-based". In this model, the DAG is represented as follow: type node =
| Node of (value * node list) Memo.Lazy.t then nodes are referred to via a path, and let rec eval (Node n) path =
let* v, deps = Memo.Lazy.force n in
match path with
| [] -> Memo.Build.return v
| x :: path -> eval (List.nth deps x) path Then the rule provider explicitly construct this DAG. I call it "push-based" because it feels like going forward:
In what I would call "pull-based", the rule provider doesn't explicitly construct the DAG and instead provide the "eval" function directly, which is memoised.
FTR, Memo is optimised for pull-based. |
Merged the proposal document into the repo |
I had a serious look at implementing this and. I don't have a PR ready yet, but I can report on my experience so far. The proposal has a couple of disadvantages that haven't yet been discussed:
mask (prefix "a") (fun () -> (* create a rule that generates b *)) >>>
mask (prefix "b") (fun () -> (* create a rule that generates b *)) The rules under mask
val load : dir:Path.t -> Loaded.t Memo.t It would have to become: val load : dir:Path.t -> Mask.t -> Loaded.t Memo.t This means that we'll either have to memoize for every loaded mask or drop this memoization step completely. Memoizing this function with arbitrary masks might become quite expensive as there's no limit as to how many masks would be used, and there's no good way to share computation between overlapping masks. The counter arguments to the issue above are both that you only pay these costs if you're using this feature. When you don't use this feature, you are effectively generating everything under the mask I'll also mention some other notable things regarding the implementation.
val is_empty : mask -> bool
val inter : mask -> mask -> mask This is because when we're loading under a mask, we need to keep narrowing it as we descend down the user provided callback. And once we've hit an empty mask, we aren't interested in loading rules further. The following constraints rule out a natural implementation for the mask such as
|
Rendered version here.
This PR adds a design doc for the rule streaming feature I'm thinking about. /cc @snowleopard @rgrinberg @bobot. I still need to cover stale artifact deletion and special directories such as
.ppx
,_odoc
, ... but the main idea is there.