Skip to content

Commit 1a1b6c9

Browse files
kchodorowmeteorcloudy
authored andcommitted
Add design doc for recursive WORKSPACE file parsing
#1943 -- Change-Id: Ic7b4378acb4d3661a1b61ff887c9cc8592df9f7e Reviewed-on: https://bazel-review.googlesource.com/c/6131/ MOS_MIGRATED_REVID=136154990
1 parent de14ade commit 1a1b6c9

File tree

4 files changed

+199
-0
lines changed

4 files changed

+199
-0
lines changed

site/assets/ws-diamond.png

Loading

site/assets/ws-line.png

Loading

site/assets/ws-multiline.png

Loading
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,199 @@
1+
---
2+
layout: contribute
3+
title: Recursive WORKSPACE file parsing
4+
---
5+
# Recursive WORKSPACE file parsing
6+
7+
**Status**: Unimplemented
8+
9+
**Author**: [kchodorow@](mailto:[email protected])
10+
11+
## Objective
12+
13+
Users are annoyed by having to specify all deps in their WORKSPACE file. To
14+
avoid this inconvenience, Bazel could load subprojects' WORKSPACE files
15+
automatically.
16+
17+
## Non-Goals
18+
19+
* Solve the problem of people specifying two different repositories with the
20+
same name (e.g., @util).
21+
* Solve the problem of people specifying two different names for the same
22+
repository (`@guava` and `@com_google_guava`).
23+
24+
## Resolution
25+
26+
When a repository is defined in multiple files, which definition "wins"? What
27+
causes a conflict/error?
28+
29+
### Defined in the main repository's WORKSPACE file
30+
31+
This definition wins, regardless of other definitions.
32+
33+
### In a line
34+
35+
Note: as an intermediate step, this can be disabled, but the end goal is to
36+
allow this so that intermediate dependencies that the top level doesn't care
37+
about don't need to be resolved.
38+
39+
Suppose we have a main repository that depends on repo x, and x depends on repo
40+
y:
41+
42+
<img src="/assets/ws-line.jpg" class="img-responsive">
43+
44+
In this case, version 1 of "foo" wins. This way, if a library has already
45+
figured out which version works for them, its reverse dependencies do not have
46+
to think about it.
47+
48+
This will also work if a parent overrides its children's versions, even if it
49+
has multiple children.
50+
51+
### Different lines
52+
53+
If there is no obvious hierarchy and multiple versions are specified, error out.
54+
Report what each chain of dependencies was that wanted the dep and at which
55+
versions:
56+
57+
<img src="/assets/ws-multiline.jpg" class="img-responsive">
58+
59+
In this case, Bazel would error out with:
60+
61+
```
62+
ERROR: Conflicting definitions of 'foo': bazel-external/y/WORKSPACE:2 repository(name = 'foo' version = '1')
63+
requested by bazel-external/x/WORKSPACE:2 repository(name = 'y')
64+
requested by WORKSPACE:3 repository(name = 'x')
65+
vs. bazel-external/a/WORKSPACE:2 repository(name = 'foo' version = '2')
66+
requested by WORKSPACE:2 repository(name = 'a')
67+
```
68+
69+
This is also the case with diamond dependencies:
70+
71+
<img src="/assets/ws-diamond.jpg" class="img-responsive">
72+
73+
This would print:
74+
75+
```
76+
ERROR: Conflicting definitions of 'foo': bazel-external/x/WORKSPACE:2 repository(name = 'foo' version = '2')
77+
requested by WORKSPACE:2 repository(name = 'x')
78+
vs. bazel-external/z/WORKSPACE:2 repository(name = 'foo' version = '1')
79+
requested by bazel-external/y/WORKSPACE:2 repository(name = 'z')
80+
requested by WORKSPACE:3 repository(name = 'y')
81+
```
82+
83+
## Upgrade path
84+
85+
I think that this should be fairly straightforward, as any repository used by
86+
the main repository or any subrepository had to be declared in the WORKSPACE
87+
already, so it will take precedence.
88+
89+
To be extra safe, we can start with adding a `recursive = False` attribute to
90+
the `workspace` rule, which we can then flip the default of.
91+
92+
## Implementation
93+
94+
There are two options for implementing this:
95+
96+
* We always download/link every external dependency before a build can happen.
97+
E.g., if @x isn't defined in the WORKSPACE file, we have to recursively
98+
traverse all of the repositories to know which repository does define it and
99+
if there are any conflicting definitions. This is correct, but will be
100+
frustrating to users and may not even work in some cases (e.g., if an
101+
OS-X-only skylark repository rule is fetched on Linux).
102+
* Every time a new WORKSPACE file is fetched, we check its repository rules
103+
against the ones already defined and look for version conflicts. This would
104+
entirely miss certain version conflicts until certain dependencies are built,
105+
but will have better performance.
106+
107+
I think users will rebel unless we go with Option 2. However, this can have
108+
some weird effects: suppose we have the diamond dependency above, and the user's
109+
BUILD file contains:
110+
111+
```
112+
cc_library(
113+
name = "bar",
114+
deps = ["@x//:dep"], # using @foo version 2
115+
)
116+
117+
cc_library(
118+
name = "baz",
119+
deps = ["@y//:dep"], # using @foo version 1
120+
)
121+
```
122+
123+
If they build :bar and their coworker builds :baz, the two builds will work and
124+
get different versions of @foo. However, as soon as one of them tries to build
125+
both, they'll get the version mismatch error.
126+
127+
This is suboptimal, but I can't think of a way that all three of these can be
128+
satisfied:
129+
130+
* The user doesn't have to declare everything at the top level.
131+
* Bazel doesn't have to load everything.
132+
* Bazel can immediately detect any conflicts.
133+
134+
This could be enforced by a CI on presubmit, which I think is good enough.
135+
136+
Whenever Bazel creates a new repository, it will attempt to parse the WORKSPACE
137+
file and do Skyframe lookups against each repository name. If the repository
138+
name is not defined, it will be initialized to the current WORKSPACE's
139+
definition. If it already exists, the existing value will be compared.
140+
141+
For now, we'll be very picky about equality: `maven_jar` and `new_http_archive`
142+
of the same Maven artifact will count as different repositories. For both
143+
native and skylark repository rules, they will have to be equal to not conflict.
144+
145+
One issue is that is a little tricky but I think will work out: the WORKSPACE
146+
file is parsed incrementally. Suppose the main WORKSPACE loads x.bzl, which
147+
declares @y and @z. If @y depends on @foo version 1 and @z depends on @foo
148+
version 2, this will throw a Skyframe error, even if @foo is later declared in
149+
the WORKSPACE file. However, this should be okay, because if these dependencies
150+
actually need @foo, it would need to be declared before them in the WS file
151+
already.
152+
153+
## Supplementary changes
154+
155+
Not strictly required, but as part of this I'm planning to implement:
156+
157+
* A `bazel-external` convenience symlink (to the `[output_base]/external`
158+
directory) so users can easily inspect their external repositories.
159+
* Add an option to generate all WORKSPACE definitions (so generate a flat
160+
WORKSPACE file from the hierarchy).
161+
162+
## Concerns
163+
164+
Questions users might have.
165+
166+
*Where did @x come from?*
167+
168+
Bazel will create a `bazel-external/@x.version` should contain the WORKSPACE (or
169+
.bzl file) where we got @x's def and other WORKSPACE files that contain it.
170+
171+
*Which version of @x is going to be chosen?*
172+
173+
See resolution section above. Perhaps people could query for //external:x?
174+
175+
*I want to use a different version of @x.*
176+
177+
Declare @x in your WORKSPACE file, it'll override "lower" rules.
178+
179+
*When I update @x, what else will change?*
180+
181+
Because @x might declare repo @y and @y's version might change as well, we'd
182+
need a different way to query for this. We could implement deps() for repo
183+
rules or have some other mechanism for this.
184+
185+
## Thoughts on future development
186+
187+
Moving towards the user-as-conflict-resolver model (vs. the
188+
user-as-transcriber-of-deps model) means that repositories that the user may not
189+
even be aware of might be available in their workspace. I think this kind of
190+
paves the way towards a nice auto-fetch system where a user could just depend on
191+
`@com_google_guava//whatever` in their BUILD file, and Bazel could figure out
192+
how to make `@com_google_guava` available.
193+
194+
## References
195+
196+
[So you want to write a package manager](https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527#.d90oxolzk)
197+
does a good job outlining many of these challenges, but their suggested approach
198+
(use semantic versioning for dependency resolution) cannot be used by Bazel for
199+
the general case.

0 commit comments

Comments
 (0)