|
| 1 | +--- |
| 2 | +layout: contribute |
| 3 | +title: Recursive WORKSPACE file parsing |
| 4 | +--- |
| 5 | +# Recursive WORKSPACE file parsing |
| 6 | + |
| 7 | +**Status**: Unimplemented |
| 8 | + |
| 9 | +**Author **: [kchodorow@ ](mailto:[email protected]) |
| 10 | + |
| 11 | +## Objective |
| 12 | + |
| 13 | +Users are annoyed by having to specify all deps in their WORKSPACE file. To |
| 14 | +avoid this inconvenience, Bazel could load subprojects' WORKSPACE files |
| 15 | +automatically. |
| 16 | + |
| 17 | +## Non-Goals |
| 18 | + |
| 19 | +* Solve the problem of people specifying two different repositories with the |
| 20 | + same name (e.g., @util). |
| 21 | +* Solve the problem of people specifying two different names for the same |
| 22 | + repository (`@guava` and `@com_google_guava`). |
| 23 | + |
| 24 | +## Resolution |
| 25 | + |
| 26 | +When a repository is defined in multiple files, which definition "wins"? What |
| 27 | +causes a conflict/error? |
| 28 | + |
| 29 | +### Defined in the main repository's WORKSPACE file |
| 30 | + |
| 31 | +This definition wins, regardless of other definitions. |
| 32 | + |
| 33 | +### In a line |
| 34 | + |
| 35 | +Note: as an intermediate step, this can be disabled, but the end goal is to |
| 36 | +allow this so that intermediate dependencies that the top level doesn't care |
| 37 | +about don't need to be resolved. |
| 38 | + |
| 39 | +Suppose we have a main repository that depends on repo x, and x depends on repo |
| 40 | +y: |
| 41 | + |
| 42 | +<img src="/assets/ws-line.jpg" class="img-responsive"> |
| 43 | + |
| 44 | +In this case, version 1 of "foo" wins. This way, if a library has already |
| 45 | +figured out which version works for them, its reverse dependencies do not have |
| 46 | +to think about it. |
| 47 | + |
| 48 | +This will also work if a parent overrides its children's versions, even if it |
| 49 | +has multiple children. |
| 50 | + |
| 51 | +### Different lines |
| 52 | + |
| 53 | +If there is no obvious hierarchy and multiple versions are specified, error out. |
| 54 | +Report what each chain of dependencies was that wanted the dep and at which |
| 55 | +versions: |
| 56 | + |
| 57 | +<img src="/assets/ws-multiline.jpg" class="img-responsive"> |
| 58 | + |
| 59 | +In this case, Bazel would error out with: |
| 60 | + |
| 61 | +``` |
| 62 | +ERROR: Conflicting definitions of 'foo': bazel-external/y/WORKSPACE:2 repository(name = 'foo' version = '1') |
| 63 | + requested by bazel-external/x/WORKSPACE:2 repository(name = 'y') |
| 64 | + requested by WORKSPACE:3 repository(name = 'x') |
| 65 | +vs. bazel-external/a/WORKSPACE:2 repository(name = 'foo' version = '2') |
| 66 | + requested by WORKSPACE:2 repository(name = 'a') |
| 67 | +``` |
| 68 | + |
| 69 | +This is also the case with diamond dependencies: |
| 70 | + |
| 71 | +<img src="/assets/ws-diamond.jpg" class="img-responsive"> |
| 72 | + |
| 73 | +This would print: |
| 74 | + |
| 75 | +``` |
| 76 | +ERROR: Conflicting definitions of 'foo': bazel-external/x/WORKSPACE:2 repository(name = 'foo' version = '2') |
| 77 | + requested by WORKSPACE:2 repository(name = 'x') |
| 78 | +vs. bazel-external/z/WORKSPACE:2 repository(name = 'foo' version = '1') |
| 79 | + requested by bazel-external/y/WORKSPACE:2 repository(name = 'z') |
| 80 | + requested by WORKSPACE:3 repository(name = 'y') |
| 81 | +``` |
| 82 | + |
| 83 | +## Upgrade path |
| 84 | + |
| 85 | +I think that this should be fairly straightforward, as any repository used by |
| 86 | +the main repository or any subrepository had to be declared in the WORKSPACE |
| 87 | +already, so it will take precedence. |
| 88 | + |
| 89 | +To be extra safe, we can start with adding a `recursive = False` attribute to |
| 90 | +the `workspace` rule, which we can then flip the default of. |
| 91 | + |
| 92 | +## Implementation |
| 93 | + |
| 94 | +There are two options for implementing this: |
| 95 | + |
| 96 | +* We always download/link every external dependency before a build can happen. |
| 97 | + E.g., if @x isn't defined in the WORKSPACE file, we have to recursively |
| 98 | + traverse all of the repositories to know which repository does define it and |
| 99 | + if there are any conflicting definitions. This is correct, but will be |
| 100 | + frustrating to users and may not even work in some cases (e.g., if an |
| 101 | + OS-X-only skylark repository rule is fetched on Linux). |
| 102 | +* Every time a new WORKSPACE file is fetched, we check its repository rules |
| 103 | + against the ones already defined and look for version conflicts. This would |
| 104 | + entirely miss certain version conflicts until certain dependencies are built, |
| 105 | + but will have better performance. |
| 106 | + |
| 107 | +I think users will rebel unless we go with Option 2. However, this can have |
| 108 | +some weird effects: suppose we have the diamond dependency above, and the user's |
| 109 | +BUILD file contains: |
| 110 | + |
| 111 | +``` |
| 112 | +cc_library( |
| 113 | + name = "bar", |
| 114 | + deps = ["@x//:dep"], # using @foo version 2 |
| 115 | +) |
| 116 | +
|
| 117 | +cc_library( |
| 118 | + name = "baz", |
| 119 | + deps = ["@y//:dep"], # using @foo version 1 |
| 120 | +) |
| 121 | +``` |
| 122 | + |
| 123 | +If they build :bar and their coworker builds :baz, the two builds will work and |
| 124 | +get different versions of @foo. However, as soon as one of them tries to build |
| 125 | +both, they'll get the version mismatch error. |
| 126 | + |
| 127 | +This is suboptimal, but I can't think of a way that all three of these can be |
| 128 | +satisfied: |
| 129 | + |
| 130 | +* The user doesn't have to declare everything at the top level. |
| 131 | +* Bazel doesn't have to load everything. |
| 132 | +* Bazel can immediately detect any conflicts. |
| 133 | + |
| 134 | +This could be enforced by a CI on presubmit, which I think is good enough. |
| 135 | + |
| 136 | +Whenever Bazel creates a new repository, it will attempt to parse the WORKSPACE |
| 137 | +file and do Skyframe lookups against each repository name. If the repository |
| 138 | +name is not defined, it will be initialized to the current WORKSPACE's |
| 139 | +definition. If it already exists, the existing value will be compared. |
| 140 | + |
| 141 | +For now, we'll be very picky about equality: `maven_jar` and `new_http_archive` |
| 142 | +of the same Maven artifact will count as different repositories. For both |
| 143 | +native and skylark repository rules, they will have to be equal to not conflict. |
| 144 | + |
| 145 | +One issue is that is a little tricky but I think will work out: the WORKSPACE |
| 146 | +file is parsed incrementally. Suppose the main WORKSPACE loads x.bzl, which |
| 147 | +declares @y and @z. If @y depends on @foo version 1 and @z depends on @foo |
| 148 | +version 2, this will throw a Skyframe error, even if @foo is later declared in |
| 149 | +the WORKSPACE file. However, this should be okay, because if these dependencies |
| 150 | +actually need @foo, it would need to be declared before them in the WS file |
| 151 | +already. |
| 152 | + |
| 153 | +## Supplementary changes |
| 154 | + |
| 155 | +Not strictly required, but as part of this I'm planning to implement: |
| 156 | + |
| 157 | +* A `bazel-external` convenience symlink (to the `[output_base]/external` |
| 158 | + directory) so users can easily inspect their external repositories. |
| 159 | +* Add an option to generate all WORKSPACE definitions (so generate a flat |
| 160 | + WORKSPACE file from the hierarchy). |
| 161 | + |
| 162 | +## Concerns |
| 163 | + |
| 164 | +Questions users might have. |
| 165 | + |
| 166 | +*Where did @x come from?* |
| 167 | + |
| 168 | +Bazel will create a `bazel-external/@x.version` should contain the WORKSPACE (or |
| 169 | +.bzl file) where we got @x's def and other WORKSPACE files that contain it. |
| 170 | + |
| 171 | +*Which version of @x is going to be chosen?* |
| 172 | + |
| 173 | +See resolution section above. Perhaps people could query for //external:x? |
| 174 | + |
| 175 | +*I want to use a different version of @x.* |
| 176 | + |
| 177 | +Declare @x in your WORKSPACE file, it'll override "lower" rules. |
| 178 | + |
| 179 | +*When I update @x, what else will change?* |
| 180 | + |
| 181 | +Because @x might declare repo @y and @y's version might change as well, we'd |
| 182 | +need a different way to query for this. We could implement deps() for repo |
| 183 | +rules or have some other mechanism for this. |
| 184 | + |
| 185 | +## Thoughts on future development |
| 186 | + |
| 187 | +Moving towards the user-as-conflict-resolver model (vs. the |
| 188 | +user-as-transcriber-of-deps model) means that repositories that the user may not |
| 189 | +even be aware of might be available in their workspace. I think this kind of |
| 190 | +paves the way towards a nice auto-fetch system where a user could just depend on |
| 191 | +`@com_google_guava//whatever` in their BUILD file, and Bazel could figure out |
| 192 | +how to make `@com_google_guava` available. |
| 193 | + |
| 194 | +## References |
| 195 | + |
| 196 | +[So you want to write a package manager](https://medium.com/@sdboyer/so-you-want-to-write-a-package-manager-4ae9c17d9527#.d90oxolzk) |
| 197 | +does a good job outlining many of these challenges, but their suggested approach |
| 198 | +(use semantic versioning for dependency resolution) cannot be used by Bazel for |
| 199 | +the general case. |
0 commit comments