|
| 1 | +--- |
| 2 | +feature: Decouple filesystem from categorization |
| 3 | +start-date: 2023-04-23 |
| 4 | +author: Anderson Torres (@AndersonTorres) |
| 5 | +co-authors: |
| 6 | +shepherd-team: @7c6f434c @natsukium @fgaz |
| 7 | +shepherd-leader: @7c6f434c |
| 8 | +related-issues: (will contain links to implementation PRs) |
| 9 | +--- |
| 10 | + |
| 11 | +# Summary |
| 12 | +[summary]: #summary |
| 13 | + |
| 14 | +Introduce a new method of categorizing the packages maintained in Nixpkgs, |
| 15 | +one that does not rely on filesystem idiosyncrasies. |
| 16 | + |
| 17 | +# Motivation |
| 18 | +[motivation]: #motivation |
| 19 | + |
| 20 | +Currently, Nixpkgs uses the filesystem, or more accurately, the directory tree |
| 21 | +layout, to informally categorize the software it packages, as described in |
| 22 | +the [Hierarchy](https://nixos.org/manual/nixpkgs/stable/#sec-hierarchy) |
| 23 | +section of the Nixpkgs manual. |
| 24 | + |
| 25 | +This is a simple, easy-to-understand and time-honored method of |
| 26 | +categorization, partially employed by many other package managers like GNU Guix |
| 27 | +and NetBSD pkgsrc. |
| 28 | + |
| 29 | +However, this system of categorization has serious problems: |
| 30 | + |
| 31 | +1. It is bounded by the constraints imposed by the filesystem. |
| 32 | + |
| 33 | + - Restrictions on filenames, subdirectory tree depth, permissions, inodes, |
| 34 | + quotas, and many other things. |
| 35 | + - Some of these restrictions are not well documented and are discovered |
| 36 | + only by running into them. |
| 37 | + - The restrictions vary between implementations. |
| 38 | + - Some filesystems have more restrictions or fewer features than others, |
| 39 | + forcing an uncomfortable lowest common denominator. |
| 40 | + - Some operating systems impose additional constraints on otherwise |
| 41 | + full-featured filesystems for backwards compatibility (8.3 filenames, |
| 42 | + anyone?). |
| 43 | + |
| 44 | +2. It requires a local checkout of the tree. |
| 45 | + |
| 46 | + Certainly this checkout can be "cached" using some form of `find . > |
| 47 | + /tmp/pkgs-listing.txt`, or more sophisticated solutions like `locate + |
| 48 | + updatedb`. Nonetheless, such solutions still require access to a fresh, |
| 49 | + updated copy of the Nixpkgs tree. |
| 50 | + |
| 51 | +3. The creation of a new category - and more generally the manipulation of |
| 52 | + categories - requires the unpleasant task of renaming and eventually patching |
| 53 | + many seemingly unrelated files. |
| 54 | + |
| 55 | + - Moving files around the Nixpkgs codebase requires updating their forward |
| 56 | + and backward references. |
| 57 | + - This is especially true of references held by auxiliary tools such as |
| 58 | + editor plugins, testing suites, autoupdate scripts and so on. |
| 59 | + - Rewriting `all-packages.nix` can be error-prone (even using Metapad) and it |
| 60 | + can generate huge, noisy patches. |
| 61 | + |
| 62 | +4. There is no convenient way to use multivalued categorization. |
| 63 | + |
| 64 | + A piece of software can fulfill many categories; e.g. |
| 65 | + - an educational game |
| 66 | + - a console emulator (vs. a PC emulator) |
| 67 | + - and a special-purpose programming language (say, a smart-contracts one). |
| 68 | + |
| 69 | + The current one-size-fits-all restriction is artificial, imposes unreasonable |
| 70 | + limitations and results in incomplete and confusing information. |
| 71 | + |
| 72 | + - No, symlinks or hardlinks are not convenient for this purpose; not all |
| 73 | + environments support them (the lowest-common-denominator problem described |
| 74 | + above) and they convey nothing besides confusion - just think about |
| 75 | + writing the corresponding entry in `all-packages.nix`. |
| 76 | + |
| 77 | +5. It burdens the (possibly human) package writer with deciding where to put |
| 78 | + the files in the filesystem hierarchy, distracting them from the actual job |
| 79 | + of writing the package. |
| 80 | + |
| 81 | + - Or they just take the shortest path and throw it into a folder under `misc`. |
| 82 | + |
| 83 | +6. It "locks" the filesystem, preventing its usage for other, more sensible |
| 84 | + purposes. |
| 85 | + |
| 86 | +7. Most importantly, the categorization is not discoverable via Nix language |
| 87 | + infrastructure. |
| 88 | + |
| 89 | + Indeed, there is no higher-level way to query such categories besides the |
| 90 | + one described in item 2 above. |
| 91 | + |
| 92 | +In light of these problems, this RFC proposes an alternative to the current |
| 93 | +scheme: new `meta` attributes. |
| 94 | + |
| 95 | +# Detailed design |
| 96 | +[design]: #detailed-design |
| 97 | + |
| 98 | +## Code Implementation |
| 99 | +[code-implementation]: #code-implementation |
| 100 | + |
| 101 | +A new attribute, `meta.categories`, will be included for every Nix expression |
| 102 | +living inside Nixpkgs. |
| 103 | + |
| 104 | +This attribute will be a list whose elements are drawn from the |
| 105 | +`lib.categories` set. |
| 106 | + |
| 107 | +A typical snippet of `lib.categories` will be similar to: |
| 108 | + |
| 109 | +```nix |
| 110 | +{ |
| 111 | +  assembler = { |
| 112 | +    name = "Assembler"; |
| 113 | +    description = '' |
| 114 | +      A program that converts text written in assembly language to binary code. |
| 115 | +    ''; |
| 116 | +  }; |
| 117 | + |
| 118 | +  compiler = { |
| 119 | +    name = "Compiler"; |
| 120 | +    description = '' |
| 121 | +      A program that converts source code from one language to another, usually |
| 122 | +      from a higher, human-readable level to a lower, machine level. |
| 123 | +    ''; |
| 124 | +  }; |
| 125 | + |
| 126 | +  font = { |
| 127 | +    name = "Font"; |
| 128 | +    description = '' |
| 129 | +      A set of files that defines a set of graphically-related glyphs. |
| 130 | +    ''; |
| 131 | +  }; |
| 132 | + |
| 133 | +  game = { |
| 134 | +    name = "Game"; |
| 135 | +    description = '' |
| 136 | +      A program developed with entertainment in mind. |
| 137 | +    ''; |
| 138 | +  }; |
| 139 | + |
| 140 | +  interpreter = { |
| 141 | +    name = "Interpreter"; |
| 142 | +    description = '' |
| 143 | +      A program that directly executes instructions written in a programming |
| 144 | +      language, without requiring compilation into the native machine language. |
| 145 | +    ''; |
| 146 | +  }; |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +### Semantic Details |
| 151 | +[semantic-details]: #semantic-details |
| 152 | + |
| 153 | +Given that `meta.categories` is implemented as a list, it is natural to treat |
| 154 | +the first element of this list as the "most important" categorization, the |
| 155 | +one that best identifies the software being classified. |
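| | + |
| | +A minimal sketch of how tooling could read that primary category, assuming the |
| | +first-element convention above. The helper `primaryCategory`, the local |
| | +`categories` set with its descriptions, and the `bochs` stand-in are |
| | +hypothetical names used only for illustration, not existing Nixpkgs API. |
| | + |
| | +```nix |
| | +let |
| | +  # Hypothetical stand-in for a fragment of `lib.categories`; |
| | +  # the descriptions are illustrative only. |
| | +  categories = { |
| | +    emulator = { |
| | +      name = "Emulator"; |
| | +      description = "A program that imitates another system."; |
| | +    }; |
| | +    debugger = { |
| | +      name = "Debugger"; |
| | +      description = "A program used to inspect other programs."; |
| | +    }; |
| | +  }; |
| | + |
| | +  # Hypothetical stand-in for a package; only `meta.categories` matters here. |
| | +  bochs = { |
| | +    pname = "bochs"; |
| | +    meta.categories = with categories; [ emulator debugger ]; |
| | +  }; |
| | + |
| | +  # Hypothetical helper: return the first element of `meta.categories`, |
| | +  # or `null` when the attribute is missing or the list is empty. |
| | +  primaryCategory = pkg: |
| | +    let |
| | +      cats = pkg.meta.categories or [ ]; |
| | +    in |
| | +    if cats == [ ] then null else builtins.head cats; |
| | +in |
| | +(primaryCategory bochs).name |
| | +``` |
| | + |
| | +Under these assumptions the expression evaluates to `"Emulator"`, so list order |
| | +alone carries the "most important" marker and no extra attribute is needed. |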
| 156 | + |
| 157 | +## Categorization Team |
| 158 | +[categorization-team]: #categorization-team |
| 159 | + |
| 160 | +Given the typical complexities that arise from categorization, and given that |
| 161 | +regular maintainers are not expected to understand its minutiae (according to |
| 162 | +the experience of the [Debtags |
| 163 | +Team](https://wiki.debian.org/Debtags/FAQ#Why_don.27t_you_just_ask_the_maintainers_to_tag_their_own_packages.3F)), |
| 164 | +it is strongly recommended to create a team entrusted with the authority to |
| 165 | +manage issues related to categorization and carry out the corresponding duties. |
| 166 | + |
| 167 | +# Examples and Interactions |
| 168 | +[examples-and-interactions]: #examples-and-interactions |
| 169 | + |
| 170 | +In the file `bochs/default.nix`: |
| 171 | + |
| 172 | +```nix |
| 173 | +stdenv.mkDerivation { |
| 174 | + |
| 175 | +  . . . |
| 176 | + |
| 177 | +  meta = { |
| 178 | +    . . . |
| 179 | +    categories = with lib.categories; [ emulator debugger ]; |
| 180 | +    . . . |
| 181 | +  }; |
| 183 | +} |
| 184 | + |
| 185 | +``` |
| 186 | + |
| 187 | +In a `nix repl`: |
| 188 | + |
| 189 | +``` |
| 190 | +nix-repl> :l <nixpkgs> |
| 191 | +Added XXXXXX variables. |
| 192 | + |
| 193 | +nix-repl> pkgs.bochs.meta.categories |
| 194 | +[ { ... } { ... } ] |
| 195 | + |
| 196 | +nix-repl> map (z: z.name) pkgs.bochs.meta.categories |
| 197 | +[ "Emulator" "Debugger" ] |
| 198 | +``` |
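| | + |
| | +Because the categories are plain Nix values, a reverse lookup (which packages |
| | +carry a given category?) can be sketched with builtins alone. The function |
| | +`packagesIn`, the toy `pkgSet` and the packages `exampleDebugger` and |
| | +`exampleGame` are hypothetical; a real query would walk the actual package set. |
| | + |
| | +```nix |
| | +let |
| | +  # Hypothetical stand-in for a fragment of `lib.categories`. |
| | +  categories = { |
| | +    emulator = { name = "Emulator"; }; |
| | +    debugger = { name = "Debugger"; }; |
| | +    game = { name = "Game"; }; |
| | +  }; |
| | + |
| | +  # Toy package set; only `meta.categories` is modelled. |
| | +  pkgSet = { |
| | +    bochs.meta.categories = with categories; [ emulator debugger ]; |
| | +    exampleDebugger.meta.categories = with categories; [ debugger ]; |
| | +    exampleGame.meta.categories = with categories; [ game ]; |
| | +  }; |
| | + |
| | +  # Hypothetical helper: names of all packages in `set` whose |
| | +  # `meta.categories` contains `category` (missing attribute = empty list). |
| | +  packagesIn = category: set: |
| | +    builtins.filter |
| | +      (name: builtins.elem category (set.${name}.meta.categories or [ ])) |
| | +      (builtins.attrNames set); |
| | +in |
| | +packagesIn categories.debugger pkgSet |
| | +``` |
| | + |
| | +With the toy data above this evaluates to `[ "bochs" "exampleDebugger" ]`, with |
| | +no checkout walk or `find` listing involved. |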
| 199 | + |
| 200 | +# Drawbacks |
| 201 | +[drawbacks]: #drawbacks |
| 202 | + |
| 203 | +The most immediate drawbacks are: |
| 204 | + |
| 205 | +1. A huge treewide edit of Nixpkgs |
| 206 | + |
| 207 | + On the other hand, this is easily sprintable and amenable to automation. |
| 208 | + |
| 209 | +2. Bikeshedding |
| 210 | + |
| 211 | + How many and which categories should we create? Can we expand them later? |
| 212 | + |
| 213 | + To start, we can take inspiration from the many existing category sets |
| 214 | + and add extra ones as the need arises. Indeed, it is far easier to create |
| 215 | + such categories using the Nix language than it is in other software |
| 216 | + collections. |
| 217 | + |
| 218 | + Further, the creation of a categorization team can resolve such disputes. |
| 219 | + |
| 220 | +3. Superfluous |
| 221 | + |
| 222 | + It can be argued that there are other ways to discover similar or related |
| 223 | + package sets, like Repology. |
| 224 | + |
| 225 | + However, this argument is a bit circular, because e.g. the classification |
| 226 | + shown by Repology effectively replicates the classification done by the many |
| 227 | + software collections in its catalog. Therefore, relying on Repology merely |
| 228 | + transfers the question to external sources. |
| 229 | + |
| 230 | + Further, this becomes more pronounced when we take into account the fact |
| 231 | + that Nixpkgs ranks first in most Repology statistics. The expected outcome, |
| 232 | + therefore, should be precisely the opposite: Nixpkgs being _the_ source of |
| 233 | + structured metainfo for other software collections. |
| 234 | + |
| 235 | +# Alternatives |
| 236 | +[alternatives]: #alternatives |
| 237 | + |
| 238 | +1. Do nothing |
| 239 | + |
| 240 | + This will exacerbate the problems already listed. |
| 241 | + |
| 242 | +2. Ignore/nuke the categorization completely |
| 243 | + |
| 244 | + This is an alternative worthy of some consideration. After all, |
| 245 | + categorization is not without its problems, as shown above. Removing or |
| 246 | + ignoring classification removes all problems. |
| 247 | + |
| 248 | + However, there are good reasons to keep the categorization: |
| 249 | + |
| 250 | + - The complete removal of categorization is too harsh. A solution that keeps |
| 251 | + and enhances the categorization is far preferable to one that nukes it |
| 252 | + completely. |
| 253 | + |
| 254 | + - As said before, the categorization is already present; this RFC proposes to |
| 255 | + expose it to a higher level, in a structured, more discoverable format. |
| 256 | + |
| 257 | + - Categorization is very traditional among software collections. Many of them |
| 258 | + have been doing this just fine for years on end, and Nixpkgs can imitate |
| 259 | + them easily - and even surpass them, given the benefits of the Nix language |
| 260 | + machinery. |
| 261 | + |
| 262 | + - Categorization is useful in many scenarios and use cases - indeed it is |
| 263 | + ubiquitous in the software world: |
| 264 | + - specialized search engines (from Repology to MELPA) |
| 265 | + - code forges, from SourceForge to GitLab |
| 266 | + - as said above, software collections from pkgsrc to SlackBuilds |
| 267 | + - organization and preservation (as Software Heritage) |
| 268 | + |
| 269 | +3. Debtags/Appstream hybrid approach |
| 270 | + |
| 271 | +A hybrid approach for code implementation would be to implement two meta |
| 272 | +attributes, namely |
| 273 | + |
| 274 | +- `meta.categories` for Appstream-based categories |
| 275 | + - the corresponding `lib.categories` should follow Appstream closely, with |
| 276 | + little room for custom/extra categories |
| 277 | +- `meta.tags` for Debtags-like tags |
| 278 | + - while being inspired by the venerable Debtags work, the corresponding |
| 279 | + `lib.tags` is completely free to modify and even diverge from Debtags, |
| 280 | + following its own way |
| 281 | +- generally speaking, `lib.tags` should be less bureaucratic than |
| 282 | + `lib.categories` |
| 283 | + |
| 284 | +However, this approach arguably increases the complexity of the whole work and |
| 285 | +adds too much redundancy. |
| 286 | + |
| 287 | +# Prior art |
| 288 | +[prior-art]: #prior-art |
| 289 | + |
| 290 | +As said above, categorization is very traditional among software collections. It |
| 291 | +is not hard to cite examples in this arena; the most interesting ones I have |
| 292 | +found are listed below (linked in the [references section](#references)): |
| 293 | + |
| 294 | +- FreeBSD Ports; |
| 295 | +- Debtags; |
| 296 | +- Appstream Project; |
| 297 | + |
| 298 | +# Unresolved questions |
| 299 | +[unresolved]: #unresolved-questions |
| 300 | + |
| 301 | +There are remaining issues to be solved by the categorization team: |
| 302 | + |
| 303 | +- What data structure is suitable to represent a category? |
| 304 | + - For now we stick to the most natural: a set `{ name, description }`. |
| 305 | + |
| 306 | +- Should we have a set of primary, "most important" categories with mandatory |
| 307 | + status, in the sense each package should set at least one of them? |
| 308 | + - The answer is most certainly positive. |
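| | + |
| | +One way such a rule could be checked, sketched under loud assumptions: the |
| | +`{ name, description }` shape noted above (descriptions omitted for brevity), |
| | +a hypothetical `primary` list naming which categories count as primary, and a |
| | +hypothetical helper `checkCategories`. None of this is decided by the RFC; the |
| | +choice of primary categories is left to the categorization team. |
| | + |
| | +```nix |
| | +let |
| | +  # Hypothetical stand-in for a fragment of `lib.categories`. |
| | +  categories = { |
| | +    emulator = { name = "Emulator"; }; |
| | +    debugger = { name = "Debugger"; }; |
| | +    game = { name = "Game"; }; |
| | +  }; |
| | + |
| | +  # Assumption for illustration only: which categories are "primary". |
| | +  primary = with categories; [ emulator game ]; |
| | + |
| | +  # True when `c` is one of the values of the `categories` set. |
| | +  isKnown = c: builtins.elem c (builtins.attrValues categories); |
| | + |
| | +  # Hypothetical check: every listed category is known, and at least one |
| | +  # of them is primary. |
| | +  checkCategories = pkg: |
| | +    let |
| | +      cats = pkg.meta.categories or [ ]; |
| | +    in |
| | +    { |
| | +      allKnown = builtins.all isKnown cats; |
| | +      hasPrimary = builtins.any (c: builtins.elem c primary) cats; |
| | +    }; |
| | +in |
| | +{ |
| | +  good = checkCategories { |
| | +    meta.categories = with categories; [ emulator debugger ]; |
| | +  }; |
| | +  bad = checkCategories { meta.categories = [ categories.debugger ]; }; |
| | +} |
| | +``` |
| | + |
| | +Here `good` passes both checks, while `bad` yields `hasPrimary = false`; a CI |
| | +job could report such results as warnings first and as blocking errors later. |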
| 309 | + |
| 310 | +# Future work |
| 311 | +[future]: #future-work |
| 312 | + |
| 313 | +- Create the [categorization team](#categorization-team) |
| 314 | +- Carry out the duties related to categorization, including but not limited |
| 315 | + to: |
| 316 | + |
| 317 | + - Decide between possibilities of implementation; |
| 318 | + - Documentation updates; |
| 319 | + - Category curation, integration and updates; |
| 320 | + - Continuous Integration updates and adaptations; |
| 321 | + - Coordination of efforts to import, integrate and update categorization of |
| 322 | + packages; |
| 323 | + - Litigation and disputes: |
| 324 | + - Resolve them, especially in corner cases; |
| 325 | + - Enforce implementation issues |
| 326 | + - Decide when a CI check should be converted to blocking |
| 327 | + - Grace periods |
| 328 | + |
| 329 | +# References |
| 330 | +[references]: #references |
| 331 | + |
| 332 | +- [Desktop Menu |
| 333 | + Specification](https://specifications.freedesktop.org/menu-spec/latest/); |
| 334 | + specifically, |
| 335 | + - [Main |
| 336 | + categories](https://specifications.freedesktop.org/menu-spec/latest/apa.html) |
| 337 | + - [Additional |
| 338 | + categories](https://specifications.freedesktop.org/menu-spec/latest/apas02.html) |
| 339 | + - [Reserved |
| 340 | + categories](https://specifications.freedesktop.org/menu-spec/latest/apas03.html) |
| 341 | + |
| 342 | +- [Appstream](https://www.freedesktop.org/wiki/Distributions/AppStream/) |
| 343 | + |
| 344 | +- [Debtags](https://wiki.debian.org/Debtags) |
| 345 | + |
| 346 | + - [Debtags FAQ](https://wiki.debian.org/Debtags/FAQ) |
| 347 | + |
| 348 | +- [NetBSD pkgsrc guide](https://www.netbsd.org/docs/pkgsrc/) |
| 349 | + - Especially, [Chapter 12, Section |
| 350 | + 1](https://www.netbsd.org/docs/pkgsrc/components.html#components.Makefile) |
| 351 | + contains a short list of CATEGORIES. |
| 352 | + |
| 353 | +- [FreeBSD Porters |
| 354 | + Handbook](https://docs.freebsd.org/en/books/porters-handbook/) |
| 355 | + - Especially |
| 356 | + [Categories](https://docs.freebsd.org/en/books/porters-handbook/makefiles/#porting-categories) |