-
Notifications
You must be signed in to change notification settings - Fork 73
Modules support #1293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This includes |
(Note: speaking as a CMake developer here with a focus on how modules work with CMake; other build systems have similar challenges, but not necessarily the same ones)
Note that I suspect that for static analysis, a lot more communication between build systems and tooling to know what is going on. The problem is that with explicit modules even
This isn't even counting the fact that the information presented there isn't directly usable as any referenced BMI files will be in the host compiler, not something the tool necessarily understands. |
In the age we live in, a programming language is only as good as it runtimes/frameworks and lastly editor support... I mean, who codes these days with just syntax highlighting? C++ modules are - from a user perspective - a very nice addition to the C++ syntax end really modernizes it. However, it is really being held back by editor support. AFAIK: C++ modules are only supported by VisualStudio (using msvc ixx files) and CLion (using clang cppm files). If clangd would support it, it will significantly improve editor support and with that its usability/popularity. As for compilers having there own flags/options/file-extensions for modules - which is regretful - just ignore all that and start with supporting the clangs own formats. Although this is - currently - not a critical functionality, I personally am hoping to see modules supported by clangd soon! |
from https://gitlab.kitware.com/cmake/cmake/-/issues/18355 |
It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules It's somewhat light on details but this part seems relevant:
|
This is not strictly correct since it is allowed to declare a module within a
Yeah. Although it didn't mention a lot technical details, I guess we can't learn a lot from this blog. And it only mentioned about the second requirement given by @sam-mccall about the implicit buildings. I guess it may be OK to be too good as the first step. I mean, maybe is it acceptable to ignore some requirements in the first step. Including:
And what is good as the first step? I think it may be good if:
How do you guys think about this? @HighCommander4 |
Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went in to the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great it clangd could reuse and possibly improve this code when doing its scan. |
Yeah, it is better and more clear. I've edited the post. |
After taking some times to look at the code, I have some lower-level ideas: It looks possible to support C++20 modules in
In my mind, it looks like it is workable. The version is not locked and it is possible to work with GCC and MSVC. Also there are some problems in the compiler side. e.g., the size of BMI generated by clang is too big now. Maybe this may break the cache. But I feel it OK as the first step. Also There are some higher-level issues: |
My perspective as a user is that yes, this level of support for modules would already be useful and valuable. I will defer to @sam-mccall for a maintainer's perspective of what would be accepted as a contribution. (I would imagine that it's fine to have various limitations especially if the support is behind an option/flag.) A few thoughts/questions:
|
Yes, when the compilation database is loaded, it calls this function with all of the entries in the database, which will call |
I am not pretty sure that I've understood the idea of the indexer. My current imagination is to let the TUScheduler to hold the ModulesDependencyGraph. And when Now it is OK roughly for the a-b test case in llvm/llvm-project#58723. I feel like the there is only one TUScheduler in one project (is this true?) So the BMIs won't be rebuilt every time we opened a new file. But I feel I missed some points since I didn't touch the indexer at all. I initially want to work in BackgroundIndex but the log shows it is not reached after I open a new project even if I add |
I believe so.
Note, I was also thinking of persisting the BMIs between clangd invocations.
The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter |
It makes sense. I feel like it is OK
My thought is that it works as long as we rewrote the command line to refer to the new BMIs we built. |
I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source files only starts after the BMIs have been built. Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"? |
Do you mean only indexing the source files after we built all the BMIs or after we finish the first stage (since it is possible that some BMI is not buildable)? Sounds like a good idea. It can ease the implementation. |
A few random thoughts... (Full disclosure: I did some brainstorming about this a few months ago and couldn't come up with anything good, hopefully y'all are more creative & understand modules better!)
|
(note: I know nothing about clangd's internals so this comment is based on guesses based on observable behavior. Hopefully it isn't complete nonsense and if needed someone can translate into the correct terminology) I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks with a controlled amount of parallelism and trigger reindexing of downstream TUs when a header-file changes. Ideally it also notices when files change outside the editor (eg git) or when the editor is closed and is able to do incremental reindexing where needed (although I've experienced issues around this in the past, less so recently). I think part of the solution is to double-down on having an integrated build system. Either continue improving the integrated one, or switch to something like llbuild. The key thing that I don't think was required before was task dependencies. While you did have source/header file -> TU indexer job dependencies, they merely triggered rerunning the task. Now you will need some indexer tasks to have an input dependency on the BMI which will be generated by another task. And it is important to rerun the downstream indexers whenever an upstream BMI is meaningfully changed (defining "meaningfully" can be tricky; a first pass can just treat any content difference as meaningful, but I suspect that can be improved on). Additionally you cannot begin doing (much of) anything prior to scanning to determine module names and import requirements. And of course, when a file changes, it must be rescanned because it may have added an import, or changed which module/partition it declares itself to be part of. Files can also go from not generating a BMI to generating one eg if you add a I don't know how easy or hard it would be to fit the concept of task dependencies into the existing proto-buildsystem, but at least from an outsider's perspective, that seems like the best (of not only) option. Or maybe I've just spent too much time working on buildsystems, so they feel like the solution to every problem now... 😄 |
(sorry, hit the wrong button)
The "lite mode" ASTs (skip-function-bodies etc) may not be suitable for indexing, and vice versa (even opportunistically: full ASTs are much larger => slower to load). I believe we can't mix the two due to diamond dependency problems. They may also have conflicting requirements for how to deal with staleness. If these can be solved, it would be fantastic.
A very simple one for background indexing (no dependencies, as you note). However very deliberately no core functionality depends on this: the index is entirely optional and all features work (though degraded) if it's stale. If we're going to have a build system we block on (and modules ~requires it) then it needs to be something new. It's going to be responsible for the critical path to user-facing latency, as opposed to background index's "process an infinite amount of non-critical batch work". |
Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. And this patch reflects the above opinions.
Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. And this patch reflects the above opinions.
#66462) Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas and the consensus so far. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. This patch reflects the above opinions. # Testing in real-world project I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both `std` module and `asio` module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project. The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move. # What this patch does in detail - Introduced an option `--experimental-modules-support` for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled. - Introduced two classes `ModuleFilesInfo` and `ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by `ModuleFilesInfo`. - The class `ModuleFilesInfo` records the built module files for specific single source file. The module files can only be built by the static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. - The class `PreambleData` adds a new member variable with type `ModuleFilesInfo`. This refers to the needed module files for the current file. It means the module files info is part of the preamble, which is suggested in the first patch too. - In `isPreambleCompatible()`, we add a call to `ModuleFilesInfo::CanReuse()` to check if the built module files are still up to date. - When we build the AST for a source file, we will load the built module files from ModuleFilesInfo. # What we need to do next Let's split the TODOs into clang part and clangd part to make things more clear. The TODOs in the clangd part include: 1. Enable reusing module files across source files. The may require us to bring a ModulesManager like thing which need to handle `scheduling`, `the possibility of BMI version conflicts` and `various events that can invalidate the module graph`. 2. Get a more efficient method to get the `<module-name> -> <module-unit-source>` map. Currently we always scan the whole project during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include: - Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races. - Ask the build systems to provide the map just like we ask them to provide the compilation database. 3. Persist the module files. So that we can reuse module files across clangd invocations or even across clangd instances. TODOs in the clang part include: 1. Clang should offer an option/mode to skip writing/reading the bodies of the functions. Or even if we can requrie the parser to skip parsing the function bodies. And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)).
#66462) Summary: Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas and the consensus so far. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. This patch reflects the above opinions. # Testing in real-world project I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both `std` module and `asio` module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project. The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move. # What this patch does in detail - Introduced an option `--experimental-modules-support` for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled. - Introduced two classes `ModuleFilesInfo` and `ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by `ModuleFilesInfo`. - The class `ModuleFilesInfo` records the built module files for specific single source file. The module files can only be built by the static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. - The class `PreambleData` adds a new member variable with type `ModuleFilesInfo`. This refers to the needed module files for the current file. It means the module files info is part of the preamble, which is suggested in the first patch too. - In `isPreambleCompatible()`, we add a call to `ModuleFilesInfo::CanReuse()` to check if the built module files are still up to date. - When we build the AST for a source file, we will load the built module files from ModuleFilesInfo. # What we need to do next Let's split the TODOs into clang part and clangd part to make things more clear. The TODOs in the clangd part include: 1. Enable reusing module files across source files. The may require us to bring a ModulesManager like thing which need to handle `scheduling`, `the possibility of BMI version conflicts` and `various events that can invalidate the module graph`. 2. Get a more efficient method to get the `<module-name> -> <module-unit-source>` map. Currently we always scan the whole project during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include: - Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races. - Ask the build systems to provide the map just like we ask them to provide the compilation database. 3. Persist the module files. So that we can reuse module files across clangd invocations or even across clangd instances. TODOs in the clang part include: 1. Clang should offer an option/mode to skip writing/reading the bodies of the functions. Or even if we can requrie the parser to skip parsing the function bodies. And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)). Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250983
Any updates on this? This is one of the last things making me hesitate to adopt modules in my projects as I make heavy use of clangd in my workflow. |
We have experimental support now. You can try to enable it by |
I tried this yesterday and immediately ran into the issue noted in #1105 (note: no assertion, but the reported error about mismatching compiler flag). |
Are you meeting the same error? Where is that flag coming from? |
According to the linked issue, So it looks as if perhaps whatever was triggering the assert in that issue got addressed at some stage, but the linked patch for the mismatch fix (which I suppose was just saying "a mismatch in this flag need not prevent module load") never got merged? |
Oh, just noticed that this was the issue 2 years ago. I will do it. BTW, in such cases, maybe you can workaround it by adding |
Good to know, thanks! |
…ptions Motivated by comments in clangd/clangd#1293 And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness anyway.
FYI, done in llvm/llvm-project@d5cdc26 |
…ptions Motivated by comments in clangd/clangd#1293 And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness anyway.
Seems clangd now only works with the module build of clang, and does not support gcc.. Emm |
…ptions Motivated by comments in clangd/clangd#1293 And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness anyway.
Yes, this makes sense to me as clang cannot read gcc's module files. |
Somehow triggered
Found this in the log, and after that all LSP requests were responded with "invalid AST" and code -32001 |
Oh it turned out that it's because I'm too dumb and forgot to update Speaking of "-fretain-comments-from-system-headers", it seems that neither |
This might be fixed in llvm/llvm-project@d5cdc26 |
Yeah it was fixed but what makes me struggle is that the option |
It would be nice if some of {C++20 modules, clang header modules} worked in clangd.
Essentially no configurations of this work today: occasionally things may happen to work but there are many reported problems and crashes. These are not just bugs that can be fixed, a design and new infrastructure is needed.
Because clang's module functionality can be enabled by setting driver flags, it is possible/easy for people to end up in this broken and unsupportable state, and wasting (their + our) time debugging it. We should consider failing early and explicitly instead.
AFAIK nobody has plans/availability to work on this soon.
Some issues that need to be addressed:
The text was updated successfully, but these errors were encountered: