Skip to content

Modules support #1293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sam-mccall opened this issue Sep 15, 2022 · 42 comments
Open

Modules support #1293

sam-mccall opened this issue Sep 15, 2022 · 42 comments
Labels
enhancement New feature or request

Comments

@sam-mccall
Copy link
Member

sam-mccall commented Sep 15, 2022

It would be nice if some of {C++20 modules, clang header modules} worked in clangd.

Essentially no configurations of this work today: occasionally things may happen to work but there are many reported problems and crashes. These are not just bugs that can be fixed, a design and new infrastructure is needed.

Because clang's module functionality can be enabled by setting driver flags, it is possible/easy for people to end up in this broken and unsupportable state, and wasting (their + our) time debugging it. We should consider failing early and explicitly instead.

AFAIK nobody has plans/availability to work on this soon.

Some issues that need to be addressed:

  • which subset of clang's many module features/flags are in scope?
  • explicit modules provided by build system: we can't assume these are compatible (may be a different compiler version). So may need to convert to implicitly building modules needed.
  • having AST workers/background threads spawning module builds independently and coordinating through the module cache is probably not good, we'll want some kind of worker pool and coordination
  • we presumably want to parse modules in the same lite mode as preambles (e.g. skip-function-bodies)
  • gcc modules flags are different than clang modules flags (and MSVC presumably different again), we need to parse them
  • how do modules fit into the dynamic index? what does a preamble with modules in it look like?
  • scattered places where we assume mainfile/preamble dichotomy, reason about #include structure, etc
  • actual handling of import/export AST nodes
@HighCommander4
Copy link
Contributor

Some other open bugs we have on file related to modules are #738, #797, #1105.

@Trass3r
Copy link

Trass3r commented Nov 19, 2022

This includes -fmodules? What is the current state?
With an llvm LLVM_ENABLE_LIBCXX=ON LLVM_ENABLE_MODULES=ON workspace clangd doesn't crash or anything but produces lots of squiggles while the build itself runs through just fine.
Shouldn't clangd rather ignore those arguments if it isn't able to handle them properly? (-fmodules -fmodules-cache-path=... -Xclang -fmodules-local-submodule-visibility -gmodules)

@mathstuf
Copy link

mathstuf commented Dec 1, 2022

(Note: speaking as a CMake developer here with a focus on how modules work with CMake; other build systems have similar challenges, but not necessarily the same ones)

gcc modules flags are different than clang modules flags (and MSVC presumably different again), we need to parse them

Note that gcc's module flags may indicate an active participant if -fmodule-mapper= points to a socket or process execution. That may not be emulatable (I believe build2 does this, but CMake uses static files written out during the build).

I suspect that for static analysis, a lot more communication between build systems and tooling to know what is going on. The problem is that with explicit modules even compile_commands.json is not useful until a build happens because:

  • for gcc, the -fmodule-mapper= argument is generated at build
  • for clang and msvc, modules are communicated via @rspfile arguments that don't exist until the build is performed

This isn't even counting the fact that the information presented there isn't directly usable as any referenced BMI files will be in the host compiler, not something the tool necessarily understands.

@MK-Alias
Copy link

MK-Alias commented Dec 2, 2022

In the age we live in, a programming language is only as good as it runtimes/frameworks and lastly editor support... I mean, who codes these days with just syntax highlighting? C++ modules are - from a user perspective - a very nice addition to the C++ syntax end really modernizes it. However, it is really being held back by editor support.

AFAIK: C++ modules are only supported by VisualStudio (using msvc ixx files) and CLion (using clang cppm files). If clangd would support it, it will significantly improve editor support and with that its usability/popularity.

As for compilers having there own flags/options/file-extensions for modules - which is regretful - just ignore all that and start with supporting the clangs own formats.

Although this is - currently - not a critical functionality, I personally am hoping to see modules supported by clangd soon!

@sdykae
Copy link

sdykae commented Dec 29, 2022

from https://gitlab.kitware.com/cmake/cmake/-/issues/18355
image
https://github.com/mathstuf/llvm-project/tree/p1689r5-csd
Waiting for stable llvm pr, and support to clangd to start development with vscode extension.

@HighCommander4
Copy link
Contributor

It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules

It's somewhat light on details but this part seems relevant:

CLion collects module information from all .ixx, .cppm, and .mxx files in the project and doesn’t rely on the specific approach you use to build modules. Whether it’s CMake with the Visual Studio C++ toolchain or CMake with Clang and specific compilation flags, CLion provides a similar experience

@ChuanqiXu9
Copy link

ChuanqiXu9 commented May 31, 2023

CLion collects module information from all .ixx, .cppm, and .mxx files in the project and doesn’t rely on the specific approach you use to build modules.

This is not strictly correct since it is allowed to declare a module within a .cpp files. But it should be acceptable as the first step.

It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules

Yeah. Although it didn't mention a lot technical details, I guess we can't learn a lot from this blog. And it only mentioned about the second requirement given by @sam-mccall about the implicit buildings.


I guess it may be OK to be too good as the first step. I mean, maybe is it acceptable to ignore some requirements in the first step. Including:

  • Compatibility about GCC and MSVC.
  • Indexing things.
  • lite mode things (e.g. skip-function-bodies)
  • #include structures

And what is good as the first step? I think it may be good if:

  • We can collect the potential module files by clang-scan-deps in the project tree.
  • We can build these module files implicitly in order.
  • We can get the AST we just built.

How do you guys think about this? @HighCommander4

@RedBeard0531
Copy link

We can collect the potential module files (by its suffixes) in the project tree.

Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went in to the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great it clangd could reuse and possibly improve this code when doing its scan.

@ChuanqiXu9
Copy link

We can collect the potential module files (by its suffixes) in the project tree.

Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went in to the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great it clangd could reuse and possibly improve this code when doing its scan.

Yeah, it is better and more clear. I've edited the post.

@ChuanqiXu9
Copy link

After taking some times to look at the code, I have some lower-level ideas: It looks possible to support C++20 modules in BackgroundIndex.

  1. We need to introduce a new class to describe to dependencies information between named modules. And it looks OK to make the class only visible to BackgroundIndex . We can use clang-scan-deps to initialize new class.
  2. Then in BackgroundIndex::changedFilesTask, if the file is a module interface unit, we will try to generate the module interface into a special cache maintained by clangd first before we index the file actually. It requires clangd to rewrite the command line to emit the BMI to other places. This is also helpful if we want to emit different BMI someday.
  3. When we index source files, we need to rewrite the command line to lookup the BMI we generated at first. This is necessary to not breaking our normal build.

In my mind, it looks like it is workable. The version is not locked and it is possible to work with GCC and MSVC. Also there are some problems in the compiler side. e.g., the size of BMI generated by clang is too big now. Maybe this may break the cache. But I feel it OK as the first step.

Also There are some higher-level issues:
(1) When the clangd server initializes, will BackgroundIndex::changedFilesTask get called? From the name it looks like it won't be called. But I can't find other APIs.
(2) The idea above requires us always to scan all sources at first, which sounds a little bit scary since it only wastes time for non-module projects. The only method I got now is to add an option. This option is disabled by default now and it can be enabled by default someday if modules become popular.

@HighCommander4
Copy link
Contributor

And what is good as the first step? I think it may be good if:

  • We can collect the potential module files by clang-scan-deps in the project tree.

  • We can build these module files implicitly in order.

  • We can get the AST we just built.

How do you guys think about this? @HighCommander4

My perspective as a user is that yes, this level of support for modules would already be useful and valuable. I will defer to @sam-mccall for a maintainer's perspective of what would be accepted as a contribution. (I would imagine that it's fine to have various limitations especially if the support is behind an option/flag.)

A few thoughts/questions:

  • To make the performance acceptable, we'll want to persist BMIs rather than rebuilding them each time we encounter a module. (I think all previous comments assume this, I'm just making it explicit.)
  • To avoid stepping on the project's actual build (which may use a different compiler or a different version of clang), we should persist our BMIs in a clangd-specific place (such as .cache/clangd), like you said in the previous comment.
  • Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files? When you open a source file that imports a module M, the BMI for M may or may not have already been built by the indexer. If it hasn't, I imagine the AST (preamble) build should build it on the spot (and then the indexer won't need to build it again when it gets around to indexing a source file that imports M), but if the indexer has already built M's BMI then I imagine the preamble build can just read it.

@HighCommander4
Copy link
Contributor

(1) When the clangd server initializes, will BackgroundIndex::changedFilesTask get called?

Yes, when the compilation database is loaded, it calls this function with all of the entries in the database, which will call changedFilesTask().

@ChuanqiXu9
Copy link

Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files?

I am not pretty sure that I've understood the idea of the indexer. My current imagination is to let the TUScheduler to hold the ModulesDependencyGraph. And when TUScheduler::update is called, check if the file has any modules dependencies, if it has such dependencies and the BMI doesn't exist, don't call ASTWorker::update now and assign a task to AsyncRunner to build the BMI and call the ASTWorker::update then.

Now it is OK roughly for the a-b test case in llvm/llvm-project#58723.

I feel like the there is only one TUScheduler in one project (is this true?) So the BMIs won't be rebuilt every time we opened a new file.

But I feel I missed some points since I didn't touch the indexer at all. I initially want to work in BackgroundIndex but the log shows it is not reached after I open a new project even if I add -background-index. Or is it OK to not make the indexer know about the modules?

@HighCommander4
Copy link
Contributor

HighCommander4 commented Jun 2, 2023

I feel like the there is only one TUScheduler in one project (is this true?)

I believe so.

So the BMIs won't be rebuilt every time we opened a new file.

Note, I was also thinking of persisting the BMIs between clangd invocations.

is it OK to not make the indexer know about the modules?

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

@ChuanqiXu9
Copy link

Note, I was also thinking of persisting the BMIs between clangd invocations.

It makes sense. I feel like it is OK

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

My thought is that it works as long as we rewrote the command line to refer to the new BMIs we built.

@HighCommander4
Copy link
Contributor

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

My thought is that it works as long as we rewrote the command line to refer to the new BMIs we built.

I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source files only starts after the BMIs have been built.

Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"?

@ChuanqiXu9
Copy link

I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source > files only starts after the BMIs have been built.

Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"?

Do you mean only indexing the source files after we built all the BMIs or after we finish the first stage (since it is possible that some BMI is not buildable)? Sounds like a good idea. It can ease the implementation.

@sam-mccall
Copy link
Member Author

A few random thoughts...

(Full disclosure: I did some brainstorming about this a few months ago and couldn't come up with anything good, hopefully y'all are more creative & understand modules better!)

  • module building looks kinda like indexing, but i think focusing on background indexing is a red herring, and preamble building/indexing is where we should begin. We need to be able to open a file and start using it without any significant fraction of the project having been indexed. This is the single most critical latency path and to hit it today (building preamble) we take shortcuts which are unacceptable for background indexing (e.g. skip-function-bodies). The
  • the project may be so large that we cannot enumerate the modules, let alone touch every file. If we want to rely on clang-scan-deps for small projects, that makes sense, but we should abstract it somehow with a query interface that scales (like CompilationDatabase)
  • the big performance win with modules is the ability to reuse modules between invocations e.g. share the overlapping parts of preambles rather than starting from scratch. However editing headers will still force us to rebuild all dependent modules/preambles as modules are tightly coupled to their dependencies. (If you built a.pcm#1 against b.pcm#5 you can't reuse it with b.pcm#6).
  • we'll need some new abstractions, which are always hard to get right. TUScheduler is basically "full" in terms of complexity, we can't cram many more responsibilities in there without losing ability to reason about it. Obviously it will have to change, but the bulk of complexity should live somewhere else.
  • "modules" can be enabled in (too) many ways, and ultimately the design needs to accommodate as many as possible (there's no room for several implementations!). But we should be clear where we're starting (e.g. only fully-explicit command lines with -fmodule-file=foo=bar using c++20 syntax?) and roughly how other variants will fit in (can we translate implicit to explicit using clang-scan-deps? Do module maps make any difference?)

@RedBeard0531
Copy link

RedBeard0531 commented Jun 2, 2023

(note: I know nothing about clangd's internals so this comment is based on guesses based on observable behavior. Hopefully it isn't complete nonsense and if needed someone can translate into the correct terminology)

I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks with a controlled amount of parallelism and trigger reindexing of downstream TUs when a header-file changes. Ideally it also notices when files change outside the editor (eg git) or when the editor is closed and is able to do incremental reindexing where needed (although I've experienced issues around this in the past, less so recently).

I think part of the solution is to double-down on having an integrated build system. Either continue improving the integrated one, or switch to something like llbuild. The key thing that I don't think was required before was task dependencies. While you did have source/header file -> TU indexer job dependencies, they merely triggered rerunning the task. Now you will need some indexer tasks to have an input dependency on the BMI which will be generated by another task. And it is important to rerun the downstream indexers whenever an upstream BMI is meaningfully changed (defining "meaningfully" can be tricky; a first pass can just treat any content difference as meaningful, but I suspect that can be improved on).

Additionally you cannot begin doing (much of) anything prior to scanning to determine module names and import requirements. And of course, when a file changes, it must be rescanned because it may have added an import, or changed which module/partition it declares itself to be part of. Files can also go from not generating a BMI to generating one eg if you add a export module foo; line or change module blah; to module blah:blah;

I don't know how easy or hard it would be to fit the concept of task dependencies into the existing proto-buildsystem, but at least from an outsider's perspective, that seems like the best (of not only) option. Or maybe I've just spent too much time working on buildsystems, so they feel like the solution to every problem now... 😄

@sam-mccall sam-mccall reopened this Jun 2, 2023
@sam-mccall
Copy link
Member Author

(sorry, hit the wrong button)

Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files?

The "lite mode" ASTs (skip-function-bodies etc) may not be suitable for indexing, and vice versa (even opportunistically: full ASTs are much larger => slower to load). I believe we can't mix the two due to diamond dependency problems. They may also have conflicting requirements for how to deal with staleness. If these can be solved, it would be fantastic.

I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks

A very simple one for background indexing (no dependencies, as you note). However very deliberately no core functionality depends on this: the index is entirely optional and all features work (though degraded) if it's stale.

If we're going to have a build system we block on (and modules ~requires it) then it needs to be something new. It's going to be responsible for the critical path to user-facing latency, as opposed to background index's "process an infinite amount of non-critical batch work".

DanShaders pushed a commit to DanShaders/llvm-project that referenced this issue Apr 13, 2024
Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas. We want to have some initial support in
clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

And this patch reflects the above opinions.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jul 9, 2024
Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas. We want to have some initial support in
clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

And this patch reflects the above opinions.
ChuanqiXu9 added a commit to llvm/llvm-project that referenced this issue Jul 18, 2024
#66462)

Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have
some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
 to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
    > try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
    > build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library:
https://github.com/alibaba/async_simple/tree/CXX20Modules. This library
has 3 modules (async_simple, std and asio) and 65 module units. (Note
that a module consists of multiple module units). Both `std` module and
`asio` module have 100k+ lines of code (maybe more, I didn't count). And
async_simple itself has 8k lines of code. This is the scale of the
project.

The result shows that it works pretty well, ..., well, except I need to
wait roughly 10s after opening/editing any file. And this falls in our
expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support
for C++20 Modules. So that no matter how bad this is, it wouldn't affect
current users. Following off the page, we'll assume the option is
enabled.
- Introduced two classes `ModuleFilesInfo` and
`ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by
`ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for
specific single source file. The module files can only be built by the
static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef
File, ...)`.
- The class `PreambleData` adds a new member variable with type
`ModuleFilesInfo`. This refers to the needed module files for the
current file. It means the module files info is part of the preamble,
which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to
`ModuleFilesInfo::CanReuse()` to check if the built module files are
still up to date.
- When we build the AST for a source file, we will load the built module
files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into clang part and clangd part to make things
more clear.

The TODOs in the clangd part include:
1. Enable reusing module files across source files. The may require us
to bring a ModulesManager like thing which need to handle `scheduling`,
`the possibility of BMI version conflicts` and `various events that can
invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> ->
<module-unit-source>` map. Currently we always scan the whole project
during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
This is clearly inefficient even if the scanning process is pretty fast.
I think the potential solutions include:
- Make a global scanner to monitor the state of every source file like I
did in the first patch. The pain point is that we need to take care of
the data races.
- Ask the build systems to provide the map just like we ask them to
provide the compilation database.
3. Persist the module files. So that we can reuse module files across
clangd invocations or even across clangd instances.

TODOs in the clang part include:
1. Clang should offer an option/mode to skip writing/reading the bodies
of the functions. Or even if we can requrie the parser to skip parsing
the function bodies.

And it looks like we can say the support for C++20 Modules is initially
workable after we made (1) and (2) (or even without (2)).
yuxuanchen1997 pushed a commit to llvm/llvm-project that referenced this issue Jul 25, 2024
#66462)

Summary:
Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas and the consensus so far. We want to have
some initial support in clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
 to review and understand so that every one are in the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
    > try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
    > build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library:
https://github.com/alibaba/async_simple/tree/CXX20Modules. This library
has 3 modules (async_simple, std and asio) and 65 module units. (Note
that a module consists of multiple module units). Both `std` module and
`asio` module have 100k+ lines of code (maybe more, I didn't count). And
async_simple itself has 8k lines of code. This is the scale of the
project.

The result shows that it works pretty well, ..., well, except I need to
wait roughly 10s after opening/editing any file. And this falls in our
expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support
for C++20 Modules. So that no matter how bad this is, it wouldn't affect
current users. Following off the page, we'll assume the option is
enabled.
- Introduced two classes `ModuleFilesInfo` and
`ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by
`ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for
specific single source file. The module files can only be built by the
static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef
File, ...)`.
- The class `PreambleData` adds a new member variable with type
`ModuleFilesInfo`. This refers to the needed module files for the
current file. It means the module files info is part of the preamble,
which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to
`ModuleFilesInfo::CanReuse()` to check if the built module files are
still up to date.
- When we build the AST for a source file, we will load the built module
files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into clang part and clangd part to make things
more clear.

The TODOs in the clangd part include:
1. Enable reusing module files across source files. The may require us
to bring a ModulesManager like thing which need to handle `scheduling`,
`the possibility of BMI version conflicts` and `various events that can
invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> ->
<module-unit-source>` map. Currently we always scan the whole project
during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
This is clearly inefficient even if the scanning process is pretty fast.
I think the potential solutions include:
- Make a global scanner to monitor the state of every source file like I
did in the first patch. The pain point is that we need to take care of
the data races.
- Ask the build systems to provide the map just like we ask them to
provide the compilation database.
3. Persist the module files. So that we can reuse module files across
clangd invocations or even across clangd instances.

TODOs in the clang part include:
1. Clang should offer an option/mode to skip writing/reading the bodies
of the functions. Or even if we can requrie the parser to skip parsing
the function bodies.

And it looks like we can say the support for C++20 Modules is initially
workable after we made (1) and (2) (or even without (2)).

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60250983
@shadowndacorner
Copy link

Any updates on this? This is one of the last things making me hesitate to adopt modules in my projects as I make heavy use of clangd in my workflow.

@ChuanqiXu9
Copy link

Any updates on this? This is one of the last things making me hesitate to adopt modules in my projects as I make heavy use of clangd in my workflow.

We have experimental support now. You can try to enable it by experimental-modules-support and provide the compilation database.

@kamrann
Copy link

kamrann commented Oct 31, 2024

I tried this yesterday and immediately ran into the issue noted in #1105 (note: no assertion, but the reported error about mismatching compiler flag).

@ChuanqiXu9
Copy link

I tried this yesterday and immediately ran into the issue noted in #1105 (note: no assertion, but the reported error about mismatching compiler flag).

Are you meeting the same error? Where is that flag coming from?

@kamrann
Copy link

kamrann commented Oct 31, 2024

According to the linked issue, clangd is/was adding that flag automatically. It's not specified in my build, nor in the compile_commands.json.

So it looks as if perhaps whatever was triggering the assert in that issue got addressed at some stage, but the linked patch for the mismatch fix (which I suppose was just saying "a mismatch in this flag need not prevent module load") never got merged?

@ChuanqiXu9
Copy link

ChuanqiXu9 commented Oct 31, 2024

According to the linked issue, clangd is/was adding that flag automatically. It's not specified in my build, nor in the compile_commands.json.

So it looks as if perhaps whatever was triggering the assert in that issue got addressed at some stage, but the linked patch for the mismatch fix (which I suppose was just saying "a mismatch in this flag need not prevent module load") never got merged?

Oh, just noticed that this was the issue 2 years ago. I will do it.

BTW, in such cases, maybe you can workaround it by adding -Xclang -fno-validate-pch

@kamrann
Copy link

kamrann commented Oct 31, 2024

BTW, in such cases, maybe you can workaround it by adding -Xclang fno-validate-pch

Good to know, thanks!

ChuanqiXu9 added a commit to llvm/llvm-project that referenced this issue Oct 31, 2024
…ptions

Motivated by comments in clangd/clangd#1293

And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness
anyway.
@ChuanqiXu9
Copy link

According to the linked issue, clangd is/was adding that flag automatically. It's not specified in my build, nor in the compile_commands.json.

So it looks as if perhaps whatever was triggering the assert in that issue got addressed at some stage, but the linked patch for the mismatch fix (which I suppose was just saying "a mismatch in this flag need not prevent module load") never got merged?

FYI, done in llvm/llvm-project@d5cdc26

smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this issue Nov 3, 2024
…ptions

Motivated by comments in clangd/clangd#1293

And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness
anyway.
@Decodetalkers
Copy link

Seems clangd now only works with the module build of clang, and does not support gcc.. Emm

NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this issue Nov 4, 2024
…ptions

Motivated by comments in clangd/clangd#1293

And RetainCommentsFromSystemHeaders shouldn't affect the compatibleness
anyway.
@mathstuf
Copy link

mathstuf commented Nov 4, 2024

Seems clangd now only works with the module build of clang, and does not support gcc.. Emm

Yes, this makes sense to me as clang cannot read gcc's module files. clangd would need to make its own BMIs rather than relying on what the build would already make.

@onion108
Copy link

Somehow triggered invalid AST when using modules and clangd just stopped working (no completion & no diagnostics & no go-to etc.)

I[21:10:51.149] Indexing c++23 standard library in the context of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc
E[21:10:51.156] Failed to scan baseline of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc: failed BeginSourceFile
I[21:10:51.162] BeginSourceFile() failed when building AST for /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc

Found this in the log, and after that all LSP requests were responded with "invalid AST" and code -32001

@onion108
Copy link

Somehow triggered invalid AST when using modules and clangd just stopped working (no completion & no diagnostics & no go-to etc.)

I[21:10:51.149] Indexing c++23 standard library in the context of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc
E[21:10:51.156] Failed to scan baseline of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc: failed BeginSourceFile
I[21:10:51.162] BeginSourceFile() failed when building AST for /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc

Found this in the log, and after that all LSP requests were responded with "invalid AST" and code -32001

Oh it turned out that it's because I'm too dumb and forgot to update compile_commands.json when adding -Xclang -fretain-comments-from-system-headers.

Speaking of "-fretain-comments-from-system-headers", it seems that neither clang --help nor clang -cc1 --help tells me about this option. I waste a lot of time finding what the option is called because of it.

@ChuanqiXu9
Copy link

Somehow triggered invalid AST when using modules and clangd just stopped working (no completion & no diagnostics & no go-to etc.)

I[21:10:51.149] Indexing c++23 standard library in the context of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc
E[21:10:51.156] Failed to scan baseline of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc: failed BeginSourceFile
I[21:10:51.162] BeginSourceFile() failed when building AST for /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc

Found this in the log, and after that all LSP requests were responded with "invalid AST" and code -32001

Oh it turned out that it's because I'm too dumb and forgot to update compile_commands.json when adding -Xclang -fretain-comments-from-system-headers.

Speaking of "-fretain-comments-from-system-headers", it seems that neither clang --help nor clang -cc1 --help tells me about this option. I waste a lot of time finding what the option is called because of it.

This might be fixed in llvm/llvm-project@d5cdc26

@onion108
Copy link

Somehow triggered invalid AST when using modules and clangd just stopped working (no completion & no diagnostics & no go-to etc.)

I[21:10:51.149] Indexing c++23 standard library in the context of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc
E[21:10:51.156] Failed to scan baseline of /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc: failed BeginSourceFile
I[21:10:51.162] BeginSourceFile() failed when building AST for /Users/onion27/Workspace/Programming/Lab/mommy-i-wanna-use-cpp-modules/main.cc

Found this in the log, and after that all LSP requests were responded with "invalid AST" and code -32001

Oh it turned out that it's because I'm too dumb and forgot to update compile_commands.json when adding -Xclang -fretain-comments-from-system-headers.
Speaking of "-fretain-comments-from-system-headers", it seems that neither clang --help nor clang -cc1 --help tells me about this option. I waste a lot of time finding what the option is called because of it.

This might be fixed in llvm/llvm-project@d5cdc26

Yeah it was fixed but what makes me struggle is that the option -fretain-comments-from-system-headers doesn't appear in help message of either clang or clang -cc1, and I put a lot of effort finding that option and adds to my build script.
スクリーンショット 2024-12-28 14 08 13
(When I'm typing "/retain documentation comments" trying to search in the help but nothing matches. The option does exist on my compiler though because I put -fretain-comments-from-system-headers and it just worked. But it's still a little bit confusing 'cause when I can't search anything about the flag in the help I almost think that I have the wrong version of the compiler, but it turned out that the option actually exists but somehow doesn't appear in the help)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests