Parallelize module loading in POSIX dyld code #130912


Merged
merged 6 commits into llvm:main from posix-parallel-dyld on Mar 31, 2025

Conversation

zhyty
Contributor

@zhyty zhyty commented Mar 12, 2025

This patch improves LLDB launch time on Linux machines for preload scenarios, particularly for executables with a lot of shared library dependencies (or modules). Specifically:

  • Launching a binary with target.preload-symbols = true
  • Attaching to a process with target.preload-symbols = true

The feature is controlled entirely by a new setting added in the first commit, plugin.dynamic-loader.posix-dyld.parallel-module-load, which defaults to false. This was inspired by similar work on Darwin: "DynamicLoaderDarwin load images in parallel with preload" (#110646).

Some rough numbers to showcase the perf improvement, measured on a very beefy machine:

  • Executable with ~5600 modules: baseline 45s, with parallel loading 15s
  • Executable with ~3800 modules: baseline 25s, with parallel loading 10s
  • Executable with ~6650 modules: baseline 67s, with parallel loading 20s
  • Executable with ~12500 modules: baseline 185s, with parallel loading 85s
  • Executable with ~14700 modules: baseline 235s, with parallel loading 120s

A lot of targets we deal with have a ton of modules, and unfortunately we're unable to convince other folks to reduce that number, so performance improvements like this can be very impactful for the user experience.

This patch achieves the performance improvement by parallelizing DynamicLoaderPOSIXDYLD::RefreshModules for the launch scenario, and DynamicLoaderPOSIXDYLD::LoadAllCurrentModules for the attach scenario. The commits have some context on their specific changes as well -- hopefully this helps the review.
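At its core, both call sites share the same dispatch shape. Here is a condensed sketch (the per-module body is elided; the real code is in the diff below):

```cpp
// Condensed from the patch: wrap the per-module work in a callable, then
// either fan it out over the debugger-wide thread pool or run it serially,
// depending on the new setting.
auto load_module_fn = [&](const DYLDRendezvous::SOEntry &so_entry) {
  // Per-module work: LoadModuleAtAddress(...), bookkeeping, etc.
};

if (GetGlobalPluginProperties().GetParallelModuleLoad()) {
  llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
  for (auto I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
    task_group.async(load_module_fn, *I);
  task_group.wait(); // block until every module has been processed
} else {
  for (auto I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
    load_module_fn(*I);
}
```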

More context on implementation

We discovered the bottlenecks via `perf record -g -p <lldb's pid>` on a Linux machine. With an executable known to have thousands of shared library dependencies, I ran

```
(lldb) b main
(lldb) r
# taking a while
```

and examined the resulting perf trace (snippet below):

```
Samples: 85K of event 'cycles:P', Event count (approx.): 54615855812
  Children      Self  Command          Shared Object              Symbol
-   93.54%     0.00%  intern-state     libc.so.6                  [.] clone3
     clone3
     start_thread
     lldb_private::HostNativeThreadBase::ThreadCreateTrampoline(void*)
     std::_Function_handler<void* (), lldb_private::Process::StartPrivateStateThread(bool)::$_0>::_M_invoke(std::_Any_data const&)
     lldb_private::Process::RunPrivateStateThread(bool)
   - lldb_private::Process::HandlePrivateEvent(std::shared_ptr<lldb_private::Event>&)
      - 93.54% lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*)
         - 93.54% lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
            - lldb_private::Thread::ShouldStop(lldb_private::Event*)
               - 93.53% lldb_private::StopInfoBreakpoint::ShouldStopSynchronous(lldb_private::Event*)
                  - 93.52% lldb_private::BreakpointSite::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointLocationCollection::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointLocation::ShouldStop(lldb_private::StoppointCallbackContext*)
                       lldb_private::BreakpointOptions::InvokeCallback(lldb_private::StoppointCallbackContext*, unsigned long, unsigned long)
                       DynamicLoaderPOSIXDYLD::RendezvousBreakpointHit(void*, lldb_private::StoppointCallbackContext*, unsigned long, unsigned lo
                     - DynamicLoaderPOSIXDYLD::RefreshModules()
                        - 93.42% DynamicLoaderPOSIXDYLD::RefreshModules()::$_0::operator()(DYLDRendezvous::SOEntry const&) const
                           - 93.40% DynamicLoaderPOSIXDYLD::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, bool
                              - lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long, unsigned long, boo
                                 - 83.90% lldb_private::DynamicLoader::FindModuleViaTarget(lldb_private::FileSpec const&)
                                    - 83.01% lldb_private::Target::GetOrCreateModule(lldb_private::ModuleSpec const&, bool, lldb_private::Status*
                                       - 77.89% lldb_private::Module::PreloadSymbols()
                                          - 44.06% lldb_private::Symtab::PreloadSymbols()
                                             - 43.66% lldb_private::Symtab::InitNameIndexes()
...
```

We saw that the majority of time was spent in RefreshModules, with the main culprit within it being LoadModuleAtAddress, which eventually calls PreloadSymbols.

At first glance, DynamicLoaderPOSIXDYLD::LoadModuleAtAddress appears fairly independent -- most of it deals with different files and then getting or creating Modules from those files. The portions that aren't independent deal with ModuleLists, which appear to be concurrency-safe. There were members of DynamicLoaderPOSIXDYLD I had to synchronize, though: namely m_loaded_modules, which DynamicLoaderPOSIXDYLD maintains to map its loaded modules to their link addresses. Without synchronizing it, I ran into segfaults and other issues when running check-lldb. I also locked the assignment and comparison of m_interpreter_module, which may be unnecessary.
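In outline, the synchronization is a reader/writer lock hidden behind small accessor methods; this is condensed from the full versions in the diff below:

```cpp
// Writers take the mutex exclusively...
void DynamicLoaderPOSIXDYLD::SetLoadedModule(const ModuleSP &module_sp,
                                             addr_t link_map_addr) {
  std::unique_lock<std::shared_mutex> lock(m_loaded_modules_rw_mutex);
  m_loaded_modules[module_sp] = link_map_addr;
}

// ...while readers can share it.
std::optional<lldb::addr_t>
DynamicLoaderPOSIXDYLD::GetLoadedModuleLinkAddr(const ModuleSP &module_sp) {
  std::shared_lock<std::shared_mutex> lock(m_loaded_modules_rw_mutex);
  auto it = m_loaded_modules.find(module_sp);
  if (it != m_loaded_modules.end())
    return it->second;
  return std::nullopt;
}
```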

Alternate implementations

When creating this patch, another implementation I considered was directly backgrounding the call to Module::PreloadSymbols in Target::GetOrCreateModule, via Debugger::GetThreadPool().async. It would have had the added benefit of working generically across platforms, and appeared to be concurrency-safe. However, it ran into a ton of concurrency issues, so I abandoned that approach for now.
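For illustration only, a hypothetical sketch of that abandoned approach; the exact call site and the guard are assumptions, and this is not what the patch does:

```cpp
// Hypothetical: inside Target::GetOrCreateModule, after the module is
// created, push symbol preloading onto the shared thread pool instead of
// doing it inline. NOT part of this patch.
if (GetPreloadSymbols() && module_sp)
  Debugger::GetThreadPool().async(
      [module_sp] { module_sp->PreloadSymbols(); });
```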

Testing

With the feature active, I ran ninja check-lldb on both Debug and Release builds several times (5 or 6 runs altogether) and didn't spot any additional failing or flaky tests.

I also tested manually on several different binaries, some with around 14000 modules, exercising basic operations: launching, reaching main, setting breakpoints, stepping, and showing backtraces.

I've also tested with the flag off just to make sure things behave properly synchronously.

Add a setting to gate the new parallel dynamic library loading feature (added in the next diff). I followed the example of `lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp` and `lldb/source/Plugins/StructuredData/DarwinLog/StructuredDataDarwinLog.cpp` to create a scoped setting for the new feature.

NOTE: this setting defaults to FALSE.

Users should be able to enable the new feature as easily as adding a line to their .lldbinit file, or manually setting it in their session.

```
(lldb) apropos "POSIX dynamic"
No commands found pertaining to 'POSIX dynamic'. Try 'help' to see a complete list of debugger commands.

The following settings variables may relate to 'POSIX dynamic':
  plugin.dynamic-loader.posix-dyld.parallel-module-load -- Enable experimental loading of modules in parallel for the POSIX dynamic loader.
(lldb) settings show plugin.dynamic-loader.posix-dyld.parallel-module-load
plugin.dynamic-loader.posix-dyld.parallel-module-load (boolean) = false
(lldb) settings set plugin.dynamic-loader.posix-dyld.parallel-module-load true
(lldb) settings show plugin.dynamic-loader.posix-dyld.parallel-module-load
plugin.dynamic-loader.posix-dyld.parallel-module-load (boolean) = true
```
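For reference, the setting is read in C++ through a plugin-scoped properties class. Condensed from the patch; note that the second argument to GetPropertyAtIndexAs is only a fallback for a failed lookup, and the effective default comes from DefaultFalse in the .td file:

```cpp
// Condensed from the patch: the tablegen'd property is exposed through a
// plugin-scoped Properties subclass and a global accessor.
bool PluginProperties::GetParallelModuleLoad() const {
  // Fallback value only; the setting's real default is DefaultFalse.
  return GetPropertyAtIndexAs<bool>(ePropertyParallelModuleLoad, true);
}

static PluginProperties &GetGlobalPluginProperties() {
  static PluginProperties g_settings;
  return g_settings;
}
```
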
@zhyty zhyty force-pushed the posix-parallel-dyld branch from b7ddeb2 to c216836 on March 12, 2025 07:14

github-actions bot commented Mar 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Tom Yang added 2 commits March 12, 2025 00:32
This diff parallelizes `DynamicLoaderPOSIXDYLD::RefreshModules`, which speeds up module loading on Linux. The major benefit is that we can speed up symbol table indexing and parsing, which is the biggest bottleneck for targets that dynamically link many shared libraries.

This speedup is only noticeable when **preloading** symbols, i.e. when `target.preload-symbols` is `true`, which is the default at Meta. The symbol preload option tells the debugger to fully load all of the symbol tables when modules are loaded, as opposed to lazily loading them when symbols are requested.

Initially, I discovered the specific bottleneck by using the Linux `perf` tool. I saw that ~93% of samples were in `RefreshModules`, and mainly in `LoadModuleAtAddress` and `PreloadSymbols`.

`LoadModuleAtAddress` appears independent and parallelizable at first. The main issue is that `DynamicLoaderPOSIXDYLD` maintains a map of loaded modules to their link addresses in `m_loaded_modules`. Modifying and reading this map isn't thread-safe, so this diff also includes accessor methods that protect the map in the multithreaded context. Luckily, the critical section of modifying or reading the map isn't very costly, so the contention doesn't appear to negatively impact performance.

I tested with some larger projects with up to 15000 modules, and
found significant performance improvements. Typically, I was seeing 2-3X
launch speed increases, where "launch speed" is starting the binary and
reaching `main`.

I manually ran `ninja check-lldb` several times, and compared with the baseline. At this point, we're not seeing any new failures or new unresolved tests.
@zhyty zhyty force-pushed the posix-parallel-dyld branch from c216836 to 4a4a355 on March 12, 2025 07:33
@zhyty zhyty requested review from jeffreytan81 and clayborg March 12, 2025 17:11
@clayborg clayborg requested a review from labath March 12, 2025 20:39
@zhyty zhyty marked this pull request as ready for review March 13, 2025 21:38
@zhyty zhyty requested a review from JDevlieghere as a code owner March 13, 2025 21:38
@llvmbot llvmbot added the lldb label Mar 13, 2025
@llvmbot
Member

llvmbot commented Mar 13, 2025

@llvm/pr-subscribers-lldb

Author: Tom Yang (zhyty)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/130912.diff

4 Files Affected:

  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/CMakeLists.txt (+12)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp (+153-39)
  • (modified) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h (+15-4)
  • (added) lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLDProperties.td (+8)
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/CMakeLists.txt b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/CMakeLists.txt
index c1e00b2dd444f..532bfb20ea0f1 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/CMakeLists.txt
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/CMakeLists.txt
@@ -1,3 +1,11 @@
+lldb_tablegen(DynamicLoaderPOSIXDYLDProperties.inc -gen-lldb-property-defs
+  SOURCE DynamicLoaderPOSIXDYLDProperties.td
+  TARGET LLDBPluginDynamicLoaderPOSIXDYLDPropertiesGen)
+
+lldb_tablegen(DynamicLoaderPOSIXDYLDPropertiesEnum.inc -gen-lldb-property-enum-defs
+  SOURCE DynamicLoaderPOSIXDYLDProperties.td
+  TARGET LLDBPluginDynamicLoaderPOSIXDYLDPropertiesEnumGen)
+
 add_lldb_library(lldbPluginDynamicLoaderPosixDYLD PLUGIN
   DYLDRendezvous.cpp
   DynamicLoaderPOSIXDYLD.cpp
@@ -13,3 +21,7 @@ add_lldb_library(lldbPluginDynamicLoaderPosixDYLD PLUGIN
   LINK_COMPONENTS
     Support
   )
+
+add_dependencies(lldbPluginDynamicLoaderPosixDYLD
+  LLDBPluginDynamicLoaderPOSIXDYLDPropertiesGen
+  LLDBPluginDynamicLoaderPOSIXDYLDPropertiesEnumGen)
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
index 53ba11ac21bd3..f6f6057f830fe 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.cpp
@@ -10,6 +10,7 @@
 #include "DynamicLoaderPOSIXDYLD.h"
 
 #include "lldb/Breakpoint/BreakpointLocation.h"
+#include "lldb/Core/Debugger.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/ModuleSpec.h"
 #include "lldb/Core/PluginManager.h"
@@ -25,6 +26,7 @@
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
 #include "lldb/Utility/ProcessInfo.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include <memory>
 #include <optional>
@@ -34,9 +36,56 @@ using namespace lldb_private;
 
 LLDB_PLUGIN_DEFINE_ADV(DynamicLoaderPOSIXDYLD, DynamicLoaderPosixDYLD)
 
+namespace {
+
+#define LLDB_PROPERTIES_dynamicloaderposixdyld
+#include "DynamicLoaderPOSIXDYLDProperties.inc"
+
+enum {
+#define LLDB_PROPERTIES_dynamicloaderposixdyld
+#include "DynamicLoaderPOSIXDYLDPropertiesEnum.inc"
+};
+
+class PluginProperties : public Properties {
+public:
+  static llvm::StringRef GetSettingName() {
+    return DynamicLoaderPOSIXDYLD::GetPluginNameStatic();
+  }
+
+  PluginProperties() : Properties() {
+    m_collection_sp = std::make_shared<OptionValueProperties>(GetSettingName());
+    m_collection_sp->Initialize(g_dynamicloaderposixdyld_properties);
+  }
+
+  ~PluginProperties() override = default;
+
+  bool GetParallelModuleLoad() const {
+    const uint32_t idx = ePropertyParallelModuleLoad;
+    return GetPropertyAtIndexAs<bool>(idx, true);
+  }
+};
+
+} // namespace
+
+static PluginProperties &GetGlobalPluginProperties() {
+  static PluginProperties g_settings;
+  return g_settings;
+}
+
 void DynamicLoaderPOSIXDYLD::Initialize() {
   PluginManager::RegisterPlugin(GetPluginNameStatic(),
-                                GetPluginDescriptionStatic(), CreateInstance);
+                                GetPluginDescriptionStatic(), CreateInstance,
+                                &DebuggerInitialize);
+}
+
+void DynamicLoaderPOSIXDYLD::DebuggerInitialize(Debugger &debugger) {
+  if (!PluginManager::GetSettingForDynamicLoaderPlugin(
+          debugger, PluginProperties::GetSettingName())) {
+    const bool is_global_setting = true;
+    PluginManager::CreateSettingForDynamicLoaderPlugin(
+        debugger, GetGlobalPluginProperties().GetValueProperties(),
+        "Properties for the POSIX dynamic loader plug-in.", is_global_setting);
+  }
 }
 
 void DynamicLoaderPOSIXDYLD::Terminate() {}
@@ -184,16 +233,37 @@ void DynamicLoaderPOSIXDYLD::DidLaunch() {
 
 Status DynamicLoaderPOSIXDYLD::CanLoadImage() { return Status(); }
 
+void DynamicLoaderPOSIXDYLD::SetLoadedModule(const ModuleSP &module_sp,
+                                             addr_t link_map_addr) {
+  std::unique_lock<std::shared_mutex> lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules[module_sp] = link_map_addr;
+}
+
+void DynamicLoaderPOSIXDYLD::UnloadModule(const ModuleSP &module_sp) {
+  std::unique_lock<std::shared_mutex> lock(m_loaded_modules_rw_mutex);
+  m_loaded_modules.erase(module_sp);
+}
+
+std::optional<lldb::addr_t>
+DynamicLoaderPOSIXDYLD::GetLoadedModuleLinkAddr(const ModuleSP &module_sp) {
+  std::shared_lock<std::shared_mutex> lock(m_loaded_modules_rw_mutex);
+  auto it = m_loaded_modules.find(module_sp);
+  if (it != m_loaded_modules.end())
+    return it->second;
+  return std::nullopt;
+}
+
 void DynamicLoaderPOSIXDYLD::UpdateLoadedSections(ModuleSP module,
                                                   addr_t link_map_addr,
                                                   addr_t base_addr,
                                                   bool base_addr_is_offset) {
-  m_loaded_modules[module] = link_map_addr;
+  SetLoadedModule(module, link_map_addr);
+
   UpdateLoadedSectionsCommon(module, base_addr, base_addr_is_offset);
 }
 
 void DynamicLoaderPOSIXDYLD::UnloadSections(const ModuleSP module) {
-  m_loaded_modules.erase(module);
+  UnloadModule(module);
 
   UnloadSectionsCommon(module);
 }
@@ -401,7 +471,7 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   DYLDRendezvous::iterator I;
   DYLDRendezvous::iterator E;
@@ -423,34 +493,66 @@ void DynamicLoaderPOSIXDYLD::RefreshModules() {
       E = m_rendezvous.end();
       m_initial_modules_added = true;
     }
-    for (; I != E; ++I) {
-      // Don't load a duplicate copy of ld.so if we have already loaded it
-      // earlier in LoadInterpreterModule. If we instead loaded then unloaded it
-      // later, the section information for ld.so would be removed. That
-      // information is required for placing breakpoints on Arm/Thumb systems.
-      if ((m_interpreter_module.lock() != nullptr) &&
-          (I->base_addr == m_interpreter_base))
-        continue;
-
-      ModuleSP module_sp =
-          LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
-      if (!module_sp.get())
-        continue;
-
-      if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
-              &m_process->GetTarget()) == m_interpreter_base) {
-        ModuleSP interpreter_sp = m_interpreter_module.lock();
-        if (m_interpreter_module.lock() == nullptr) {
-          m_interpreter_module = module_sp;
-        } else if (module_sp == interpreter_sp) {
-          // Module already loaded.
-          continue;
-        }
-      }
 
-      loaded_modules.AppendIfNeeded(module_sp);
-      new_modules.Append(module_sp);
+    std::mutex interpreter_module_mutex;
+    // We should be able to take SOEntry as reference since the data
+    // exists for the duration of this call in `m_rendezvous`.
+    auto load_module_fn =
+        [this, &loaded_modules, &new_modules,
+         &interpreter_module_mutex](const DYLDRendezvous::SOEntry &so_entry) {
+          // Don't load a duplicate copy of ld.so if we have already loaded it
+          // earlier in LoadInterpreterModule. If we instead loaded then
+          // unloaded it later, the section information for ld.so would be
+          // removed. That information is required for placing breakpoints on
+          // Arm/Thumb systems.
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            if ((m_interpreter_module.lock() != nullptr) &&
+                (so_entry.base_addr == m_interpreter_base))
+              return;
+          }
+
+          ModuleSP module_sp = LoadModuleAtAddress(
+              so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
+          if (!module_sp.get())
+            return;
+
+          {
+            // `m_interpreter_module` may be modified by another thread at the
+            // same time, so we guard the access here.
+            std::lock_guard<std::mutex> lock(interpreter_module_mutex);
+            // Set the interpreter module, if this is the interpreter.
+            if (module_sp->GetObjectFile()->GetBaseAddress().GetLoadAddress(
+                    &m_process->GetTarget()) == m_interpreter_base) {
+              ModuleSP interpreter_sp = m_interpreter_module.lock();
+              if (m_interpreter_module.lock() == nullptr) {
+                m_interpreter_module = module_sp;
+              } else if (module_sp == interpreter_sp) {
+                // Module already loaded.
+                return;
+              }
+            }
+          }
+
+          loaded_modules.AppendIfNeeded(module_sp);
+          new_modules.Append(module_sp);
+        };
+
+    // Loading modules in parallel tends to be faster, but is still unstable.
+    // Once it's stable, we can remove this setting and remove the serial
+    // approach.
+    if (GetGlobalPluginProperties().GetParallelModuleLoad()) {
+      llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+      for (; I != E; ++I)
+        task_group.async(load_module_fn, *I);
+      task_group.wait();
+    } else {
+      for (; I != E; ++I)
+        load_module_fn(*I);
     }
+
     m_process->GetTarget().ModulesDidLoad(new_modules);
   }
 
@@ -636,7 +738,7 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   // The rendezvous class doesn't enumerate the main module, so track that
   // ourselves here.
   ModuleSP executable = GetTargetExecutable();
-  m_loaded_modules[executable] = m_rendezvous.GetLinkMapAddress();
+  SetLoadedModule(executable, m_rendezvous.GetLinkMapAddress());
 
   std::vector<FileSpec> module_names;
   for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
@@ -644,19 +746,31 @@ void DynamicLoaderPOSIXDYLD::LoadAllCurrentModules() {
   m_process->PrefetchModuleSpecs(
       module_names, m_process->GetTarget().GetArchitecture().GetTriple());
 
-  for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
-    ModuleSP module_sp =
-        LoadModuleAtAddress(I->file_spec, I->link_addr, I->base_addr, true);
+  auto load_module_fn = [this, &module_list,
+                         &log](const DYLDRendezvous::SOEntry &so_entry) {
+    ModuleSP module_sp = LoadModuleAtAddress(
+        so_entry.file_spec, so_entry.link_addr, so_entry.base_addr, true);
     if (module_sp.get()) {
       LLDB_LOG(log, "LoadAllCurrentModules loading module: {0}",
-               I->file_spec.GetFilename());
+               so_entry.file_spec.GetFilename());
       module_list.Append(module_sp);
     } else {
       Log *log = GetLog(LLDBLog::DynamicLoader);
       LLDB_LOGF(
           log,
           "DynamicLoaderPOSIXDYLD::%s failed loading module %s at 0x%" PRIx64,
-          __FUNCTION__, I->file_spec.GetPath().c_str(), I->base_addr);
+          __FUNCTION__, so_entry.file_spec.GetPath().c_str(),
+          so_entry.base_addr);
+    }
+  };
+  if (GetGlobalPluginProperties().GetParallelModuleLoad()) {
+    llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I)
+      task_group.async(load_module_fn, *I);
+    task_group.wait();
+  } else {
+    for (I = m_rendezvous.begin(), E = m_rendezvous.end(); I != E; ++I) {
+      load_module_fn(*I);
     }
   }
 
@@ -728,15 +842,15 @@ DynamicLoaderPOSIXDYLD::GetThreadLocalData(const lldb::ModuleSP module_sp,
                                            const lldb::ThreadSP thread,
                                            lldb::addr_t tls_file_addr) {
   Log *log = GetLog(LLDBLog::DynamicLoader);
-  auto it = m_loaded_modules.find(module_sp);
-  if (it == m_loaded_modules.end()) {
+  std::optional<addr_t> link_map_addr_opt = GetLoadedModuleLinkAddr(module_sp);
+  if (!link_map_addr_opt.has_value()) {
     LLDB_LOGF(
         log, "GetThreadLocalData error: module(%s) not found in loaded modules",
         module_sp->GetObjectName().AsCString());
     return LLDB_INVALID_ADDRESS;
   }
 
-  addr_t link_map = it->second;
+  addr_t link_map = link_map_addr_opt.value();
   if (link_map == LLDB_INVALID_ADDRESS || link_map == 0) {
     LLDB_LOGF(log,
               "GetThreadLocalData error: invalid link map address=0x%" PRIx64,
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
index bde334aaca40b..183a157654ba2 100644
--- a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLD.h
@@ -28,6 +28,8 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
 
   static void Initialize();
 
+  static void DebuggerInitialize(lldb_private::Debugger &debugger);
+
   static void Terminate();
 
   static llvm::StringRef GetPluginNameStatic() { return "posix-dyld"; }
@@ -93,10 +95,6 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   /// Contains the pointer to the interpret module, if loaded.
   std::weak_ptr<lldb_private::Module> m_interpreter_module;
 
-  /// Loaded module list. (link map for each module)
-  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
-      m_loaded_modules;
-
   /// Returns true if the process is for a core file.
   bool IsCoreFile() const;
 
@@ -180,6 +178,19 @@ class DynamicLoaderPOSIXDYLD : public lldb_private::DynamicLoader {
   DynamicLoaderPOSIXDYLD(const DynamicLoaderPOSIXDYLD &) = delete;
   const DynamicLoaderPOSIXDYLD &
   operator=(const DynamicLoaderPOSIXDYLD &) = delete;
+
+  /// Loaded module list. (link map for each module)
+  /// This may be accessed in a multi-threaded context. Use the accessor methods
+  /// to access `m_loaded_modules` safely.
+  std::map<lldb::ModuleWP, lldb::addr_t, std::owner_less<lldb::ModuleWP>>
+      m_loaded_modules;
+  std::shared_mutex m_loaded_modules_rw_mutex;
+
+  void SetLoadedModule(const lldb::ModuleSP &module_sp,
+                       lldb::addr_t link_map_addr);
+  void UnloadModule(const lldb::ModuleSP &module_sp);
+  std::optional<lldb::addr_t>
+  GetLoadedModuleLinkAddr(const lldb::ModuleSP &module_sp);
 };
 
 #endif // LLDB_SOURCE_PLUGINS_DYNAMICLOADER_POSIX_DYLD_DYNAMICLOADERPOSIXDYLD_H
diff --git a/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLDProperties.td b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLDProperties.td
new file mode 100644
index 0000000000000..68e3448196cec
--- /dev/null
+++ b/lldb/source/Plugins/DynamicLoader/POSIX-DYLD/DynamicLoaderPOSIXDYLDProperties.td
@@ -0,0 +1,8 @@
+
+include "../../../../include/lldb/Core/PropertiesBase.td"
+
+let Definition = "dynamicloaderposixdyld" in {
+  def ParallelModuleLoad: Property<"parallel-module-load", "Boolean">,
+    DefaultFalse,
+    Desc<"Enable experimental loading of modules in parallel for the POSIX dynamic loader.">;
+}

Member

@JDevlieghere JDevlieghere left a comment

This looks like the equivalent of #110646 but for Linux, and the overall approach looks good to me.

My only real concern is about the setting and it being off by default. I would prefer that this is on by default and the setting is there as an escape hatch in case there's an unexpected issue. I'd like to understand what's holding that back?

Building on that, if the setting is on by default, then instead of creating a new setting per plugin, I'd like to hoist it into target and have one setting that controls this behavior for all of them. I expect it to be rare that you'd want to turn this off for one plugin but not the other. Unless we are 100% sure we're going to take away the setting (in which case the setting should be experimental [1]), it's easier to keep it around, for example to make it easier to debug, or to quickly toggle it and see if a concurrency issue disappears.

[1] See plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load. The experimental prefix allows you to set that option without it existing, for when it either gets promoted to a real setting (without the experimental prefix) or alternatively goes away.

Comment on lines +539 to +540
loaded_modules.AppendIfNeeded(module_sp);
new_modules.Append(module_sp);
Member

Could we ever get into trouble because this operation isn't atomic? Both ModuleLists are thread safe, but is there a situation where another thread could see a module in one but not the other? Given that we only append to it in the lambda, I suspect it's fine. Maybe it's worth adding a comment here, something along the lines of we don't need to synchronize the module list as it's thread safe and no two threads look at the two lists at the same time, or something.

Contributor Author

Yes -- the threads don't read from new_modules during loading, and new_modules is only used after all modules have been loaded in parallel, in ModulesDidLoad, which does some more initialization synchronously.

I added a comment to explain this -- hopefully the comment is clear enough.

@zhyty
Contributor Author

zhyty commented Mar 14, 2025

> This looks like the equivalent of #110646 but for Linux, and the overall approach looks good to me.
>
> My only real concern is about the setting and it being off by default. I would prefer that this is on by default and the setting is there as an escape hatch in case there's an unexpected issue. I'd like to understand what's holding that back?
>
> Building on that, if the setting is on by default, then instead of creating a new setting per plugin, I'd like to hoist it into target and have one setting that controls this behavior for all of them. I expect it to be rare that you'd want to turn this off for one plugin but not the other. Unless we are 100% sure we're going to take away the setting (in which case the setting should be experimental [1]), it's easier to keep it around, for example to make it easier to debug, or to quickly toggle it and see if a concurrency issue disappears.
>
> [1] See plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load. The experimental prefix allows you to set that option without it existing, for when it either gets promoted to a real setting (without the experimental prefix) or alternatively goes away.

Thanks for the review.

Yes, this PR was inspired by #110646 (I didn't have the PR number on hand, will link in the description).

Internally, we were planning to have it on by default as well, but we felt it might be easier to land in open source with it off by default. I'm in favor of having it on as well, since there are no known issues.

@JDevlieghere
Member

Great, if that's the case let's turn it on by default, and go with the shared setting at the target level. We can migrate over the Darwin plugin in a follow-up PR.

@zhyty
Contributor Author

zhyty commented Mar 19, 2025

@JDevlieghere in these 2 commits, I hoisted the flag out to target, fixed/added comments where appropriate, and switched the shared_mutex to RWMutex.

@zhyty zhyty requested a review from JDevlieghere March 21, 2025 17:12
Member

@JDevlieghere JDevlieghere left a comment

LGTM!

@royitaqi royitaqi merged commit a8d2d16 into llvm:main Mar 31, 2025
9 of 10 checks passed
royitaqi pushed a commit that referenced this pull request Apr 7, 2025
#134437)

A requested follow-up from
#130912 by @JDevlieghere to
control Darwin parallel image loading with the same
`target.parallel-module-load` that controls the POSIX dyld parallel
image loading. Darwin parallel image loading was introduced by
#110646.

This small change:
* removes
`plugin.dynamic-loader.darwin.experimental.enable-parallel-image-load`
and associated code.
* changes setting call site in
`DynamicLoaderDarwin::PreloadModulesFromImageInfos` to use the new
setting.

Tested by running `ninja check-lldb` and loading some targets.

Co-authored-by: Tom Yang <[email protected]>
@JDevlieghere
Member

Forgot to ask here earlier, but would you also be able to add a release note in llvm-project/llvm/docs/ReleaseNotes.md? Thanks!
