Skip to content

[lldb-dap] Change the launch sequence #138219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

JDevlieghere
Copy link
Member

@JDevlieghere JDevlieghere commented May 1, 2025

This PR changes how we treat the launch sequence in lldb-dap.

  • Send the initialized event after we finish handling the initialize
    request, rather than after we finish attaching or launching.
  • Delay handling the launch and attach request until we have handled
    the configurationDone request. The latter is now largely a NO-OP and
    only exists to signal lldb-dap that it can handle the launch and
    attach requests.
  • Delay handling the initial threads requests until we have handled
    the launch or attach request.
  • Make all attaching and launching asynchronous, including when we have
    attach or launch commands. That removes the need to synchronize
    between the request and event thread.

Background: https://discourse.llvm.org/t/reliability-of-the-lldb-dap-tests/86125

Copy link

github-actions bot commented May 1, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

{
std::unique_lock<std::mutex> lock(dap.ignore_stop_mutex);
if (dap.ignore_next_stop) {
DAP_LOG(dap.log, "Ignoring process stop event");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is there always precisely 1 stop event we should ignore? Or could multiple happen? For example, if we're running attachCommands or launchCommands could more than 1 stop event be triggered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a good point, and using launch commands, that't entirely out of our control. You could write a sequence of commands that stop the process an arbitrary number of time. This realization is what made me go with the current approach of doing all the launching and attaching in synchronous mode, because we'll never be able to account for this.

@JDevlieghere JDevlieghere force-pushed the lldb-dap-launch-sequence branch from 730fdeb to e733348 Compare May 2, 2025 17:05
@JDevlieghere
Copy link
Member Author

@kusmour Can you please confirm whether or not there's a reason to run the launch or attach commands in asynchronous mode? It looks like the divergence was introduced when the feature was added (https://reviews.llvm.org/D154028) and I'm not sure it's necessary. My local testing (using the gdb-remote 3000 example from the README) seems to work as expected in synchronous mode and even seems like it would desirable?

disconnecting = true;
std::get_if<protocol::Request>(&*next)) {
if (req->command == "disconnect")
disconnecting = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will set disconnecting to true before we've actually responded to the request and this also changed the loop to break as soon as it sees this set to true (line 964). If there are multiple queued requests then we'll exit before responding to those requests.

Should we instead wait until the queue is empty instead of disconnecting is true on line 964?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, I noticed the same thing when running the tests and all of them would fail because we didn't respond to the disconnecting request. I think what you're suggesting makes sense: if we stop reading new messages after we've seen disconnecting, we can use the queue being empty as a way to signal that we're done.

// be answered until we've gotten the confgigurationDone request. We
// can't answer the threads request until we've successfully launched
// or attached.
bool is_part_of_launch_sequence = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After request initialize + event initialized, should we queue everything until we see configurationDone? We're already making a special case for threads and I worry about other special cases. Maybe the general flow should be to put all messages into the launch queue except initialize, configurationDone and disconnect (in case we need to shutdown before launching is complete) until we've responded to configurationDone. Then we can process the launch queue and processed as normal at that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that idea. We can't quite move everything over to the other queue, because the following requests can be made between the initialize and configuration done request:

In response to the initialized event, the development tool sends the configuration information using these requests:

setBreakpoints one request for all breakpoints in a single source,
setFunctionBreakpoints if the debug adapter supports function breakpoints,
setExceptionBreakpoints if the debug adapter supports any exception options,
configurationDoneRequest to indicate the end of the configuration sequence.

But limiting the ones that can be made is definitely more robust than what we're doing now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to respond to those requests immediately, they're sent, but at least VSCode doesn't wait for a response to them before sending the configurationDone request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that would be a problem though because we'd miss hitting initial breakpoints, so ya, your right about those requests.

Comment on lines 959 to 974
m_queue_cv.wait(lock, [&] {
return disconnecting || !m_queue.empty() ||
(!m_launch_sequence_queue.empty() && configuration_done);
});

if (m_queue.empty())
if (disconnecting)
break;

Message next = m_queue.front();
m_queue.pop_front();
bool is_launch_seqeuence =
!m_launch_sequence_queue.empty() && configuration_done;

auto &active_queue =
is_launch_seqeuence ? m_launch_sequence_queue : m_queue;

assert(!active_queue.empty() &&
"shouldn't have gotten past the wait with an empty queue");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we kept the messages in the m_launch_sequence_queue until configurationDone and in configurationDone PostRun() we move the items in the m_launch_sequence_queue into the front of m_queue?

Then this loop would only need to check the m_queue.empty() and have a few less branching paths.

We could add a helper like:

void DAP::SetConfigurationDone() {
  std::lock_guard<std::mutex> guard(m_queue_mutex);
  std::copy(m_launch_sequence_queue.front(), m_launch_sequence_queue.end(), std::front_inserter(m_queue));
  configuration_done = true;
}

and call in ConfigurationDoneRequestHandler.

@kusmour
Copy link
Contributor

kusmour commented May 2, 2025

@kusmour Can you please confirm whether or not there's a reason to run the launch or attach commands in asynchronous mode? It looks like the divergence was introduced when the feature was added (https://reviews.llvm.org/D154028) and I'm not sure it's necessary. My local testing (using the gdb-remote 3000 example from the README) seems to work as expected in synchronous mode and even seems like it would desirable?

I just synced with Greg on some context. I think the nature of attachCommands will allow people do anything. What if they have a waitFor command that will not get timed out by GDBRemote timeout? Then lldb-dap will never return anything to the client. The dap.WaitForProcessToStop(timeout_seconds); will guarantee we at least return error if things aren't working.

In fact maybe we should consider making the normal launch/attach in async and have the dap.WaitForProcessToStop(timeout_seconds); at the end to ensure the process is in expected state before responding to the launch/attach req?

@JDevlieghere JDevlieghere marked this pull request as ready for review May 2, 2025 21:27
@llvmbot
Copy link
Member

llvmbot commented May 2, 2025

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

This PR changes how we treat the launch sequence in lldb-dap.

  • Send the initialized event after we finish handling the initialize
    request, rather than after we finish attaching or launching.
  • Delay handling the launch and attach request until we have handled
    the configurationDone request. The latter is now largely a NO-OP and
    only exists to signal lldb-dap that it can handle the launch and
    attach requests.
  • Delay handling the initial threads requests until we have handled
    the launch or attach request.
  • Make all attaching and launching asynchronous, including when we have
    attach or launch commands. That removes the need to synchronize
    between the request and event thread.

Background: https://discourse.llvm.org/t/reliability-of-the-lldb-dap-tests/86125


Patch is 38.47 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138219.diff

29 Files Affected:

  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py (+23-21)
  • (modified) lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py (+4)
  • (modified) lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py (+2-1)
  • (modified) lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py (+4-4)
  • (modified) lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/commands/TestDAP_commands.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/completions/TestDAP_completions.py (+3-3)
  • (modified) lldb/test/API/tools/lldb-dap/console/TestDAP_console.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/disconnect/TestDAP_disconnect.py (+5-1)
  • (modified) lldb/test/API/tools/lldb-dap/evaluate/TestDAP_evaluate.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py (+2-2)
  • (modified) lldb/test/API/tools/lldb-dap/memory/TestDAP_memory.py (+1-2)
  • (modified) lldb/test/API/tools/lldb-dap/progress/TestDAP_Progress.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/restart/TestDAP_restart.py (-1)
  • (modified) lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py (-1)
  • (modified) lldb/test/API/tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py (+1-1)
  • (modified) lldb/test/API/tools/lldb-dap/variables/TestDAP_variables.py (+1-2)
  • (modified) lldb/tools/lldb-dap/DAP.cpp (+31-8)
  • (modified) lldb/tools/lldb-dap/DAP.h (+6-2)
  • (modified) lldb/tools/lldb-dap/EventHelper.cpp (+1-1)
  • (modified) lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp (+56-44)
  • (modified) lldb/tools/lldb-dap/Handler/ConfigurationDoneRequestHandler.cpp (+2-12)
  • (modified) lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp (+17-27)
  • (modified) lldb/tools/lldb-dap/Handler/LaunchRequestHandler.cpp (-2)
  • (modified) lldb/tools/lldb-dap/Handler/RequestHandler.cpp (+45-27)
  • (modified) lldb/tools/lldb-dap/Handler/RequestHandler.h (+1)
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
index 6d9ab770684f1..7d4f5a2b15680 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/dap_server.py
@@ -591,6 +591,7 @@ def request_attach(
         attachCommands=None,
         terminateCommands=None,
         coreFile=None,
+        stopOnAttach=True,
         postRunCommands=None,
         sourceMap=None,
         gdbRemotePort=None,
@@ -620,6 +621,8 @@ def request_attach(
             args_dict["attachCommands"] = attachCommands
         if coreFile:
             args_dict["coreFile"] = coreFile
+        if stopOnAttach:
+            args_dict["stopOnEntry"] = stopOnAttach
         if postRunCommands:
             args_dict["postRunCommands"] = postRunCommands
         if sourceMap:
@@ -666,10 +669,6 @@ def request_configurationDone(self):
         response = self.send_recv(command_dict)
         if response:
             self.configuration_done_sent = True
-            # Client requests the baseline of currently existing threads after
-            # a successful launch or attach.
-            # Kick off the threads request that follows
-            self.request_threads()
         return response
 
     def _process_stopped(self):
@@ -1325,6 +1324,26 @@ def attach_options_specified(options):
 
 def run_vscode(dbg, args, options):
     dbg.request_initialize(options.sourceInitFile)
+
+    if options.sourceBreakpoints:
+        source_to_lines = {}
+        for file_line in options.sourceBreakpoints:
+            (path, line) = file_line.split(":")
+            if len(path) == 0 or len(line) == 0:
+                print('error: invalid source with line "%s"' % (file_line))
+
+            else:
+                if path in source_to_lines:
+                    source_to_lines[path].append(int(line))
+                else:
+                    source_to_lines[path] = [int(line)]
+        for source in source_to_lines:
+            dbg.request_setBreakpoints(source, source_to_lines[source])
+    if options.funcBreakpoints:
+        dbg.request_setFunctionBreakpoints(options.funcBreakpoints)
+
+    dbg.request_configurationDone()
+
     if attach_options_specified(options):
         response = dbg.request_attach(
             program=options.program,
@@ -1353,23 +1372,6 @@ def run_vscode(dbg, args, options):
         )
 
     if response["success"]:
-        if options.sourceBreakpoints:
-            source_to_lines = {}
-            for file_line in options.sourceBreakpoints:
-                (path, line) = file_line.split(":")
-                if len(path) == 0 or len(line) == 0:
-                    print('error: invalid source with line "%s"' % (file_line))
-
-                else:
-                    if path in source_to_lines:
-                        source_to_lines[path].append(int(line))
-                    else:
-                        source_to_lines[path] = [int(line)]
-            for source in source_to_lines:
-                dbg.request_setBreakpoints(source, source_to_lines[source])
-        if options.funcBreakpoints:
-            dbg.request_setFunctionBreakpoints(options.funcBreakpoints)
-        dbg.request_configurationDone()
         dbg.wait_for_stopped()
     else:
         if "message" in response:
diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
index ee5272850b9a8..5e48f8f1e9bde 100644
--- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
+++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py
@@ -333,6 +333,7 @@ def attach(
         exitCommands=None,
         attachCommands=None,
         coreFile=None,
+        stopOnAttach=True,
         disconnectAutomatically=True,
         terminateCommands=None,
         postRunCommands=None,
@@ -357,6 +358,7 @@ def cleanup():
         self.addTearDownHook(cleanup)
         # Initialize and launch the program
         self.dap_server.request_initialize(sourceInitFile)
+        self.dap_server.request_configurationDone()
         response = self.dap_server.request_attach(
             program=program,
             pid=pid,
@@ -369,6 +371,7 @@ def cleanup():
             attachCommands=attachCommands,
             terminateCommands=terminateCommands,
             coreFile=coreFile,
+            stopOnAttach=stopOnAttach,
             postRunCommands=postRunCommands,
             sourceMap=sourceMap,
             gdbRemotePort=gdbRemotePort,
@@ -427,6 +430,7 @@ def cleanup():
 
         # Initialize and launch the program
         self.dap_server.request_initialize(sourceInitFile)
+        self.dap_server.request_configurationDone()
         response = self.dap_server.request_launch(
             program,
             args=args,
diff --git a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
index 6f70316821c8c..01fba0e5694d4 100644
--- a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
+++ b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py
@@ -25,9 +25,10 @@ def spawn_and_wait(program, delay):
     process.wait()
 
 
-@skipIf
 class TestDAP_attach(lldbdap_testcase.DAPTestCaseBase):
     def set_and_hit_breakpoint(self, continueToExit=True):
+        self.dap_server.wait_for_stopped()
+
         source = "main.c"
         breakpoint1_line = line_number(source, "// breakpoint 1")
         lines = [breakpoint1_line]
diff --git a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py
index 51f62b79f3f4f..4f2298a9b73b6 100644
--- a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py
+++ b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py
@@ -19,17 +19,17 @@
 import socket
 
 
-@skip
 class TestDAP_attachByPortNum(lldbdap_testcase.DAPTestCaseBase):
     default_timeout = 20
 
     def set_and_hit_breakpoint(self, continueToExit=True):
+        self.dap_server.wait_for_stopped()
+
         source = "main.c"
-        main_source_path = os.path.join(os.getcwd(), source)
-        breakpoint1_line = line_number(main_source_path, "// breakpoint 1")
+        breakpoint1_line = line_number(source, "// breakpoint 1")
         lines = [breakpoint1_line]
         # Set breakpoint in the thread function so we can step the threads
-        breakpoint_ids = self.set_source_breakpoints(main_source_path, lines)
+        breakpoint_ids = self.set_source_breakpoints(source, lines)
         self.assertEqual(
             len(breakpoint_ids), len(lines), "expect correct number of breakpoints"
         )
diff --git a/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py b/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py
index 4a99cacc761a3..1058157e2c668 100644
--- a/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py
+++ b/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_breakpointLocations.py
@@ -11,8 +11,7 @@
 import lldbdap_testcase
 import os
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_breakpointLocations(lldbdap_testcase.DAPTestCaseBase):
     def setUp(self):
         lldbdap_testcase.DAPTestCaseBase.setUp(self)
diff --git a/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py b/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py
index 6c6681804f250..26df2573555df 100644
--- a/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py
+++ b/lldb/test/API/tools/lldb-dap/breakpoint/TestDAP_setBreakpoints.py
@@ -11,8 +11,7 @@
 import lldbdap_testcase
 import os
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_setBreakpoints(lldbdap_testcase.DAPTestCaseBase):
     def setUp(self):
         lldbdap_testcase.DAPTestCaseBase.setUp(self)
diff --git a/lldb/test/API/tools/lldb-dap/commands/TestDAP_commands.py b/lldb/test/API/tools/lldb-dap/commands/TestDAP_commands.py
index 8398eeab7bba2..25ecbb5cf106b 100644
--- a/lldb/test/API/tools/lldb-dap/commands/TestDAP_commands.py
+++ b/lldb/test/API/tools/lldb-dap/commands/TestDAP_commands.py
@@ -5,8 +5,7 @@
 from lldbsuite.test import lldbtest, lldbutil
 from lldbsuite.test.decorators import *
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_commands(lldbdap_testcase.DAPTestCaseBase):
     def test_command_directive_quiet_on_success(self):
         program = self.getBuildArtifact("a.out")
diff --git a/lldb/test/API/tools/lldb-dap/completions/TestDAP_completions.py b/lldb/test/API/tools/lldb-dap/completions/TestDAP_completions.py
index 210e591bff426..455ac84168baf 100644
--- a/lldb/test/API/tools/lldb-dap/completions/TestDAP_completions.py
+++ b/lldb/test/API/tools/lldb-dap/completions/TestDAP_completions.py
@@ -44,9 +44,9 @@ def verify_completions(self, actual_list, expected_list, not_expected_list=[]):
             self.assertNotIn(not_expected_item, actual_list)
 
 
-    def setup_debugee(self):
+    def setup_debugee(self, stopOnEntry=False):
         program = self.getBuildArtifact("a.out")
-        self.build_and_launch(program)
+        self.build_and_launch(program, stopOnEntry=stopOnEntry)
 
         source = "main.cpp"
         breakpoint1_line = line_number(source, "// breakpoint 1")
@@ -235,7 +235,7 @@ def test_auto_completions(self):
         """
         Tests completion requests in "repl-mode=auto"
         """
-        self.setup_debugee()
+        self.setup_debugee(stopOnEntry=True)
 
         res = self.dap_server.request_evaluate(
             "`lldb-dap repl-mode auto", context="repl"
diff --git a/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py b/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py
index b07c4f871d73b..65a1bc04c7cd7 100644
--- a/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py
+++ b/lldb/test/API/tools/lldb-dap/console/TestDAP_console.py
@@ -167,7 +167,7 @@ def test_exit_status_message_ok(self):
 
     def test_diagnositcs(self):
         program = self.getBuildArtifact("a.out")
-        self.build_and_launch(program)
+        self.build_and_launch(program, stopOnEntry=True)
 
         core = self.getBuildArtifact("minidump.core")
         self.yaml2obj("minidump.yaml", core)
diff --git a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
index ebecb349ac177..9e8ef5b289f2e 100644
--- a/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
+++ b/lldb/test/API/tools/lldb-dap/disassemble/TestDAP_disassemble.py
@@ -10,8 +10,7 @@
 import lldbdap_testcase
 import os
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_disassemble(lldbdap_testcase.DAPTestCaseBase):
     @skipIfWindows
     def test_disassemble(self):
diff --git a/lldb/test/API/tools/lldb-dap/disconnect/TestDAP_disconnect.py b/lldb/test/API/tools/lldb-dap/disconnect/TestDAP_disconnect.py
index 0cb792d662a80..09e3f62f0eead 100644
--- a/lldb/test/API/tools/lldb-dap/disconnect/TestDAP_disconnect.py
+++ b/lldb/test/API/tools/lldb-dap/disconnect/TestDAP_disconnect.py
@@ -31,7 +31,7 @@ def test_launch(self):
         created.
         """
         program = self.getBuildArtifact("a.out")
-        self.build_and_launch(program, disconnectAutomatically=False)
+        self.build_and_launch(program, stopOnEntry=True, disconnectAutomatically=False)
 
         # We set a breakpoint right before the side effect file is created
         self.set_source_breakpoints(
@@ -39,7 +39,11 @@ def test_launch(self):
         )
         self.continue_to_next_stop()
 
+        # verify we haven't produced the side effect file yet
+        self.assertFalse(os.path.exists(program + ".side_effect"))
+
         self.dap_server.request_disconnect()
+
         # verify we didn't produce the side effect file
         time.sleep(1)
         self.assertFalse(os.path.exists(program + ".side_effect"))
diff --git a/lldb/test/API/tools/lldb-dap/evaluate/TestDAP_evaluate.py b/lldb/test/API/tools/lldb-dap/evaluate/TestDAP_evaluate.py
index d97fda730c46a..e2f843bd337a6 100644
--- a/lldb/test/API/tools/lldb-dap/evaluate/TestDAP_evaluate.py
+++ b/lldb/test/API/tools/lldb-dap/evaluate/TestDAP_evaluate.py
@@ -10,8 +10,7 @@
 from lldbsuite.test.decorators import *
 from lldbsuite.test.lldbtest import *
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_evaluate(lldbdap_testcase.DAPTestCaseBase):
     def assertEvaluate(self, expression, regex):
         self.assertRegex(
diff --git a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
index 931456299e03e..604a41678500c 100644
--- a/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
+++ b/lldb/test/API/tools/lldb-dap/launch/TestDAP_launch.py
@@ -88,8 +88,8 @@ def test_stopOnEntry(self):
         """
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program, stopOnEntry=True)
-        self.set_function_breakpoints(["main"])
-        stopped_events = self.continue_to_next_stop()
+
+        stopped_events = self.dap_server.wait_for_stopped()
         for stopped_event in stopped_events:
             if "body" in stopped_event:
                 body = stopped_event["body"]
diff --git a/lldb/test/API/tools/lldb-dap/memory/TestDAP_memory.py b/lldb/test/API/tools/lldb-dap/memory/TestDAP_memory.py
index c71ba871b8a22..ea43fccf016a7 100644
--- a/lldb/test/API/tools/lldb-dap/memory/TestDAP_memory.py
+++ b/lldb/test/API/tools/lldb-dap/memory/TestDAP_memory.py
@@ -10,8 +10,7 @@
 import lldbdap_testcase
 import os
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_memory(lldbdap_testcase.DAPTestCaseBase):
     def test_memory_refs_variables(self):
         """
diff --git a/lldb/test/API/tools/lldb-dap/progress/TestDAP_Progress.py b/lldb/test/API/tools/lldb-dap/progress/TestDAP_Progress.py
index fee63655de0da..0f94b50c31fba 100755
--- a/lldb/test/API/tools/lldb-dap/progress/TestDAP_Progress.py
+++ b/lldb/test/API/tools/lldb-dap/progress/TestDAP_Progress.py
@@ -50,7 +50,7 @@ def verify_progress_events(
     @skipIfWindows
     def test(self):
         program = self.getBuildArtifact("a.out")
-        self.build_and_launch(program)
+        self.build_and_launch(program, stopOnEntry=True)
         progress_emitter = os.path.join(os.getcwd(), "Progress_emitter.py")
         self.dap_server.request_evaluate(
             f"`command script import {progress_emitter}", context="repl"
diff --git a/lldb/test/API/tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py b/lldb/test/API/tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py
index c6f59949d668e..81edcdf4bd0f9 100644
--- a/lldb/test/API/tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py
+++ b/lldb/test/API/tools/lldb-dap/repl-mode/TestDAP_repl_mode_detection.py
@@ -20,7 +20,7 @@ def assertEvaluate(self, expression, regex):
 
     def test_completions(self):
         program = self.getBuildArtifact("a.out")
-        self.build_and_launch(program)
+        self.build_and_launch(program, stopOnEntry=True)
 
         source = "main.cpp"
         breakpoint1_line = line_number(source, "// breakpoint 1")
diff --git a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart.py b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart.py
index 36fa0bd40183f..5f95c7bfb1556 100644
--- a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart.py
+++ b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart.py
@@ -22,7 +22,6 @@ def test_basic_functionality(self):
         [bp_A, bp_B] = self.set_source_breakpoints("main.c", [line_A, line_B])
 
         # Verify we hit A, then B.
-        self.dap_server.request_configurationDone()
         self.verify_breakpoint_hit([bp_A])
         self.dap_server.request_continue()
         self.verify_breakpoint_hit([bp_B])
diff --git a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py
index a94c9860c1508..eed769a5a0cc6 100644
--- a/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py
+++ b/lldb/test/API/tools/lldb-dap/restart/TestDAP_restart_runInTerminal.py
@@ -74,7 +74,6 @@ def test_stopOnEntry(self):
         program = self.getBuildArtifact("a.out")
         self.build_and_launch(program, runInTerminal=True, stopOnEntry=True)
         [bp_main] = self.set_function_breakpoints(["main"])
-        self.dap_server.request_configurationDone()
 
         # When using stopOnEntry, configurationDone doesn't result in a running
         # process, we should immediately get a stopped event instead.
diff --git a/lldb/test/API/tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py b/lldb/test/API/tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py
index 70c11a63a79f7..7e28a5af4331c 100644
--- a/lldb/test/API/tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py
+++ b/lldb/test/API/tools/lldb-dap/stop-hooks/TestDAP_stop_hooks.py
@@ -19,7 +19,7 @@ def test_stop_hooks_before_run(self):
         self.build_and_launch(program, stopOnEntry=True, preRunCommands=preRunCommands)
 
         # The first stop is on entry.
-        self.continue_to_next_stop()
+        self.dap_server.wait_for_stopped()
 
         breakpoint_ids = self.set_function_breakpoints(["main"])
         # This request hangs if the race happens, because, in that case, the
diff --git a/lldb/test/API/tools/lldb-dap/variables/TestDAP_variables.py b/lldb/test/API/tools/lldb-dap/variables/TestDAP_variables.py
index 901c260d7d413..286bf3390a440 100644
--- a/lldb/test/API/tools/lldb-dap/variables/TestDAP_variables.py
+++ b/lldb/test/API/tools/lldb-dap/variables/TestDAP_variables.py
@@ -17,8 +17,7 @@ def make_buffer_verify_dict(start_idx, count, offset=0):
         verify_dict["[%i]" % (i)] = {"type": "int", "value": str(i + offset)}
     return verify_dict
 
-# DAP tests are flakey, see https://github.com/llvm/llvm-project/issues/137660.
-@skip
+
 class TestDAP_variables(lldbdap_testcase.DAPTestCaseBase):
     def verify_values(self, verify_dict, actual, varref_dict=None, expression=None):
         if "equals" in verify_dict:
diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp
index 4cb0d8e49004c..0ab16f16d30bf 100644
--- a/lldb/tools/lldb-dap/DAP.cpp
+++ b/lldb/tools/lldb-dap/DAP.cpp
@@ -84,8 +84,8 @@ DAP::DAP(Log *log, const ReplMode default_repl_mode,
     : log(log), transport(transport), broadcaster("lldb-dap"),
       exception_breakpoints(), focus_tid(LLDB_INVALID_THREAD_ID),
       stop_at_entry(false), is_attach(false),
-      restarting_process_id(LLDB_INVALID_PROCESS_ID),
-      configuration_done_sent(false), waiting_for_run_in_terminal(false),
+      restarting_process_id(LLDB_INVALID_PROCESS_ID), configuration_done(false),
+      waiting_for_run_in_terminal(false),
       progress_event_reporter(
           [&](const ProgressEvent &event) { SendJSON(event.ToJSON()); }),
       reverse_request_seq(0), repl_mode(default_repl_mode) {
@@ -893,10 +893,19 @@ llvm::Error DAP::Loop() {
             return errWrapper;
           }
 
+          // The launch sequence is special and we need to carefully handle
+          // packets in the right order. Until we've handled configurationDone,
+          bool add_to_pending_queue = false;
+
           if (const protocol::Request *req =
-                  std::get_if<protocol::Request>(&*next);
-              req && req->command == "disconnect") {
-            disconnecting = true;
+                  std::get_if<protocol::Request>(&*next)) {
+            llvm::StringRef command = req->command;
+            if (command == "disconnect")
+              disconnecting = true;
+            if (!configuration_done)
+              add_to_pending_queue =
+                  command != "initialize" && command != "configurationDone" &&
+                  command != "disconnect" && !command.ends_with("Breakpoints");
           }
 
           const std::optional<CancelArguments> cancel_args =
@@ -924,7 +933,8 @@ llvm::Error DAP::Loop() {
 
           {
             std::lock_guard<std::mutex> guard...
[truncated]

@JDevlieghere
Copy link
Member Author

@kusmour Can you please confirm whether or not there's a reason to run the launch or attach commands in asynchronous mode? It looks like the divergence was introduced when the feature was added (https://reviews.llvm.org/D154028) and I'm not sure it's necessary. My local testing (using the gdb-remote 3000 example from the README) seems to work as expected in synchronous mode and even seems like it would desirable?

I just synced with Greg on some context. I think the nature of attachCommands will allow people do anything. What if they have a waitFor command that will not get timed out by GDBRemote timeout? Then lldb-dap will never return anything to the client. The dap.WaitForProcessToStop(timeout_seconds); will guarantee we at least return error if things aren't working.

Right, that's exactly the problem, and it goes both ways. Similarly to the situation where there's no stop, we have the same problem if there's more than one stop.

In fact maybe we should consider making the normal launch/attach in async and have the dap.WaitForProcessToStop(timeout_seconds); at the end to ensure the process is in expected state before responding to the launch/attach req?

That's what I wanted to do originally, but I think that makes things strictly more complicated because it requires lldb-dap to deal with all the stops. Imagine you have a launch command that stops and resumes the inferior more than once. If we do that asynchronously, we'll get multiple stop events.

  • If stop-on-entry is false, we have to issue the continue so the process is running when the request completes. How do you know after which stop event to issue the continue? Because the process events are delivered asynchronously, you don't know when you're done handling all the events belonging to the launch or attach.
  • If stop-on-entry is false, we have to make sure the process is in a stopped state when the request completes. How do you know after which running event to issue the continue?

At this point I'm convinced that there's no sound way of supporting all these scenarios from lldb-dap. The benefit of using synchronous mode is that we can let lldb itself deal with most of the stops, at the cost of potentially not doing the right thing when you have launch commands. I hope I'm wrong or missing something, but I would really like to solve this better, but I'm out of ideas.

@@ -591,6 +591,7 @@ def request_attach(
attachCommands=None,
terminateCommands=None,
coreFile=None,
stopOnAttach=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we default to False for attach requests?

@@ -333,6 +333,7 @@ def attach(
exitCommands=None,
attachCommands=None,
coreFile=None,
stopOnAttach=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we default this to False?

Or should we have the DAPTestCaseBase case attach and launch helpers take an initialSourceBreakpoints, initialFunctionBreakpoints, initialExceptionBreakpoints parameter and we help the tests by sending these as part of the setup flow, kind of how we have disconnectAutomatically as a flag to handle shutdown flows automatically.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way the tests are currently written, they expect to be stopped after the launch (presumably because these tests were never sending the configurationDone request). I went with this default to keep the changes minimal, but I like the idea of the initial breakpoints as a way to improve this in a subsequent PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, we can follow up with some other improvements to tests.

@@ -591,6 +591,7 @@ def request_attach(
attachCommands=None,
terminateCommands=None,
coreFile=None,
stopOnAttach=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason we want to make this true by default?

  1. Our current tests don't have stopOnEntry by default
  2. DAP doesn't have definition about this nor does lldb-dap documentation. So most real life attach case will not have this turned on.
  3. I don't think we have lldb-dap test case covering multithreading scenario. But on linux, ptrace will put SIGSTOP on every thread and currently in SendThreadStoppedEvent(dap) we will send a stopped event for any threads that has the a stop reason, meaning we will actually send N stopped event (N being # of threads) upon attach request with stopOnAttach set to true.

Though it's only turned on for tests, I think we should make it false until it become the expected default behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my reply to John: #138219 (comment)

@@ -427,6 +430,7 @@ def cleanup():

# Initialize and launch the program
self.dap_server.request_initialize(sourceInitFile)
self.dap_server.request_configurationDone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -235,7 +235,7 @@ def test_auto_completions(self):
"""
Tests completion requests in "repl-mode=auto"
"""
self.setup_debugee()
self.setup_debugee(stopOnEntry=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we are changing the test behavior. I believe currently they will be false.

@@ -88,8 +88,8 @@ def test_stopOnEntry(self):
"""
program = self.getBuildArtifact("a.out")
self.build_and_launch(program, stopOnEntry=True)
self.set_function_breakpoints(["main"])
stopped_events = self.continue_to_next_stop()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probs should update the line 98 & 99 below

                    self.assertEqual(
                        reason, "singal", 'verify stop reason is SIGSTOP'
                    )

Comment on lines +189 to +196
// Clients can request a baseline of currently existing threads after
// we acknowledge the configurationDone request.
// Client requests the baseline of currently existing threads after
// a successful or attach by sending a 'threads' request
// right after receiving the configurationDone response.
// Obtain the list of threads before we resume the process
dap.initial_thread_list =
GetThreads(dap.target.GetProcess(), dap.thread_format);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we delay the handling of launch/attach request after responding to configurationDone, what happens to threads request? Is it possible that lldb-dap gets a threads request before saving the initial_thread_list here?

VSCode sends out threads request right after the configurationDone response

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The threads request gets delayed as well. As long as it comes in after the launch/attach request, it will get handled after. Here's the log between VS Code and the implementation in this PR:

--> {"command":"initialize","arguments":{"clientID":"vscode","clientName":"Visual Studio Code","adapterID":"lldb-dap","pathFormat":"path","linesStartAt1":true,"columnsStartAt1":true,"supportsVariableType":true,"supportsVariablePaging":true,"supportsRunInTerminalRequest":true,"locale":"en","supportsProgressReporting":true,"supportsInvalidatedEvent":true,"supportsMemoryReferences":true,"supportsArgsCanBeInterpretedByShell":true,"supportsMemoryEvent":true,"supportsStartDebuggingRequest":true,"supportsANSIStyling":true},"type":"request","seq":1}
<-- {"body":{"$__lldb_version":"lldb version 21.0.0git ([email protected]:JDevlieghere/llvm-project.git revision e7333489e50b1efe16203e8fd7b6dc7f2dcd4439)\n  clang revision e7333489e50b1efe16203e8fd7b6dc7f2dcd4439\n  llvm revision e7333489e50b1efe16203e8fd7b6dc7f2dcd4439","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"default":false,"filter":"cpp_catch","label":"C++ Catch"},{"default":false,"filter":"cpp_throw","label":"C++ Throw"},{"default":false,"filter":"objc_catch","label":"Objective-C Catch"},{"default":false,"filter":"objc_throw","label":"Objective-C Throw"}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionInfoRequest":true,"supportsExceptionOptions":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsRestartRequest":true,"supportsSetVariable":true,"supportsStepInTargetsRequest":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
<-- {"event":"initialized","seq":0,"type":"event"}
--> {"command":"launch","arguments":{"type":"lldb-dap","request":"launch","name":"Launch LLDB","program":"/Users/jonas/llvm/build-ra/bin/lldb","args":[],"env":[],"cwd":"/Users/jonas/llvm/llvm-project","__configurationTarget":6,"__sessionId":"8dc997cc-ed4f-4ce1-a7e8-f2d59ea84796"},"type":"request","seq":2}
--> {"command":"setFunctionBreakpoints","arguments":{"breakpoints":[]},"type":"request","seq":3}
<-- {"body":{"breakpoints":[]},"command":"setFunctionBreakpoints","request_seq":3,"seq":0,"success":true,"type":"response"}
--> {"command":"setInstructionBreakpoints","arguments":{"breakpoints":[]},"type":"request","seq":4}
--> {"command":"setExceptionBreakpoints","arguments":{"filters":[]},"type":"request","seq":5}
<-- {"body":{"breakpoints":[]},"command":"setInstructionBreakpoints","request_seq":4,"seq":0,"success":true,"type":"response"}
--> {"command":"setDataBreakpoints","arguments":{"breakpoints":[]},"type":"request","seq":6}
<-- {"command":"setExceptionBreakpoints","request_seq":5,"seq":0,"success":true,"type":"response"}
<-- {"body":{"breakpoints":[]},"command":"setDataBreakpoints","request_seq":6,"seq":0,"success":true,"type":"response"}
--> {"command":"configurationDone","type":"request","seq":7}
<-- {"command":"configurationDone","request_seq":7,"seq":0,"success":true,"type":"response"}
--> {"command":"threads","type":"request","seq":8}
<-- {"body":{"module":{"addressRange":"4294967296","debugInfoSize":"","id":"CFAF6A92-9494-3549-9408-121B5F8E5A32","name":"lldb","path":"/Users/jonas/llvm/build-ra/bin/lldb","symbolFilePath":"/Users/jonas/llvm/build-ra/bin/lldb","symbolStatus":"Symbols loaded."},"reason":"removed"},"event":"module","seq":0,"type":"event"}
<-- {"command":"launch","request_seq":2,"seq":0,"success":true,"type":"response"}
<-- {"body":{"isLocalProcess":true,"name":"/Users/jonas/llvm/build-ra/bin/lldb","startMethod":"launch","systemProcessId":24119},"event":"process","seq":0,"type":"event"}
<-- {"body":{"threads":[{"id":3321319,"name":"Thread 1"}]},"command":"threads","request_seq":8,"seq":0,"success":true,"type":"response"}

Copy link
Contributor

@ashgti ashgti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yesterday I enabled all the lldb-dap tests that support macOS and I noticed #138197 where some tests attach failed due to attaching to a parallel test process. I was consistently hitting it on my MBP but I think due to timing not on my Mac Studio (different number of cores, I think).

This PR changes how we treat the launch sequence in lldb-dap.

 - Send the initialized event after we finish handling the initialize
   request, rather than after we finish attaching or launching.
 - Delay handling the launch and attach request until we have handled
   the configurationDone request. The latter is now largely a NO-OP and
   only exists to signal lldb-dap that it can handle the launch and
   attach requests.
 - Delay handling the initial threads requests until we have handled
   the launch or attach request.
 - Make all attaching and launching asynchronous, including when we have
   attach or launch commands. That removes the need to synchronize
   between the request and event thread.

Background: https://discourse.llvm.org/t/reliability-of-the-lldb-dap-tests/86125
@JDevlieghere JDevlieghere force-pushed the lldb-dap-launch-sequence branch from 17e812f to afd0475 Compare May 5, 2025 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants