Assorted fixes for the nested / docker runtimes. #8899

tofarr · 2025-06-04T19:02:29Z

This change is worth documenting at https://docs.all-hands.dev/
Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below.

End-user friendly description of the problem this fixes or functionality this introduces.
This PR improves the stability and reliability of nested and Docker runtimes in OpenHands, particularly when handling container lifecycle and conversation management. It fixes issues with container reuse, file paths, and error handling that could cause problems when starting, stopping, or restarting conversations.

Summarize what the PR does, explaining any non-trivial design decisions.

Improved container management:
- Instead of automatically removing and recreating containers when they already exist, the system now checks if a container exists and starts it if it is in an exited state
- Removed unnecessary container recreation logic that could cause instability
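The container-reuse rule above can be sketched as follows. This is a minimal illustration that assumes the `docker` Python SDK (`client.containers.get`, `docker.errors.NotFound`, `Container.status`, `Container.start`). The names `decide_container_action`, `ensure_container` and the `create_container` callback are hypothetical and are not taken from the OpenHands codebase.

```python
from enum import Enum
from typing import Callable, Optional


class ContainerAction(Enum):
    CREATE = "create"  # no container with this name exists yet
    START = "start"    # the container exists but has exited
    LEAVE = "leave"    # the container exists in any other state; leave it be


def decide_container_action(status: Optional[str]) -> ContainerAction:
    """Map a Docker container status (None if missing) to the action to take."""
    if status is None:
        return ContainerAction.CREATE
    if status == "exited":
        return ContainerAction.START
    # Running, paused, restarting, etc.: do not remove or recreate it.
    # A broken container can be deleted explicitly instead.
    return ContainerAction.LEAVE


def ensure_container(
    client, name: str, create_container: Callable[[], None]
) -> ContainerAction:
    """Apply the decision with the docker SDK (`pip install docker`).

    `create_container` is a hypothetical callback standing in for the
    runtime's own container-creation logic.
    """
    import docker.errors  # imported lazily so the pure helper needs no dependency

    try:
        container = client.containers.get(name)
        status: Optional[str] = container.status
    except docker.errors.NotFound:
        container, status = None, None

    action = decide_container_action(status)
    if action is ContainerAction.CREATE:
        create_container()
    elif action is ContainerAction.START:
        container.start()
    return action
```

Keeping the status-to-action mapping separate from the Docker calls makes the rule testable without a Docker daemon.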
Fixed file path for nested conversations:
- Updated the volume mount path from /root/openhands/file_store/ to /root/.openhands/file_store/ to ensure proper file storage and access
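For reference, a bind mount targeting the corrected path could be expressed in the `volumes` dictionary format accepted by the `docker` SDK's `containers.run()`. The helper name, the host directory and the `rw` mode are assumptions for illustration. Only the container-side path comes from this PR.

```python
from typing import Dict

# Container-side mount target from this PR (note the leading dot in ".openhands").
FILE_STORE_MOUNT = "/root/.openhands/file_store/"


def file_store_volume(host_dir: str) -> Dict[str, Dict[str, str]]:
    """Build a `volumes` mapping for docker SDK's `client.containers.run(...)`.

    `host_dir` and the "rw" mode are illustrative assumptions.
    """
    return {host_dir: {"bind": FILE_STORE_MOUNT, "mode": "rw"}}
```

For example, `file_store_volume("/tmp/file_store")` returns `{"/tmp/file_store": {"bind": "/root/.openhands/file_store/", "mode": "rw"}}`.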
Improved error handling:
- Changed remove() to discard() for the _starting_conversation_ids set to prevent KeyError exceptions
- Simplified error logging in the Docker runtime
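The difference between the two set methods is what makes this change safe: `set.remove()` raises `KeyError` when the element is missing, while `set.discard()` silently does nothing. This is a minimal sketch, and `finish_starting` is a hypothetical helper name.

```python
from typing import Set


def finish_starting(starting_ids: Set[str], conversation_id: str) -> None:
    """Mark a conversation as no longer starting.

    discard() is a no-op if the ID is absent, so calling this twice
    (e.g. from both a success path and an error path) cannot raise.
    """
    starting_ids.discard(conversation_id)


def remove_raises_on_missing() -> bool:
    """Show that set.remove() raises KeyError for a missing element."""
    try:
        set().remove("missing-id")
    except KeyError:
        return True
    return False


if __name__ == "__main__":
    ids = {"conv-1"}
    finish_starting(ids, "conv-1")
    finish_starting(ids, "conv-1")  # second call is harmless
    print(ids, remove_raises_on_missing())  # prints: set() True
```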
Added configuration for sandbox close delay:
- Added a check to skip cleanup of disconnected conversations if close_delay is 0 or None
- Set SANDBOX_CLOSE_DELAY to "0" for nested containers to prevent premature cleanup
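The close-delay guard can be sketched like this. `conversations_to_close` and `parse_close_delay` are hypothetical names, the unit (seconds) is an assumption, and the real OpenHands config loading may differ. The point is that a falsy delay (0 or None) short-circuits cleanup.

```python
from typing import Dict, List, Mapping, Optional


def parse_close_delay(env: Mapping[str, str]) -> Optional[int]:
    """Read SANDBOX_CLOSE_DELAY from an environment mapping (hypothetical parsing).

    Returns None when the variable is unset.
    """
    raw = env.get("SANDBOX_CLOSE_DELAY")
    return int(raw) if raw is not None else None


def conversations_to_close(
    close_delay: Optional[int],
    disconnected_at: Dict[str, float],
    now: float,
) -> List[str]:
    """Return IDs of conversations disconnected for longer than close_delay.

    A close_delay of 0 or None disables cleanup entirely.
    """
    if not close_delay:  # covers both 0 and None
        return []
    return [cid for cid, t in disconnected_at.items() if now - t > close_delay]
```

A nested container would then be started with an environment such as `{"SANDBOX_CLOSE_DELAY": "0"}`, so `parse_close_delay` yields `0` and no conversations are ever selected for cleanup.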
Code simplification:
- Simplified user ID retrieval by using the direct get_user_id() function

Link to any specific issues this addresses:

tofarr · 2025-06-04T19:11:03Z

openhands/runtime/impl/docker/docker_runtime.py

-                    f'Error: Instance {self.container_name} FAILED to start container!\n',
-                )
-                self.log('error', str(e))
-                raise e


I removed this because it no longer makes sense. At some point in the past, stop_all_containers deleted matching Docker containers rather than just stopping them. As it stands, this code results in an infinite loop: the system stops an already-stopped container and then tries to recreate it. (This was only going unnoticed because maybe_start_agent_loop checks that the container exists before this code is called.)

tofarr added 4 commits June 4, 2025 12:56

Removed 409 check when initializing container.

96a1b82

If the container exists, then leave it be. If it is broken, we can /delete it.

If the close_delay is 0 or None, then don't try to clean up disconnected conversations

f57ae98

Use the helper function for a shorter syntax. No functional change

7312669

More resilient start / stop / restart conversations

14eae67

tofarr marked this pull request as ready for review June 4, 2025 19:06

tofarr commented Jun 4, 2025

rbren approved these changes Jun 4, 2025

tofarr merged commit c6c2aaf into main Jun 4, 2025
21 checks passed

tofarr deleted the fix-nested-and-docker branch June 4, 2025 19:56

malhotra5 mentioned this pull request Jun 5, 2025

[Fix]: add missing await #8936

Merged

luk2038649 mentioned this pull request Jun 9, 2025

chore(kubernetes): rm chart ziffmedia/OpenHands#1

Closed
