Skip to content

Assorted fixes for the nested / docker runtimes. #8899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 4, 2025
Merged

Conversation

tofarr
Copy link
Collaborator

@tofarr tofarr commented Jun 4, 2025

  • This change is worth documenting at https://docs.all-hands.dev/
  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.
This PR improves the stability and reliability of nested and Docker runtimes in OpenHands, particularly when handling container lifecycle and conversation management. It fixes issues with container reuse, file paths, and error handling that could cause problems when starting, stopping, or restarting conversations.


Summarize what the PR does, explaining any non-trivial design decisions.

  1. Improved container management:

    • Instead of automatically removing and recreating containers when they already exist, the system now checks if a container exists and starts it if it is in an exited state
    • Removed unnecessary container recreation logic that could cause instability
  2. Fixed file path for nested conversations:

    • Updated the volume mount path from /root/openhands/file_store/ to /root/.openhands/file_store/ to ensure proper file storage and access
  3. Improved error handling:

    • Changed remove() to discard() for the _starting_conversation_ids set to prevent KeyError exceptions
    • Simplified error logging in the Docker runtime
  4. Added configuration for sandbox close delay:

    • Added a check to skip cleanup of disconnected conversations if close_delay is 0 or None
    • Set SANDBOX_CLOSE_DELAY to "0" for nested containers to prevent premature cleanup
  5. Code simplification:

    • Simplified user ID retrieval by using the direct get_user_id() function

Link of any specific issues this addresses:

@tofarr tofarr marked this pull request as ready for review June 4, 2025 19:06
f'Error: Instance {self.container_name} FAILED to start container!\n',
)
self.log('error', str(e))
raise e
Copy link
Collaborator Author

@tofarr tofarr Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because it no longer makes sense. At some point in the past, stop_all_containers would delete matching docker containers rather than just stop them. As it stands, this results in an infinite loop where the system stops an already stopped container and then tries to recreate it. (This was not being caught only because maybe_start_agent_loop checks that the container exists before this is called)

@tofarr tofarr merged commit c6c2aaf into main Jun 4, 2025
21 checks passed
@tofarr tofarr deleted the fix-nested-and-docker branch June 4, 2025 19:56
@malhotra5 malhotra5 mentioned this pull request Jun 5, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants