feat(aci): adding file viewing capability to different extension types #8742

Merged 5 commits on Jun 2, 2025

Conversation

xingyaoww
Collaborator

@xingyaoww xingyaoww commented May 27, 2025

  • This change is worth documenting at https://docs.all-hands.dev/
  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.


Summarize what the PR does, explaining any non-trivial design decisions.

Part of the changes in #8598. This PR contains the changes to the str_replace_editor.

Link of any specific issues this addresses:


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:26121f1-nikolaik \
  --name openhands-app-26121f1 \
  docker.all-hands.dev/all-hands-ai/openhands:26121f1

@xingyaoww
Collaborator Author

Baseline performance with Claude Sonnet 4:

    "submitted_instances": 50,
    "completed_instances": 50,
    "resolved_instances": 38,
    "unresolved_instances": 12,

@xingyaoww
Collaborator Author

xingyaoww commented May 28, 2025

With changes:

Instances submitted: 50
Instances completed: 49
Instances incomplete: 450
Instances resolved: 36
Instances unresolved: 13

I'm going to run a larger set (200 instances) to make sure.

@xingyaoww
Collaborator Author

On main: 133/200
On this PR: 138/200

Let's go with this @ryanhoangt
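
For reference, the resolve rates implied by the counts above can be worked out with a quick sketch. The helper name `resolve_rate` is an illustrative assumption and not part of the evaluation harness; the only inputs are the 133/200 and 138/200 figures quoted in this comment.

```python
def resolve_rate(resolved: int, submitted: int) -> float:
    """Return the share of submitted instances that were resolved, in percent."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * resolved / submitted


# (resolved, submitted) counts quoted in the comment for the 200-instance subset.
runs = {"main": (133, 200), "this PR": (138, 200)}

rates = {name: resolve_rate(r, n) for name, (r, n) in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")

# Difference in percentage points between the PR branch and main.
delta = rates["this PR"] - rates["main"]
print(f"difference: {delta:+.1f} percentage points")
```

On a 200-instance subset, a gap of this size corresponds to 5 instances, so it is best read as "no regression" rather than as a measured improvement.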

@xingyaoww xingyaoww changed the title from "[Pending Eval] OH Versa" to "feat(aci): adding file viewing capability to different extension types" May 29, 2025
@xingyaoww xingyaoww force-pushed the xw/aci-gaia-file-viewer branch from ec52897 to 9e5dd86 Compare May 29, 2025 16:00
@openhands-ai openhands-ai bot deleted a comment from openhands-staging bot May 29, 2025

openhands-ai bot commented May 29, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run Python Unit Tests
    • Docker

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #8742

Feel free to include any additional details that might help me get this PR into a better state.


@xingyaoww
Collaborator Author

With changes (with updated prompt): 134/200

Total instances: 500
Instances submitted: 200
Instances completed: 199
Instances incomplete: 300
Instances resolved: 134
Instances unresolved: 65
Instances with empty patches: 0
Instances with errors: 1
Unstopped containers: 0
Unremoved images: 500
Report written to claude-sonnet-4-20250514_maxiter_500_N_v0.39.2-no-hint-main-05-27-2025-acl-viewer-prompt-change-v1-run_1.20250602_111239.json
MODEL_NAME_OR_PATH: claude-sonnet-4-20250514_maxiter_500_N_v0.39.2-no-hint-main-05-27-2025-acl-viewer-prompt-change-v1-run_1
RESULT_OUTPUT_DIR: /home/xingyaow/OpenHands-eval/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/claude-sonnet-4-20250514_maxiter_500_N_v0.39.2-no-hint-main-05-27-2025-acl-viewer-prompt-change-v1-run_1
Checking for changes: 0it [00:00, ?it/s]
Updating output file: 200it [00:01, 144.91it/s]
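
The counters in this report can be sanity-checked against each other. Below is a minimal sketch, assuming the summary is available as plain `Label: value` lines as pasted above. The `parse_summary` helper is hypothetical and not part of the SWE-bench tooling, and the relationships it checks are ones these particular numbers satisfy, not documented invariants of the report format.

```python
SUMMARY = """\
Total instances: 500
Instances submitted: 200
Instances completed: 199
Instances incomplete: 300
Instances resolved: 134
Instances unresolved: 65
Instances with empty patches: 0
Instances with errors: 1
"""


def parse_summary(text: str) -> dict[str, int]:
    """Parse 'Label: value' lines into a dict, skipping non-numeric lines."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[label.strip()] = int(value.strip())
    return counts


counts = parse_summary(SUMMARY)

# Relationships that hold for this report (assumed, not documented, invariants).
resolved_plus_unresolved = counts["Instances resolved"] + counts["Instances unresolved"]
not_submitted = counts["Total instances"] - counts["Instances submitted"]
accounted_for = (
    counts["Instances completed"]
    + counts["Instances with errors"]
    + counts["Instances with empty patches"]
)

print(resolved_plus_unresolved == counts["Instances completed"])
print(not_submitted == counts["Instances incomplete"])
print(accounted_for == counts["Instances submitted"])

# Resolve rate over submitted instances, in percent.
rate = 100.0 * counts["Instances resolved"] / counts["Instances submitted"]
print(f"resolve rate: {rate:.1f}%")
```

Read this way, "Instances incomplete: 300" counts the instances of the 500-instance set that were not submitted in this 200-instance run, not failures.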

@xingyaoww
Collaborator Author

@ryanhoangt I think we can cut a new release in openhands-aci and get it merged

@ryanhoangt ryanhoangt marked this pull request as ready for review June 2, 2025 16:13
Collaborator

@ryanhoangt ryanhoangt left a comment

LGTM

2 participants