fix: Remove nested git repositories before adding files in SWE-bench #6536
Thanks! I'll do some testing to see if it breaks normal eval, if not - I'll merge it!
Gentle reminder in case this fell off your radar @xingyaoww
👀 running a quick eval now
This LGTM!
Did a quick run with this branch merged on top of #6977: it solves a net 4 more problems (25 newly resolved, 21 no longer resolved). Not much, but definitely an improvement!
X=/home/xingyaow/OpenHands-eval/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.27.0-no-hint-pr6536-run_1/output.jsonl
Y=/home/xingyaow/OpenHands-eval/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.26.0-no-hint-pr6977-tool-only-w-updatedswb-run_1/output.jsonl
# diff=46
----------------------------------------------------------------------------------------------------
# x resolved but y not=25
instance_id report_x report_y
176 django__django-11265 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
128 django__django-11333 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
77 django__django-11728 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
46 django__django-12663 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
76 django__django-12713 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
101 django__django-12858 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
165 django__django-13417 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
182 django__django-14500 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
89 django__django-15128 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
187 django__django-15930 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
147 django__django-16136 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
47 matplotlib__matplotlib-22871 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
136 matplotlib__matplotlib-23412 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
65 matplotlib__matplotlib-25311 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
31 matplotlib__matplotlib-25775 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
95 pydata__xarray-6599 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
97 pylint-dev__pylint-6386 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
9 pytest-dev__pytest-5262 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
126 pytest-dev__pytest-7236 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
94 scikit-learn__scikit-learn-13496 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
180 scikit-learn__scikit-learn-14087 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
27 sphinx-doc__sphinx-10466 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
168 sphinx-doc__sphinx-7454 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
166 sphinx-doc__sphinx-8548 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
75 sympy__sympy-19783 {'empty_generation': False, 'resolved': True, ... {'empty_generation': False, 'resolved': False,...
----------------------------------------------------------------------------------------------------
# y resolved but x not=21
instance_id report_x report_y
162 astropy__astropy-14096 {'empty_generation': True, 'resolved': False, ... {'empty_generation': False, 'resolved': True, ...
79 django__django-10914 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
141 django__django-11299 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
29 django__django-11999 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
90 django__django-13158 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
99 django__django-14007 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
30 django__django-14311 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
104 django__django-14404 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
153 django__django-15037 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
158 django__django-15161 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
115 django__django-16661 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
146 matplotlib__matplotlib-13989 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
142 matplotlib__matplotlib-24627 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
188 matplotlib__matplotlib-24637 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
88 pytest-dev__pytest-7982 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
185 scikit-learn__scikit-learn-14710 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
83 sphinx-doc__sphinx-10323 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
80 sphinx-doc__sphinx-7757 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
48 sphinx-doc__sphinx-8621 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
85 sphinx-doc__sphinx-9698 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
5 sympy__sympy-15599 {'empty_generation': False, 'resolved': False,... {'empty_generation': False, 'resolved': True, ...
----------------------------------------------------------------------------------------------------
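For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two set differences reported above. The script that produced the output is not shown in this thread, so the function name `compare_resolved` and the input layout (a mapping from `instance_id` to a report dict with a `resolved` key, mirroring the printed columns) are assumptions.

```python
from typing import Mapping


def compare_resolved(
    report_x: Mapping[str, dict],
    report_y: Mapping[str, dict],
) -> tuple[list[str], list[str]]:
    """Return (resolved in x but not y, resolved in y but not x).

    Only instance ids present in both runs are compared.
    The input layout is hypothetical; see the note above.
    """
    shared = report_x.keys() & report_y.keys()
    x_only = sorted(
        i for i in shared
        if report_x[i]["resolved"] and not report_y[i]["resolved"]
    )
    y_only = sorted(
        i for i in shared
        if report_y[i]["resolved"] and not report_x[i]["resolved"]
    )
    return x_only, y_only
```

With the numbers in this thread, the two lists would have lengths 25 and 21, which gives the total of 46 differing instances and the net gain of 4.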
…ll-Hands-AI#6536) Co-authored-by: Xingyao Wang <[email protected]>
Problem
In SWE-Bench-like benchmarks, the agent may create a `.git` repository in the local directory while reproducing the error (e.g., the case `iterative__dvc-5336` in the `swe-gym-lite` benchmark). As a result, an error occurs when I run `git add -A` later.
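
To illustrate the idea in the PR title, here is a minimal sketch that deletes every nested `.git` entry under a workspace while leaving the top-level one alone. It is not the code from this PR: the helper name `remove_nested_git_dirs` is hypothetical, and it assumes the workspace root is itself the repository whose `.git` must be kept.

```python
import shutil
from pathlib import Path


def remove_nested_git_dirs(repo_root: Path) -> list[Path]:
    """Delete every `.git` entry below repo_root except the top-level one.

    Hypothetical helper for illustration; returns the paths it removed.
    """
    repo_root = repo_root.resolve()
    top_level = repo_root / ".git"
    removed: list[Path] = []
    # sorted() materializes the matches before anything is deleted.
    for git_path in sorted(repo_root.rglob(".git")):
        # Keep the top-level .git and anything stored inside it.
        if git_path == top_level or top_level in git_path.parents:
            continue
        # Skip entries already deleted together with a parent directory.
        if not git_path.exists() and not git_path.is_symlink():
            continue
        if git_path.is_dir() and not git_path.is_symlink():
            shutil.rmtree(git_path)
        else:
            # A `.git` file (e.g. a submodule or worktree pointer) or a symlink.
            git_path.unlink()
        removed.append(git_path)
    return removed
```

Calling this helper on the workspace before running `git add -A` leaves the files from the formerly nested repository in place as ordinary files, so they can be staged like any other change.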