fix: Remove nested git repositories before adding files in SWE-bench #6536

Merged (3 commits) on Feb 28, 2025

magic3007
Contributor

Problem
In SWE-bench-like benchmarks, the agent may create a nested .git repository inside the working directory while reproducing the error (e.g., the instance iterative__dvc-5336 in the swe-gym-lite benchmark). As a result, an error occurs when git add -A is executed later.
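
For illustration only, here is a minimal sketch of the idea in the title: delete every nested `.git` directory below the workspace root, keeping the top-level one, before staging files. The function name `remove_nested_git_repos` and its return value are hypothetical and not taken from this PR's diff; the actual change may well be a shell command run inside the sandbox.

```python
import os
import shutil


def remove_nested_git_repos(workspace):
    """Remove `.git` directories below `workspace`, keeping the top-level one.

    Hypothetical helper (an assumption, not the code merged in this PR).
    Returns the removed paths relative to `workspace`, sorted.
    """
    root = os.path.abspath(workspace)
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".git" not in dirnames:
            continue
        # Prune so os.walk never descends into git metadata.
        dirnames.remove(".git")
        if dirpath == root:
            continue  # the benchmark repository's own .git must survive
        target = os.path.join(dirpath, ".git")
        if os.path.islink(target):
            os.unlink(target)  # rmtree refuses to operate on symlinks
        else:
            shutil.rmtree(target)
        removed.append(os.path.relpath(target, root))
    return sorted(removed)
```

Calling this on the workspace right before `git add -A` would leave only the outer repository, so files written by the agent inside a formerly nested repository are staged as ordinary files.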

Collaborator

@xingyaoww xingyaoww left a comment

Thanks! I'll do some testing to see if it breaks normal eval, if not - I'll merge it!

@xingyaoww xingyaoww self-assigned this Jan 30, 2025
@mamoodi
Collaborator

mamoodi commented Feb 26, 2025

Gentle reminder in case this fell off your radar @xingyaoww

@xingyaoww
Collaborator

👀 running a quick eval now

Collaborator

@xingyaoww xingyaoww left a comment

This LGTM!

Did a quick run by merging this branch on top of #6977 - it is able to solve a net 4 more problems (25 newly resolved vs. 21 no longer resolved in the diff below)! Not much, but definitely an improvement!

X=/home/xingyaow/OpenHands-eval/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.27.0-no-hint-pr6536-run_1/output.jsonl
Y=/home/xingyaow/OpenHands-eval/evaluation/evaluation_outputs/outputs/princeton-nlp__SWE-bench_Verified-test/CodeActAgent/claude-3-7-sonnet-20250219_maxiter_100_N_v0.26.0-no-hint-pr6977-tool-only-w-updatedswb-run_1/output.jsonl
# diff=46
----------------------------------------------------------------------------------------------------
# x resolved but y not=25
                          instance_id                                           report_x                                           report_y
176              django__django-11265  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
128              django__django-11333  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
77               django__django-11728  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
46               django__django-12663  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
76               django__django-12713  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
101              django__django-12858  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
165              django__django-13417  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
182              django__django-14500  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
89               django__django-15128  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
187              django__django-15930  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
147              django__django-16136  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
47       matplotlib__matplotlib-22871  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
136      matplotlib__matplotlib-23412  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
65       matplotlib__matplotlib-25311  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
31       matplotlib__matplotlib-25775  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
95                pydata__xarray-6599  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
97            pylint-dev__pylint-6386  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
9             pytest-dev__pytest-5262  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
126           pytest-dev__pytest-7236  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
94   scikit-learn__scikit-learn-13496  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
180  scikit-learn__scikit-learn-14087  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
27           sphinx-doc__sphinx-10466  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
168           sphinx-doc__sphinx-7454  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
166           sphinx-doc__sphinx-8548  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
75                 sympy__sympy-19783  {'empty_generation': False, 'resolved': True, ...  {'empty_generation': False, 'resolved': False,...
----------------------------------------------------------------------------------------------------
# y resolved but x not=21
                          instance_id                                           report_x                                           report_y
162            astropy__astropy-14096  {'empty_generation': True, 'resolved': False, ...  {'empty_generation': False, 'resolved': True, ...
79               django__django-10914  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
141              django__django-11299  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
29               django__django-11999  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
90               django__django-13158  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
99               django__django-14007  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
30               django__django-14311  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
104              django__django-14404  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
153              django__django-15037  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
158              django__django-15161  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
115              django__django-16661  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
146      matplotlib__matplotlib-13989  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
142      matplotlib__matplotlib-24627  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
188      matplotlib__matplotlib-24637  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
88            pytest-dev__pytest-7982  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
185  scikit-learn__scikit-learn-14710  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
83           sphinx-doc__sphinx-10323  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
80            sphinx-doc__sphinx-7757  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
48            sphinx-doc__sphinx-8621  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
85            sphinx-doc__sphinx-9698  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
5                  sympy__sympy-15599  {'empty_generation': False, 'resolved': False,...  {'empty_generation': False, 'resolved': True, ...
----------------------------------------------------------------------------------------------------
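
A comparison like the one above can be sketched in a few lines of standard-library Python. This is not the script that produced the output; it assumes each line of `output.jsonl` is a JSON object with an `instance_id` field and a `report` dict containing a boolean `resolved`, which is inferred from the column names shown. The function names are hypothetical.

```python
import json


def load_resolved(path):
    """Map instance_id -> resolved flag from an output.jsonl file.

    Assumes (not confirmed by the PR) that each record looks like
    {"instance_id": ..., "report": {"resolved": bool, ...}}.
    """
    resolved = {}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            report = record.get("report") or {}
            resolved[record["instance_id"]] = bool(report.get("resolved", False))
    return resolved


def compare_runs(x, y):
    """Return (x resolved but y not, y resolved but x not) over shared instances."""
    shared = x.keys() & y.keys()
    x_only = sorted(i for i in shared if x[i] and not y[i])
    y_only = sorted(i for i in shared if y[i] and not x[i])
    return x_only, y_only
```

With the counts reported above, `len(x_only) - len(y_only)` would be 25 - 21 = 4, which matches the "4 more problems" in the comment.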

@xingyaoww xingyaoww enabled auto-merge (squash) February 28, 2025 00:54
@xingyaoww xingyaoww merged commit 8a58e72 into All-Hands-AI:main Feb 28, 2025
14 checks passed
adityasoni9998 pushed a commit to adityasoni9998/OpenHands that referenced this pull request Mar 3, 2025
3 participants