feat: append_file incl. all tests [agentskills] #2346
Conversation
It'd be cool if one reviewer could trigger the workflow to start just to see how the tests go, but no merging yet.
```python
def append_file(content: str) -> None:
    """Append content to the open file.

    It appends text `content` to the end of the open file. Remember, the file must be open before editing.
```
You will create a follow-up PR to add `file_name` as part of `edit_file`'s arguments anyways, right? Shall we include `file_name` here, rather than rely on the `CURRENT_FILE`?
Correct, in a follow-up... just for the fact that integration tests are a PITA to get right, to be blunt. ;)
That both `append_file`/`edit_file` then get `file_name` as the first parameter makes total sense.
ok sounds fine to me
I am not sure about @neubig 's benchmarking plan. If we decide to do benchmarking before this PR gets merged, then we should probably include the fix to edit_file in this PR too. Otherwise, we could merge this first, and then edit_file, and maybe a few other PRs, then run evaluation at once.
I decided to work on bringing in the new `file_name` param for both append/edit_file methods.
Core changes and unit tests done. Integration tests next...
```python
    if not CURRENT_FILE or not os.path.isfile(CURRENT_FILE):
        raise FileNotFoundError('No file open. Use the open_file function first.')

    # Use a temporary file to write changes
```
Why not simply append?
My thought was that it would be a change in behavior, as the other methods error if a file wasn't explicitly opened and/or created first.
I see. This behavior will be changed soon anyways, right? I mean, you will revert this part in a follow-up PR?
It's the same as in edit_file. I didn't know this was about to be changed.
cc @li-boxuan @xingyaoww
Until now the behavior has been that only existing files could be edited, and the LLM was supposed to use create_file and/or open_file for that.
I'd just like to confirm before removing that exception for non-existing files, because that would again require more changes to prompts etc.
Yes, I was already thinking of refactoring the append/edit methods so that both just call one internal method with an extra bool for append, but wanted to do this in a follow-up PR, no problem.
The append is really just to make it easier for the LLM, since there were often cases where it tried to start at the "end of file + 1" line, got an exception, and had to try again, thus wasting time and money.
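The refactor described above (one internal method shared by both skills, with an append flag) might be sketched like this. All names and signatures here are hypothetical, not the actual OpenDevin implementation:

```python
from typing import Optional


def _edit_or_append(file_name: str, content: str, is_append: bool,
                    start: Optional[int] = None, end: Optional[int] = None) -> None:
    """Shared core for edit/append (illustrative names only)."""
    with open(file_name) as f:
        lines = f.readlines()
    new_line = content if content.endswith('\n') else content + '\n'
    if is_append:
        # Appending needs no line numbers, so the model cannot pass a
        # stale "end of file + 1" start line and trigger an exception.
        lines.append(new_line)
    else:
        # 1-based inclusive line range, matching edit_file's start/end style.
        lines[start - 1:end] = [new_line]
    with open(file_name, 'w') as f:
        f.writelines(lines)


def append_file(file_name: str, content: str) -> None:
    _edit_or_append(file_name, content, is_append=True)


def edit_file(file_name: str, start: int, end: int, content: str) -> None:
    _edit_or_append(file_name, content, is_append=False, start=start, end=end)
```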
Makes sense! My inclination is that ideally we do the refactor, if possible, merge it - and when I finish the swe-bench eval, I'll re-run the eval from the start to make sure we are not degrading perf.
I'd just prefer to get this merged before I do a refactor and in the meantime another patch comes along with changed unit/integration tests. That takes a lot of time to redo.
Approve the PR to unblock you :)
Need a 2nd reviewer 🤣
Btw why is this blocked by #2085? I think we could finish this PR, merge it, note it down, and do evaluation afterwards. Doing evaluation for every change to the prompt might be too costly.
Ok, great to see that besides Mac, all the unit and integration tests worked. 🎸 @neubig please let me know if this stays blocked for 2085
Commit incoming shortly with
```python
):
    print(f'Setting workspace_base to {config.workspace_base}')
    workspace_base = config.workspace_base
```
@li-boxuan this is one thing that took me hours to find and resolve:
The runtime loads values from the config.toml file, which may have a different workspace_base than what regenerate sets. This resulted in files being read and written in 2 different folders, causing multiple tests to fail, like the "bad.txt" one.
Running in WSL here.
Edit: sorry, it's not showing all the lines above, should be 19-28, for context.
That's very weird, because the script sets the environment variable `WORKSPACE_BASE`, which would override the value in `config.toml`.
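The override rule being described (environment variable beats `config.toml`) can be illustrated with a minimal sketch. The real OpenDevin config loader is more involved; only the precedence order here reflects the discussion, the function name and default are mine:

```python
import os

try:
    import tomllib  # stdlib TOML parser, Python 3.11+
except ImportError:
    tomllib = None


def load_workspace_base(toml_path: str = 'config.toml') -> str:
    """Resolve workspace_base: the environment variable wins over config.toml."""
    # 1. Environment variable has the highest precedence.
    env_value = os.environ.get('WORKSPACE_BASE')
    if env_value:
        return env_value
    # 2. Fall back to config.toml, if present and parseable.
    if tomllib and os.path.isfile(toml_path):
        with open(toml_path, 'rb') as f:
            data = tomllib.load(f)
        if 'workspace_base' in data:
            return data['workspace_base']
    # 3. Built-in default (illustrative).
    return './workspace'
```

If this precedence held in the failing WSL run, the toml value should never have leaked through while `WORKSPACE_BASE` was set.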
I just tried and it worked as expected. I set `workspace_base="./workspace_base_in_toml"` in my `config.toml` and ran the script. I printed out `workspace_base` after L17 and the value indeed was `_test_workspace` (defined in the script).
I am using Mac btw. Not sure if we have some os-dependent behavior somewhere
> I am using Mac btw. Not sure if we have some os-dependent behavior somewhere

That is what I fear with WSL right now.
Could you please try on your end whether the file-related tests work with the config.toml pointing to some different folder?

```shell
TEST_ONLY=true ONLY_TEST_AGENT="CodeActAgent" ./tests/integration/regenerate.sh
```

In my case, I have in config.toml:

```toml
workspace_base="/mnt/d/github/workspace"
```

which is one folder "below" my OpenDevin repo.
Now that you mention it, I do have that set to true in my config.toml.
I guess I should try with false then? 😁
I don't think it's tested with integration tests? I would set it to false, and clean up both containers and (opendevin) images from docker to be sure.
Maybe we should start a separate issue about this as this gets a bit long within this PR, what do you think?
`persist_sandbox` is not tested because it doesn't work (fully) yet, e.g. #2176. I don't think it would even be able to pass all tests at the moment.
So yeah, that might very likely be the cause.
Also, can someone confirm that this should work without a config.toml present?
Yes. That's exactly how CI is set up.
It works with no toml for me too. Do you have env vars set, like any of the workspace vars? Any chance you see what error it gives you with no toml?
My WSL env doesn't have any OpenDevin vars.
If you guys are ok with the extra check staying - and avoiding potential WSL issues - then I think this PR is good to go. 🚀
LGTM overall! Approving to unblock the PR so @tobitege can work on a refactored version of `edit_file` - or feel free to do it in this PR as well.
I can run a more comprehensive eval once we fix all the kinks in the swe-bench eval, and revert if necessary.
Ah, excellent, just overlapped with my reply above. Cheers!
cc @neubig @li-boxuan ✉️ kind invitation for review ✏️ 👨 😃
@yufansong please click 🕹️ 😂
```python
first_error_line = None
for line in error_message.split('\n'):
    if line.strip():
        # The format of the error message is: <filename>:<line>:<column>: <error code> <error message>
```
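The format noted in that comment suggests a parse along these lines. The regex and the helper's name/return value are my assumptions for illustration, not necessarily what the PR ships:

```python
import re
from typing import Optional


def get_first_error_line(error_message: str) -> Optional[int]:
    """Return the line number from the first non-empty linter message.

    Expected format per message:
        <filename>:<line>:<column>: <error code> <error message>
    """
    for line in error_message.split('\n'):
        if line.strip():
            match = re.match(r'^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):', line)
            if match:
                return int(match.group('line'))
    return None
```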
love this!
Replacement for discarded first draft: #2207

Adds `append_file` to the Agentskills plugin. Includes unit tests and adapted integration tests.

This PR adds a new method `append_file(content: str)` to the agent skills.
In testing OD with some file operations, repeated errors by the model (used Gemini Pro 1.5) are a noticeable issue: adding content to a file often causes exceptions due to an invalid "start" line number for the edit_file command.
It seems very hard for the model to identify or keep track of the total number of lines in a file, or it is just not good enough at counting (ask an LLM to count the words of its answer and you know what I mean).
This issue can cause extra cost when it shouldn't.
Such an error then looks like this:

An example prompt to demonstrate the use of it could be like:
"Write the numbers 5 to 10 into a new file named test.txt"
A followup prompt could then just say:
"Append the numbers 20 to 25 to the same file."
If it were to use the edit_file command, chances would be high that it would use the wrong line number(s) again.
cc @neubig @li-boxuan
(blocked by #2085 as well)