feat: append_file incl. all tests [agentskills] #2346

tobitege · 2024-06-09T13:28:33Z

Replacement for discarded first draft: #2207

Adds append_file to Agentskills plugin. Includes unit tests and adapted integration tests.

This PR adds a new method append_file(content: str) to the agent skills.

While testing OD with some file operations, I noticed a recurring problem: the model (Gemini Pro 1.5 in my tests) often triggers exceptions when adding content to a file, because it passes an invalid "start" line number to the edit_file command.

It seems very hard for the model to identify or keep track of the total number of lines in a file, or it is simply not good at counting (ask an LLM to count the words of its answer and you know what I mean).
Every failed attempt has to be retried, which adds avoidable cost.

Such an error then looks like this:

An example prompt to demonstrate the use of it could be like:
"Write the numbers 5 to 10 into a new file named test.txt"
A followup prompt could then just say:
"Append the numbers 20 to 25 to the same file."
If it were to use the edit_file command, chances would be high that it would use the wrong line number(s) again.
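To illustrate why appending sidesteps the line-number problem, here is a minimal sketch. It is not the code from this PR: `append_text` and `demo` are hypothetical names, and the sketch takes an explicit file name instead of relying on an open file. The point is only that the caller never has to know how many lines the file already contains.

```python
import os
import tempfile


def append_text(file_name: str, content: str) -> None:
    """Append `content` to an existing file (hypothetical helper, not the PR's code)."""
    if not os.path.isfile(file_name):
        raise FileNotFoundError(f'File {file_name} not found.')
    with open(file_name) as f:
        existing = f.read()
    with open(file_name, 'a') as f:
        # Start on a fresh line if the file does not already end with a newline.
        if existing and not existing.endswith('\n'):
            f.write('\n')
        f.write(content if content.endswith('\n') else content + '\n')


def demo() -> str:
    """Replay the two example prompts: write 5..10, then append 20..25."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'test.txt')
        with open(path, 'w') as f:
            f.write('\n'.join(str(n) for n in range(5, 11)) + '\n')
        append_text(path, '\n'.join(str(n) for n in range(20, 26)))
        with open(path) as f:
            return f.read()
```

No "start" or "end" argument appears anywhere in the append call, so there is nothing for the model to miscount.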

cc @neubig @li-boxuan

(blocked by #2085 as well)

tobitege · 2024-06-09T16:20:34Z

It'd be cool if one reviewer could trigger the workflow to start just to see how the tests go, but no merging yet.

li-boxuan · 2024-06-09T18:27:43Z

opendevin/runtime/plugins/agent_skills/agentskills.py

+def append_file(content: str) -> None:
+    """Append content to the open file.
+
+    It appends text `content` to the end of the open file. Remember, the file must be open before editing.


You will create a follow-up PR to add file_name as part of edit_file's argument anyways, right? Shall we include file_name here, rather than rely on the CURRENT_FILE?

Correct, in a follow-up... just for the fact that integration tests are a PITA to get right, to be blunt. ;)
That both append_/edit_file then get as 1st param a file_name makes total sense.

ok sounds fine to me

I am not sure about @neubig 's benchmarking plan. If we decide to do benchmarking before this PR gets merged, then we should probably include the fix to edit_file in this PR too. Otherwise, we could merge this first, and then edit_file, and maybe a few other PRs, then run evaluation at once.

I decided to work on bringing in the new file_name param for both append/edit_file methods.
Core changes and unit tests done. Integration tests next...

li-boxuan · 2024-06-09T18:28:57Z

opendevin/runtime/plugins/agent_skills/agentskills.py

+    if not CURRENT_FILE or not os.path.isfile(CURRENT_FILE):
+        raise FileNotFoundError('No file open. Use the open_file function first.')
+
+    # Use a temporary file to write changes


Why not simply append?

My thought was that it would be a change in behavior, as the other methods error if a file wasn't explicitly opened and/or created first.

I see. This behavior will be changed soon anyways, right? I mean, you will revert this part in a follow-up PR?

It's the same as in edit_file. I didn't know this was about to be changed.

cc @li-boxuan @xingyaoww
Till now the behavior is that only existing files could be edited and the LLM was supposed to use create_file and/or open_file for that.
I'd just like to confirm before I'd remove that exception for non-existing files, because that would require more changes again to prompts etc.

Yes, I was already thinking of refactoring the append/edit methods so that both just call one internal method with an extra bool for append, but wanted to do this in followup PR, no problem.

The append is really just to make it easier for the LLM, since there were often cases where it tried to start at the "end of file + 1" line, got an exception, and had to do it again, thus wasting time and money.
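The refactoring idea mentioned above (both methods calling one internal method with an extra bool for append) could look roughly like the sketch below. All names here (`_apply_change`, `edit_lines`, `append_lines`) are hypothetical and operate on an in-memory list of lines; this is not the code that ended up in agentskills.py.

```python
from typing import List, Optional


def _apply_change(
    lines: List[str],
    content: str,
    start: Optional[int] = None,
    end: Optional[int] = None,
    is_append: bool = False,
) -> List[str]:
    """Shared core for edit and append (hypothetical); returns the new list of lines."""
    new_lines = content.split('\n')
    if is_append:
        # Appending never needs a line number.
        return lines + new_lines
    if start is None or end is None:
        raise ValueError('start and end are required when not appending.')
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(
            f'Invalid line range {start}-{end} for a file with {len(lines)} lines.'
        )
    # Replace the 1-indexed, inclusive range start..end with the new lines.
    return lines[: start - 1] + new_lines + lines[end:]


def edit_lines(lines: List[str], start: int, end: int, content: str) -> List[str]:
    return _apply_change(lines, content, start=start, end=end)


def append_lines(lines: List[str], content: str) -> List[str]:
    return _apply_change(lines, content, is_append=True)
```

With this shape, the "end of file + 1" attempt fails in `edit_lines` with a ValueError, while `append_lines` covers the same intent without any line arithmetic.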

Makes sense! My inclination is that, ideally, we do the refactor and merge it if possible; when I finish the swe-bench eval, I'll re-run the eval from the start to make sure we are not degrading perf.

I'd just prefer to get this merged before I do a refactor and in the meantime another patch comes along with changed unit/integration tests. That takes a lot of time to redo.

Approve the PR to unblock you :)

Need a 2nd reviewer 🤣

tests/integration/mock/CodeActAgent/test_edits/prompt_003.log

li-boxuan · 2024-06-09T18:37:09Z

Btw why is this blocked by #2085? I think we could finish this PR, merge it, note it down, and do evaluation afterwards.

Doing evaluation for every change to prompt might be too costly.

tobitege · 2024-06-09T18:40:00Z

Btw why is this blocked by #2085? I think we could finish this PR, merge it, note it down, and do evaluation afterwards.

Doing evaluation for every change to prompt might be too costly.

I copied that note about #2085 over, just in case.

tobitege · 2024-06-09T19:03:54Z

Ok, great to see that besides Mac, all the unit and integration tests worked. 🎸

@neubig please let me know if this stays blocked for 2085

tobitege · 2024-06-10T06:19:49Z

Commit incoming shortly with file_name added as parameter for append-/edit_file methods incl. updated tests...

tobitege · 2024-06-10T07:02:46Z

tests/integration/test_agent.py

+        ):
+            print(f'Setting workspace_base to {config.workspace_base}')
+            workspace_base = config.workspace_base
+


@li-boxuan this is one thing that took me hours to find and resolve:
The runtime loads values from the config.toml file, which may have a different workspace_base than what regenerate sets. This resulted in files being read and written in 2 different folders, thus multiple tests failing, like the "bad.txt" one.
Running in WSL here.

Edit: sorry, it's not showing all the lines above, should be 19-28, for context.

That's very weird, because the script sets environment variable WORKSPACE_BASE, which would override the value in config.toml.

I just tried and it worked as expected. I set workspace_base="./workspace_base_in_toml" in my config.toml and ran the script. I printed out workspace_base after L17 and the value indeed was _test_workspace (defined in the script).
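The precedence described here (environment variable WORKSPACE_BASE wins over workspace_base in config.toml) can be sketched as follows. This is an assumption-laden illustration, not OpenDevin's actual config loader: `resolve_workspace_base` is a hypothetical function, and the `'./workspace'` fallback is made up for the example.

```python
import os
from typing import Mapping, Optional


def resolve_workspace_base(
    toml_values: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
    default: str = './workspace',  # assumed fallback, for illustration only
) -> str:
    """Pick the workspace directory: env var first, then config.toml, then default."""
    env = os.environ if environ is None else environ
    if env.get('WORKSPACE_BASE'):
        return env['WORKSPACE_BASE']
    return toml_values.get('workspace_base', default)
```

If the runtime and the test script each resolved this value differently, files would end up in two different folders, which matches the symptom reported above.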

I am using Mac btw. Not sure if we have some os-dependent behavior somewhere

I am using Mac btw. Not sure if we have some os-dependent behavior somewhere

That is what I fear with WSL right now.
Could you check on your end whether the file-related tests work when the config.toml file points to a different folder, please?
TEST_ONLY=true ONLY_TEST_AGENT="CodeActAgent" ./tests/integration/regenerate.sh
In my case, I have in config.toml:
workspace_base="/mnt/d/github/workspace"
which is one folder "below" my OpenDevin repo.

Now that you mention it, I do have that set to true in my config.toml.
I guess I should try with false then? 😁

I don't think it's tested with integration tests? I would set it to false, and clean up both containers and (opendevin) images from docker to be sure.

Maybe we should start a separate issue about this as this gets a bit long within this PR, what do you think?

persist_sandbox is not tested because it doesn't work (fully) yet. E.g. #2176 I don't think it would be even able to pass all tests at the moment.

So yeah that might very likely be the curse.

tobitege · 2024-06-10T07:19:02Z

Also, can someone confirm that this should work without a config.toml present?
TEST_ONLY=true ONLY_TEST_AGENT="CodeActAgent" ./tests/integration/regenerate.sh
For me this errors if no config.toml exists.

li-boxuan · 2024-06-10T07:43:05Z

Also, can someone confirm that this should work without a config.toml present?

Yes. That's exactly how CI is set up.

enyst · 2024-06-10T08:25:05Z

Also, can someone confirm that this should work without a config.toml present? TEST_ONLY=true ONLY_TEST_AGENT="CodeActAgent" ./tests/integration/regenerate.sh For me this errors if no config.toml exists.

It works with no toml for me too. Do you have env vars set, like any of the workspace vars? Any chance you see what error it gives you with no toml?

tobitege · 2024-06-10T08:29:41Z

Any chance you see what error it gives you with no toml?

My WSL env doesn't have any OpenDevin vars.
I'll give it a try again without toml later today and try to find out more details.

tobitege · 2024-06-10T08:38:29Z

If you guys are ok with the extra check staying - and avoiding potential WSL issues - then I think this PR is good to go. 🚀

xingyaoww

LGTM overall! Approving to unblock the PR so @tobitege can work on a refactored version of edit_file - or feel free to do it in this PR as well.

I can run a more comprehensive eval once we fix all the kinks in swe-bench eval and revert when necessary.

tobitege · 2024-06-10T14:30:38Z

I can run a more comprehensive eval once we fix all the kinks in swe-bench eval and revert when necessary.

Ah, excellent, just overlapped with my reply above. Cheers!

tobitege · 2024-06-10T15:36:59Z

cc @neubig @li-boxuan ✉️ kind invitation for review ✏️ 👨 😃

tobitege · 2024-06-10T16:57:42Z

@yufansong please click 🕹️ 😂

li-boxuan · 2024-06-11T01:52:19Z

opendevin/runtime/plugins/agent_skills/agentskills.py

+        first_error_line = None
+        for line in error_message.split('\n'):
+            if line.strip():
+                # The format of the error message is: <filename>:<line>:<column>: <error code> <error message>
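The hunk above is cut off, so as a rough illustration of the format named in the comment (`<filename>:<line>:<column>: <error code> <error message>`), here is one way the first error line number could be extracted. The regular expression and the function name `first_error_line_number` are assumptions for this sketch, not the code from the PR.

```python
import re
from typing import Optional

# <filename>:<line>:<column>: <error code> <error message>
_LINT_LINE = re.compile(
    r'^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): (?P<code>\S+) (?P<msg>.*)$'
)


def first_error_line_number(error_message: str) -> Optional[int]:
    """Return the line number of the first parsable lint error, or None."""
    for line in error_message.split('\n'):
        if line.strip():
            match = _LINT_LINE.match(line.strip())
            if match:
                return int(match.group('line'))
    return None
```

Lines that do not follow the format are skipped rather than treated as errors, which is a design choice of this sketch.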


tobitege added 2 commits June 9, 2024 15:22

new skill: append_file incl. all tests

e54d79e

Merge branch 'OpenDevin:main' into append_file

6291a8b

tobitege added 3 commits June 9, 2024 19:41

more tests needed caring

0d13f2b

Merge branch 'main' into append_file

2c8e25d

Merge branch 'append_file' of https://github.com/tobitege/OpenDevin into append_file

7642c15

li-boxuan reviewed Jun 9, 2024

tobitege added 3 commits June 9, 2024 23:03

Merge branch 'main' into append_file

848ec1a

Merge branch 'main' into append_file

593724c

Merge branch 'main' into append_file

7f34f59

tobitege added 2 commits June 10, 2024 08:56

file_name for append_file/edit_file; updated tests

b2b581e

Merge branch 'append_file' of https://github.com/tobitege/OpenDevin into append_file

265a2d2

tobitege requested review from enyst and li-boxuan June 10, 2024 06:58

tobitege commented Jun 10, 2024

tobitege mentioned this pull request Jun 10, 2024

Correct linting error context in agentskills:edit_file #2210

Closed

Merge branch 'main' into append_file

4c1c53c

tobitege mentioned this pull request Jun 10, 2024

[Bug]: CodeAct repeatedly edits the wrong file #1890

Closed

2 tasks

xingyaoww approved these changes Jun 10, 2024

tobitege mentioned this pull request Jun 10, 2024

Add integration test for CodeActSWEAgent #2377

Merged

tobitege changed the title ~~feat: append_file incl. all tests [agentskills] (wait for 2085!)~~ feat: append_file incl. all tests [agentskills] Jun 10, 2024

tobitege added 2 commits June 10, 2024 18:25

Merge branch 'main' into append_file

8b28174

Merge branch 'main' into append_file

054b3d9

neubig enabled auto-merge (squash) June 10, 2024 17:05

neubig merged commit 9605106 into All-Hands-AI:main Jun 10, 2024
2 checks passed

li-boxuan reviewed Jun 11, 2024

tobitege mentioned this pull request Jun 11, 2024

fix: Agentskills enhancements #2384

Merged

6 tasks
