
(fix): Condensation events to reconstruct contexts added to event stream #7353

Merged
merged 49 commits into All-Hands-AI:main on Mar 27, 2025

Conversation

csmith49
Collaborator

- [ ] This change is worth documenting at https://docs.all-hands.dev/
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below.

End-user friendly description of the problem this fixes or functionality that this introduces.


Give a summary of what the PR does, explaining any non-trivial design decisions.

This PR updates the condenser interface to support a stateless implementation of a rolling condenser.

Instead of returning just a condensed history, this new interface allows a condenser to return a View of the history or a Condensation event. The condensation event can be added to the event stream, which is then one step closer to being the only source of state needed for the agent.

To implement this strategy, rolling condensers now define three abstract functions:

  • get_view, which converts the sequence of events in the state to a view object by utilizing the information in the condensation events,
  • get_condensation, which takes a view and produces a condensation event defining how future views should be constructed, and
  • should_condense, a trigger which determines when the condensation should happen.

In addition to the condenser changes, this PR updates the CodeAct agent implementation to use the new interface and updates all condenser tests accordingly.


Link of any specific issues this addresses.

This directly solves the problem in #6706, but should also resolve #7175.

@enyst
Collaborator

enyst commented Mar 19, 2025

Thank you for this! It seems to do exactly what I hit a problem with, in the other PR.

I started at the other end of this issue (put condensation events in the stream), and meant to keep it minimal, just to see what it looks like, but such a refactoring also needs something like this PR in order to actually work. 😅

@csmith49
Collaborator Author

csmith49 commented Mar 19, 2025

> Thank you for this! It seems to do exactly what I hit a problem with, in the other PR.
>
> I started at the other end of this issue (put condensation events in the stream), and meant to keep it minimal, just to see what it looks like, but such a refactoring also needs something like this PR in order to actually work. 😅

I totally missed that PR! Had my head down digging into some metrics.

This approach turned out way less intensive than I expected. I've got the condensers doing the right thing, just now getting to the point of testing the solution with the agent controller in the mix. Also need to figure out how to test the failure case we were seeing...

I haven't dug through your PR yet. Do you think our implementations can just be smashed together?

@enyst
Collaborator

enyst commented Mar 19, 2025

> I've got the condensers doing the right thing, just now getting to the point of testing the solution with the agent controller in the mix. Also need to figure out how to test the failure case we were seeing...

On the latter, maybe you want to see the 2 comments with debugging info. Idk if it's useful, they're literally like random notes though. The thing is, I got a great event stream, totally by accident, it was exactly a few tokens away from ContextWindowExceededError 😅. So on restore, if the first message is like 10-ish tokens, it got the error, and if it was less, it didn't.

#7311 (comment)

I literally removed newer events from my stream and restored the first ones repeatedly. That said, I was looking at more things than condensation, actually (some strange LLM behavior on microagents).

@enyst
Collaborator

enyst commented Mar 19, 2025

> I haven't dug through your PR yet. Do you think our implementations can just be smashed together?

I am still to check your latest changes. I think we may want to sync at least on what kind of Action we want here; it kinda matters for the rest, IMHO. This kind of thing may help in merging them (I meant not to change how the condensers work), idk.

@csmith49
Collaborator Author

> On the latter, maybe you want to see the 2 comments with debugging info. Idk if it's useful, they're literally like random notes though. The thing is, I got a great event stream, totally by accident, it was exactly a few tokens away from ContextWindowExceededError 😅. So on restore, if the first message is like 10-ish tokens, it got the error, and if it was less, it didn't.

I'll take a closer look, thanks. It'd be great if we could also get the condensation trigger behavior to be token-count aware, even if we're just approximating it. With enough of a safety factor we might be able to avoid context window errors entirely.
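The token-aware trigger floated here could be approximated without an exact tokenizer. A hypothetical sketch: the 4-characters-per-token heuristic and the `safety_factor` parameter are illustrative assumptions, not part of this PR.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def should_condense(messages: list[str], context_window: int,
                    safety_factor: float = 0.8) -> bool:
    """Trigger condensation before the approximate token count reaches
    the context window, leaving headroom via safety_factor."""
    total = sum(approx_tokens(m) for m in messages)
    return total > context_window * safety_factor
```

With a large enough safety margin, condensation would fire before the provider ever raises a context window error, at the cost of condensing somewhat earlier than strictly necessary.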

> I am still to check your latest changes. I think we may want to sync at least on what kind of Action we want here; it kinda matters for the rest, IMHO.

My first pass had different events for each kind of condenser, because they wanted slightly different bits of information. The current approach just has a single event with the union of all those features. I'd love to have some kind of "payload polymorphism" so that even with a single event there's still some type-safety, but the current event implementation makes that pretty fiddly. Might revisit that after a little event clean-up.

As it stands, I've got all existing tests working and have confirmed condensation happens appropriately without disrupting the agent flow. There's still some clean-up needed on documentation, and I want to add more tests, but I'm decently happy with where this PR is now.

1. By providing a list of event IDs.
2. By providing the start and end IDs of a range of events.

In the second case, we assume that event IDs are monotonically increasing, and that _all_ events between the start and end IDs are to be forgotten.
Collaborator

@enyst enyst Mar 27, 2025


Ah, yes, I also noticed again while rebuilding lists that this is completely reasonable, but for the condense actions themselves: they will be in the stream/history, interwoven with the last big list to be forgotten.

This isn't just an assumption, it's by definition. We can define the start id and end id to be the range between which most condensers (the exception being attention) have their forgotten events in that step.

In fact, whether we need the other condense actions is a question for the condenser implementation. Outside the condensers, if we want to send them to the agent LLM, we can pick them up: they're in the stream, so they're in history, so we don't, strictly speaking, need the condensers to also provide them.

Collaborator Author


Just to confirm so we can resolve this comment -- all event IDs are ordered monotonically increasing as they get added to the event stream? So if event_i is added to the event stream, then event_j is added, it will always be the case that event_i.id < event_j.id?

Collaborator


Yes. The stream itself assigns ids in ascending order, in the add_event method, which is the only way to do it, and the _id field is otherwise never set.

@csmith49
Collaborator Author

csmith49 commented Mar 27, 2025

EDIT TO ADD: I'm really trying to avoid flat representations of polymorphic events like we see in #7526. Very easy to do with pydantic but with the current system I keep running into all kinds of corner cases.

> I thought there are reasons in favor of it, such as:
> ...

Sorry, I don't mean to imply it's a bad design pattern in general. In this PR I just kept running into issues where I'd have a huge set of fields and a single discriminator enum that would control which combinations of fields should be enabled and how they should be interpreted.

I can add validators in __post_init__ to check these configurations, but I can't make the data class frozen or use that validation to enforce type guarantees, so there were always a lot of unnecessary assertions when accessing those combinations of fields.

> • a single event for all, as in... usable for all no matter what it contains? But... can that be the case?

That's the approach I tried to go with. The current implementation has that working, but I couldn't avoid all instances of that flat polymorphism pattern without just relying on lists of integers for event identifiers.

> > There's a big assumption about event IDs in the range calculations though -- can you confirm that's an okay assumption to make?

> I'm not sure I understand, sorry, could you elaborate?

The assumption is that events have monotonically increasing (wrt their creation time) IDs. That way I can define start/end ranges based on IDs and -- since the forgotten events act as a filter over the history -- I won't miss any events or forget one that I don't want to.

@enyst
Collaborator

enyst commented Mar 27, 2025

> Sorry, I don't mean to imply it's a bad design pattern in general. In this PR I just kept running into issues where I'd have a huge set of fields and a single discriminator enum that would control which combinations of fields should be enabled and how they should be interpreted.

I see... was that an artefact of the combinations of fields in the work in progress or does it have to be the case always? I ask because I'm not sure the combinations are many... with the range, I mean.

> The assumption is that events have monotonically increasing (wrt their creation time) IDs. That way I can define start/end ranges based on IDs and -- since the forgotten events act as a filter over the history -- I won't miss any events or forget one that I don't want to.

Yes, this is reasonable. It is definitely the case in the stream, and it is the case in history save for the truncation in action. I can look into stopping that, but if this PR works as expected, that case should be extremely rare with Claude and other LLMs with decent-ish windows.

@csmith49
Collaborator Author

> I see... was that an artefact of the combinations of fields in the work in progress or does it have to be the case always? I ask because I'm not sure the combinations are many... with the range, I mean.

It was definitely an issue with an earlier approach that captured more info. The options are not numerous here, as you can see in CondensationAction.

@enyst
Collaborator

enyst commented Mar 27, 2025

Sorry, I tried to test that it works as expected, when restoring a session:

  • made on main branch
  • made on the PR branch

It doesn't seem to work at the moment because it won't recognize the old condense action? It creates actions, but it recognizes obs.
Edited to add: Ah, we also create an obs with the same summary. Not sure if it's necessary but that obs won't be saved. Maybe that doesn't matter?

@enyst
Collaborator

enyst commented Mar 27, 2025

I restored a ~450-event session made from the main branch. It's the one I mentioned before; it's at the edge of a natural ContextWindowExceededError (near, but not yet there).


It doesn't manage it, it falls back to the controller truncation, and then condensation happens at half (event_id=231). I think this is because condensers don't have error handling, and it gets the exception in the condenser LLM call.

  • the "trimming prompt" from truncation appears twice because I stopped it from the debugger, so it didn't save state. This looks normal to me.
  • when it converts into Messages, it sends all 3 condense observations.

@neubig
Contributor

neubig commented Mar 27, 2025

@enyst: Hi! I'm going to also spend some time taking a look at this today and trying to help with testing and getting it merged, since this is high priority. Would you be able to share that trajectory either as-is (with quick directions about how to load it in) or curate it into a failing test?

@csmith49
Collaborator Author

> Sorry, I tried to test that it works as expected, when restoring a session...made on main branch

I don't think we should expect that to work in this case. The old sessions won't have the CondensationAction events threaded through the history, so we'll try just one big condensation on the first agent step.

> It doesn't manage it, it falls back to the controller truncation, and then condensation happens at half (event_id=231). I think this is because condensers don't have error handling, and it gets the exception in the condenser LLM call.

That's also expected here, and how it currently works on main. I definitely want to improve the back-and-forth between the two, and I think CondensationAction events are the current shortest path to doing so.

@csmith49
Collaborator Author

> Edited to add: Ah, we also create an obs with the same summary. Not sure if it's necessary but that obs won't be saved. Maybe that doesn't matter?

That doesn't matter. We can't save that observation in the right spot in the history without violating the temporal ordering of events, so the CondensationAction carries all necessary information for the View reconstruction to put it in the right spot before handing the view off to the LLM.

Comment on lines +120 to +122
There are two ways to specify the events to be forgotten:
1. By providing a list of event IDs.
2. By providing the start and end IDs of a range of events.
Contributor


Reading this class again, do we actually need these two code paths? Couldn't we just do the first one and the functionality would be a superset of the second one? Sure it'd be a little bit less efficient memory-wise, but storing a list of ints is not really a big deal I think.

Collaborator Author


The first path is a functional super-set of the second, yes. Engel expressed concerns about the size of the messages so I added the suggested range representation as an optimization. I don't have numbers on how much it saves but I think the overall memory impact is minor either way (maybe ~0.1% of the data in the whole trajectory?).

Collaborator


Yes, I'm sorry, it may be just me and my hatred of inefficiency: they're consecutive, so the explicit list is unnecessary, and it would be saved/read in every condensation action. 🥹 It would also be sent via websocket.

Our usual case is the summarizer, where they're consecutive. In fact, all of them, except for llm_attention, specify, at most, a range: from start to end.

IMHO a way to represent these would be with polymorphism and a discriminated union. But the event stream is "flat". When we represented this kind of case before, we made a flattened representation with an enum as discriminator for serialization and some validation, and an umbrella class for the necessary fields. Calvin doesn't like that, quite understandably - and here there would be more than two. 😅

This is an umbrella class, but with just two minimal options. It's a middle ground I'm happy with, but maybe it's bad?

Contributor


Yeah, personally I think code complexity is worse than small amounts of inefficiency. But it's not a hill that I'll die on :)

@enyst
Collaborator

enyst commented Mar 27, 2025

@neubig Sure, but sorry, just to clarify: I agree with Calvin that ContextWindowExceeded not being handled on a previous session is expected. We don't have exception handling in condensers, so if ContextWindowExceededError happens during their LLM call, it falls to the controller.

With this PR, those should be extremely rare, at least with LLMs with a decent context window size.

Except for previous sessions, that is, sessions made on main etc.: if they are already at the limit or beyond, those will fail like before, but I think that's okay. Newer sessions should recognize the new Actions in history. IMHO we don't need to solve it here (it's longstanding and known, and IMHO not trivial).

5c3288e4f277426388257d13a3a6ab9a.zip
If you wish to take a look at this fun stream, it's here. I think it can be loaded with the UI by simply placing the directory in the local filestore. If that doesn't work (haven't tried lately), then running with main.py and stopping in the debugger to switch sid to this sid would work. Deleting all events from 451 (inclusive) and loading the rest gives a nice history just under 200k, so the next user message would bump it over the Anthropic limit.

@enyst
Collaborator

enyst commented Mar 27, 2025

I failed to test on the PR branch (for reasons unrelated to the PR, I'm sure); I can give it another try just now. Otherwise the PR looks great, IMHO; I just wanted to see what it does on session restore.

Edited: Sorry, I see it. It works just fine.

Session on the PR branch, 2 condensation actions -> closed, restored
=> observation included as expected, number of events looks right.

Sorry to be an annoying nitpicker, I think it's great and good to go.

@csmith49 csmith49 merged commit 42712a4 into All-Hands-AI:main Mar 27, 2025
15 checks passed
@csmith49 csmith49 deleted the fix/condenser-visibility branch March 27, 2025 19:16
Tetsu-is pushed a commit to Tetsu-is/OpenHands that referenced this pull request Apr 4, 2025
Tetsu-is added a commit to Tetsu-is/OpenHands that referenced this pull request Apr 4, 2025
Tetsu-is pushed a commit to Tetsu-is/OpenHands that referenced this pull request Apr 4, 2025
Tetsu-is pushed a commit to Tetsu-is/OpenHands that referenced this pull request Apr 4, 2025

Successfully merging this pull request may close these issues.

The context window seems to "reset" to earlier points, rather than later points in time
3 participants