Add prompt caching (Sonnet, Haiku only) #3411
Conversation
Those first two messages are a great start! The initial prompt is quite large. Thank you for doing this ❤️ Just to note, from the Anthropic docs, there is also the fact that cache writes cost 25% more than regular input tokens, while cache hits cost only 10% of the regular price. I'm thinking that we may want to have this feature enabled by default, but it should probably be easy to disable for various scenarios.
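As a rough sanity check on those numbers (a back-of-the-envelope sketch, not code from this PR; the prices are just the 1.25x / 0.10x ratios quoted above applied to an arbitrary base price):

```python
# Break-even check for prompt caching, using the pricing ratios from the
# Anthropic docs quoted above: cache writes cost 1.25x the base input price,
# cache reads cost 0.10x. All numbers are illustrative only.

def cost_without_cache(prompt_tokens: int, turns: int, base_price: float = 1.0) -> float:
    # The full prompt is re-sent and billed at the base price every turn.
    return prompt_tokens * base_price * turns

def cost_with_cache(prompt_tokens: int, turns: int, base_price: float = 1.0) -> float:
    # First turn writes the cache (1.25x), later turns read it (0.10x).
    write = prompt_tokens * base_price * 1.25
    reads = prompt_tokens * base_price * 0.10 * (turns - 1)
    return write + reads

if __name__ == "__main__":
    for turns in (1, 2, 5, 10):
        print(turns, cost_without_cache(10_000, turns), cost_with_cache(10_000, turns))
    # Caching already pays off on the second turn: 1.25 + 0.10 < 2.00.
```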
Just realized one thing: we can put all the previous prompts (after the first two) inside just one message and we are done. For example, every 10 messages we update this merged block, as in the sketch below.
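A minimal sketch of that idea (the `cache_control` marker follows Anthropic's documented message format; the helper itself and its parameters are hypothetical, not code from this PR):

```python
def merge_history_for_cache(messages: list[dict], keep_first: int = 2, refresh_every: int = 10) -> list[dict]:
    """Collapse everything after the first `keep_first` messages into one
    cached block, rebuilding that block only every `refresh_every` messages."""
    head = messages[:keep_first]
    tail = messages[keep_first:]
    # Only re-cut the merged block at every `refresh_every`-th message so the
    # cached prefix stays stable between refreshes.
    cut = len(tail) - (len(tail) % refresh_every)
    if cut == 0:
        return messages
    # Simplification: fold user and assistant turns into one user-role block.
    merged_text = "\n\n".join(m["content"] for m in tail[:cut] if isinstance(m["content"], str))
    merged = {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": merged_text,
                # Anthropic's cache checkpoint marker.
                "cache_control": {"type": "ephemeral"},
            }
        ],
    }
    return head + [merged] + tail[cut:]
```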
A few more from Anthropic docs:
So we don't need to merge messages ourselves; instead we set a checkpoint on the end message. Which message counts as the end message is less clear to me. 😅 I have to test some more, moving the checkpoint between messages. If I see this right, though, we also don't need to define separate checkpoints for the two initial messages: a single checkpoint on the in-context example will include everything before it, and it's always there. What do you think? The response gives a report on the number of tokens written to the cache and the number of cache hits; IMHO if we can add it to the logs it would be useful, just like we do for cost tracking. They also note that prompts below a minimum token count are not cached.
Well, the CodeAct prompt is large. 🤣 So that's likely no issue here.
Then, wouldn't it be a cache miss whenever new content is added after the checkpoint? I will check this out. Thanks for pointing it out, I missed it. Will also add it to the logs, roughly as sketched below.
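A sketch of what that logging could look like. The usage field names (`cache_creation_input_tokens`, `cache_read_input_tokens`) come from Anthropic's prompt-caching docs; whether and where litellm surfaces them on the response object may vary by version, so this reads them defensively:

```python
import logging

logger = logging.getLogger(__name__)

def log_cache_stats(response) -> None:
    """Log prompt-cache writes and hits alongside the usual token counts,
    similar to how cost is already tracked."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return
    # Field names per Anthropic's prompt-caching docs; litellm pass-through
    # may differ by version, hence the getattr defaults.
    cache_writes = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_hits = getattr(usage, "cache_read_input_tokens", 0) or 0
    logger.info(
        "Input tokens: %s | cache writes: %s | cache hits: %s",
        getattr(usage, "prompt_tokens", "?"), cache_writes, cache_hits,
    )
```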
https://github.com/anthropics/anthropic-cookbook/blob/main/misc/prompt_caching.ipynb. The idea is to add the cache point on the two most recent messages. This makes the caching incremental: whatever is missing is written to the cache and the earlier part is reused. It would also be a good idea to cache the system prompt, since the user might reset and start a new conversation.
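A sketch of what that looks like when calling Claude through litellm. The beta header and the `cache_control` blocks follow Anthropic's docs; the exact plumbing through litellm (and the model name) may differ by version, and the example conversation is made up:

```python
import litellm

system_prompt = "You are CodeAct, an agent that solves tasks by executing code..."  # large, static system prompt
history = [
    {"role": "user", "content": "List the repo files."},
    {"role": "assistant", "content": "Here are the files: README.md, setup.py"},
    {"role": "user", "content": "Now open README.md."},
]

messages = [
    {
        "role": "system",
        "content": [
            # Cache the system prompt so a brand-new conversation still hits it.
            {"type": "text", "text": system_prompt, "cache_control": {"type": "ephemeral"}}
        ],
    },
    *history[:-2],
]
# Mark the two most recent messages as cache checkpoints: the previously cached
# prefix is reused, and only the newly added part is written to the cache.
for msg in history[-2:]:
    messages.append({
        "role": msg["role"],
        "content": [
            {"type": "text", "text": msg["content"], "cache_control": {"type": "ephemeral"}}
        ],
    })

response = litellm.completion(
    model="claude-3-5-sonnet-20240620",
    messages=messages,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
)
print(response.usage)
```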
I believe caching works now; see the attached recording. 2024-08-18.23-39-39.mp4
I am unable to run them.
@Kaushikdkrikhanu Sorry about this. We've been making changes to the workflows these days. We don't need to regenerate; I think the problem now is that a workflow is required, but it has permission errors on forks. That's nothing we can fix in this PR, we need to fix it on the repo.
On the PR, I added some tests, as @anthony-chaudhary suggested, and made some experiments. I hope you don't mind that I also committed and tested on the branch. Eventually I came around to your solution (checkpoints on the last messages, reminder in a new TextContent), after testing some others, with only a small tweak. Please do tell if you think it's not okay as it is now.
@anthony-chaudhary You make a good point that this could be treated as more of an LLM thing (it is an LLM option after all!), but I'm sure, as @Kaushikdkrikhanu mentioned, that litellm will make it easier to use. Moreover, apart from @Kaushikdkrikhanu's argument about changing the message after it is serialized, there is another fun fact: caching goes from the beginning of the system prompt to the marked message, and while agents currently have an almost 100% static history of messages, which allows us to insert caching fairly easily every step as we do here, that might change. I'm not sure yet what the prompts will look like, but in any case I think the agent should be able to decide when it can cache and when it can't. I think we could let the LLM class set the headers, though; it doesn't feel right in the agent. 😅
@Kaushikdkrikhanu Also, I think you're right that litellm will make this more friendly. Regardless, we can add an option to have the LLM class deal with the headers and refactor them out of the agent; a rough sketch of that idea is below, but FWIW it's fine with me if it's a subsequent PR. Thank you!
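A rough sketch of that refactor under stated assumptions: the class, attribute, and flag names here are hypothetical, not the actual OpenHands LLM class, and the beta header value is the one from Anthropic's prompt-caching docs. The point is only that the header lives in the LLM wrapper, while the agent keeps deciding where cache checkpoints go:

```python
import litellm

class LLM:
    """Hypothetical slice of an LLM wrapper that owns the caching header,
    so the agent only decides where to place cache checkpoints."""

    ANTHROPIC_CACHING_HEADER = {"anthropic-beta": "prompt-caching-2024-07-31"}

    def __init__(self, model: str, caching_prompt: bool = True):
        self.model = model
        self.caching_prompt = caching_prompt

    def completion(self, messages: list[dict], **kwargs):
        # Inject the caching header only for models that support it and only
        # when the feature is enabled; leave other headers untouched.
        if self.caching_prompt and self.model.startswith("claude"):
            headers = dict(kwargs.pop("extra_headers", {}) or {})
            headers.update(self.ANTHROPIC_CACHING_HEADER)
            kwargs["extra_headers"] = headers
        return litellm.completion(model=self.model, messages=messages, **kwargs)
```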
@enyst Thank you for taking the time to make the changes, look into the issues, write unit tests, and test it out.
Hey @enyst and @Kaushikdkrikhanu, just wanted to check: it seems like the tests are failing. We're all excited to get this in, so would you be able to try to resolve them?
Since the PR workflow is fixed, all tests should now finish successfully.
Awesome, thanks for this! Exciting feature :) |
What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?
Add prompt caching to the CodeAct agent and log the number of cached tokens.
Give a summary of what the PR does, explaining any non-trivial design decisions
#3397
2024-08-18.23-39-39.mp4
Gemini caching support does not seem to have been added to litellm yet, so only Claude is supported for now. BerriAI/litellm#4284
Other references