Tool retrieval 2 #417
Conversation
# Rest of your code remains the same
response = await openai_client.beta.chat.completions.parse(
    messages=[{"role": "system", "content": system_prompt}, *openai_messages],  # type: ignore
    model="gpt-4o-mini",
This could be turned into an env var, but we have enough already. Please let me know if you would prefer an env var.
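If an env var is preferred, a minimal sketch of what that could look like as a drop-in for the snippet above; the variable name TOOL_SELECTION_MODEL and its default are assumptions, not an existing project convention:

import os

# Hypothetical env var name; adjust to the project's naming conventions.
# Falls back to the currently hardcoded model when the variable is unset.
model_name = os.getenv("TOOL_SELECTION_MODEL", "gpt-4o-mini")

response = await openai_client.beta.chat.completions.parse(
    messages=[{"role": "system", "content": system_prompt}, *openai_messages],  # type: ignore
    model=model_name,
)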
One comment: the selection LLM has some trouble selecting the exa crawling tool. It might be worth giving it a better description.
Great point. Not sure if it is directly applicable in this PR, but I added a comment here: #415
Is it okay if we address this in #415?
Works perfectly :) Thank you @WonderPG
Benchmark
Query speed (from sending the request to the end of streaming)
This PR vs. Main: (timing measurements shown as images in the original, not reproduced here)
In general, the latency is significantly reduced with this approach.
Token count
This PR
Currently the selection LLM has a system prompt of 7,545 tokens. After selection, the main LLM receives roughly 10,000 tokens of tool descriptions when 10 tools are selected; this figure depends on which tools are selected and how many. Add the main LLM's system prompt of 1,164 tokens for a total of ~19,000 tokens per request. Bear in mind that only ~11k of these tokens are injected into the main model, since the selection model's tokens never reach it: the ~7.5k selection tokens are cheaper and run faster, while the remaining 11-12k are more expensive and a bit slower.
Main
Currently we have 88,022 tokens from the tools and 1,164 tokens from the system prompt, for a minimum of 89,186 tokens injected into every query.
In general, the cost goes down by a factor of 8.
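As a sanity check, the figures above can be reproduced with a bit of arithmetic; the 10,000-token tool-description figure is the approximation quoted in the text:

selection_prompt = 7_545     # selection LLM system prompt
tool_descriptions = 10_000   # ~10 selected tools on the main LLM (approximate)
main_system_prompt = 1_164   # main LLM system prompt

pr_total = selection_prompt + tool_descriptions + main_system_prompt  # 18,709 -> ~19k
pr_main_only = tool_descriptions + main_system_prompt                 # 11,164 -> ~11k

main_branch_total = 88_022 + main_system_prompt                       # 89,186

print(main_branch_total / pr_main_only)  # ~8.0, i.e. the factor-of-8 cost reduction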
The following points should also be considered: