
Tool retrieval 2 #417


Merged
merged 25 commits into main from tool-retrieval-2 on Jul 10, 2025

Conversation


@WonderPG WonderPG commented Jul 9, 2025

Benchmark

Query speed (from sending the request until streaming stops)

This PR

  • "Hi": ~4-5s
  • "Find one morphology in the thalamus, plot it, get its assets and download the swc one.": ~37s
  • "Show me papers about neuron morphologies in the thalamus of rodents": ~16s

Main

  • "Hi": ~8s
  • "Find one morphology in the thalamus, plot it, get its assets and download the swc one.": ~42s (the download link was wrong: it was much shorter than it should have been; tried twice)
  • "Show me papers about neuron morphologies in the thalamus of rodents": ~20s

In general the latency is significantly reduced with this approach.

Token count

This PR

Currently the selection LLM has a system prompt of 7,545 tokens. After selection, the main LLM receives roughly 10,000 tokens of tool descriptions when 10 tools are selected; this of course depends on which and how many tools are selected. Add the main LLM's system prompt of 1,164 tokens for a total of ~19,000 tokens. Bear in mind that only ~11k of those tokens are injected into the main LLM per request, since the selection model's tokens never reach the main model: the ~7.5k selection tokens are cheaper and run faster, while the remaining 11-12k are more expensive and a bit slower.

Main

Currently we have 88,022 tokens from the tool descriptions and 1,164 tokens from the system prompt, for a minimum of 89,186 tokens injected into every query.

In general the cost goes down by a factor of ~8.
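The factor-8 figure follows directly from the token counts above; a quick back-of-the-envelope check using the numbers quoted in this description:

```python
# Figures taken from this PR description.
main_total = 88_022 + 1_164      # tool descriptions + system prompt, injected every query on main
pr_per_request = 10_000 + 1_164  # ~10k selected-tool descriptions + system prompt on this PR

# Cost reduction for the (expensive) main model's input tokens.
print(f"{main_total / pr_per_request:.1f}x")  # → 8.0x
```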

The following points are also to be considered:

  • I observed that the LLM follows instructions better with a reduced number of tokens per request.
  • This PR introduces a risk that the relevant tools do not make it to the main LLM. I rarely saw that happen, but it is a possibility. Ideally we should experiment and strike the right balance by choosing the best intelligence/latency model for the task and the minimum number of tools the selection model should output every time.
  • I am currently putting only the tool name and tool description into the system prompt of the selection model. More could be added for improved performance; feel free to challenge this.
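To make that last point concrete, here is a minimal sketch of how a selection-model system prompt could be assembled from tool names and descriptions. The function name and the tool registry are hypothetical, for illustration only, not taken from this PR:

```python
# Hypothetical sketch: build the selection model's system prompt from a
# tool registry mapping name -> description.
def build_selection_prompt(tools: dict[str, str]) -> str:
    lines = [
        "You are a tool-selection assistant.",
        "Given the user's request, return the names of the relevant tools:",
    ]
    for name, description in sorted(tools.items()):
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)

# Illustrative tool registry.
tools = {
    "morphology-search": "Find neuron morphologies by brain region.",
    "literature-search": "Retrieve papers matching a natural-language query.",
}
prompt = build_selection_prompt(tools)
print(prompt)
```

The resulting string would then be passed as the system prompt in the selection call; richer entries (parameter schemas, usage examples) could be appended per tool at the cost of more selection-model tokens.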

# Rest of your code remains the same
response = await openai_client.beta.chat.completions.parse(
    messages=[{"role": "system", "content": system_prompt}, *openai_messages],  # type: ignore
    model="gpt-4o-mini",
)
@WonderPG (Collaborator, Author) commented on the snippet above:
This could be turned into an env var but we have enough already. Please let me know if you would prefer an env var.
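If the model name were made configurable, a minimal sketch might look like this (the env var name is hypothetical, and the fallback is the value hard-coded in this PR):

```python
import os

# Hypothetical: read the selection model name from the environment,
# falling back to the default hard-coded in this PR.
SELECTION_MODEL = os.environ.get("SELECTION_MODEL", "gpt-4o-mini")
print(SELECTION_MODEL)
```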

@WonderPG WonderPG marked this pull request as ready for review July 9, 2025 10:12
BoBer78 commented Jul 9, 2025

One comment: the selection LLM has some trouble selecting the exa crawling tool. Might want to give it a better description.

jankrepl commented Jul 9, 2025

> One comment: the selection LLM has some trouble selecting the exa crawling tool. Might want to give it a better description.

Great point. Not sure if directly applicable in this PR but I added a comment here: #415

WonderPG commented Jul 9, 2025

Is it okay if we address this in #415?

@jankrepl jankrepl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works perfectly :) Thank you @WonderPG

@WonderPG WonderPG merged commit 670d1ba into main Jul 10, 2025
6 checks passed
@WonderPG WonderPG deleted the tool-retrieval-2 branch July 10, 2025 09:46