Toolbox instantiation should be logged to DB when logging toolbox methods #1089

Closed
simonw opened this issue May 26, 2025 · 28 comments

simonw commented May 26, 2025

simonw commented May 26, 2025

The key information that needs to be stored in the database here is the configuration that was used to instantiate the Toolbox (for tools that take configuration).

I should probably enforce somewhere that this configuration must be JSON serializable - but is that strictly necessary for people who are using tools exclusively in Python and not via the CLI?

What should the database logging code do if a user attempts to log a response that used a toolbox instance with non-serializable configuration? The cheapest trick would be json.dumps(..., default=repr), and maybe that's enough.
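A minimal sketch of that fallback, using a hypothetical non-serializable config object (the `Config` class here is invented for illustration):

```python
import json

class Config:
    """A deliberately non-JSON-serializable configuration object."""
    def __repr__(self):
        return "Config(api_key=...)"

# json.dumps raises TypeError on unknown types by default, but
# default=repr falls back to each object's repr() string instead.
serialized = json.dumps(
    {"url": "https://datasette.io/content", "auth": Config()},
    default=repr,
)
```

The repr() string survives in the log, though it can't be round-tripped back into a live object.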

Then later what happens if you try to re-inflate a conversation from the database that has invalid serialized configuration? I guess you get an error.

simonw commented May 26, 2025

LLM may need to know if a Toolbox has stayed resident in memory or if it's been discarded and re-inflated. That's material to how tool calling works; otherwise you confuse the models by occasionally clearing state on them (though I've seen Code Interpreter and similar tools handle this OK in the past: they generally decide to try to reconstruct their state).

simonw commented May 26, 2025

Let's use https://github.com/simonw/llm-tools-datasette as an example. It currently has one argument:

llm --tool 'Datasette("https://datasette.io/content")' "Show tables" --td

I'm contemplating adding a second optional one for authentication (a key to look up via `llm.get_key()`).

If someone runs the above llm --tool prompt what would I need to store in the DB and where would I put it?

If the tool class is instantiated but no methods on it are called, should we still record that we instantiated it? The logs need to capture that "the model had the option to use these tools and chose not to". I think the configuration of the class IS relevant to that, since maybe the model decided not to use tools based on how they had been configured (but can it see that?)

simonw commented May 26, 2025

Part of the challenge here is where this happens.

CLI users have their toolboxes instantiated for them here:

llm/llm/cli.py

Line 807 in e23e13e

tool_implementations = _gather_tools(tools, python_tools)

Or here in llm chat:

llm/llm/cli.py

Line 1091 in e23e13e

tool_functions = _gather_tools(tools, python_tools)

But Python API users instantiate tools directly themselves:

import llm
from llm_tools_datasette import Datasette

model = llm.get_model("gpt-4.1-mini")

result = model.chain(
    "Show most interesting tables",
    tools=[Datasette("https://datasette.io/content")],
).text()

How do we record the "this class was instantiated at this moment and with these arguments" information such that, later on, this code in Response.log_to_db() can see that information and put it somewhere useful?

llm/llm/models.py

Lines 815 to 845 in e23e13e

# Persist any tools, tool calls and tool results
tool_ids_by_name = {}
for tool in self.prompt.tools:
    tool_id = ensure_tool(db, tool)
    tool_ids_by_name[tool.name] = tool_id
    db["tool_responses"].insert(
        {
            "tool_id": tool_id,
            "response_id": response_id,
        }
    )
for tool_call in self.tool_calls():  # TODO Should be _or_raise()
    db["tool_calls"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_call.name) or None,
            "name": tool_call.name,
            "arguments": json.dumps(tool_call.arguments),
            "tool_call_id": tool_call.tool_call_id,
        }
    )
for tool_result in self.prompt.tool_results:
    db["tool_results"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_result.name) or None,
            "name": tool_result.name,
            "output": tool_result.output,
            "tool_call_id": tool_result.tool_call_id,
        }
    )

simonw commented May 26, 2025

I'm inclined to have the llm.Toolbox subclass do some magic here, somehow recording its moment of instantiation such that later on code with access to self.prompt.tools can see what it recorded.

simonw commented May 26, 2025

llm -m claude-4-opus -f issue:https://github.com/simonw/llm/issues/1089 \
  -f llm/models.py -f llm/cli.py \
  -s 'Suggest potential code approaches for this issue'

https://gist.github.com/simonw/7b42c18a87f7caed5d5016a14f4e7c99#response - not hugely useful. 87.1755 cents.

Ran it again against o4-mini and gemini-2.5-pro-preview-05-06 just out of curiosity.

@simonw simonw changed the title Toolbox methods should be logged to DB differently and shown differently in llm logs Toolbox instantiation should be logged to DB when logging toolbox methods May 26, 2025
simonw commented May 26, 2025

Notable that there's currently no Toolbox record in the database either - only records for individual tool method calls. Datasette doesn't get its own record, just Datasette_execute.

llm/docs/logging.md

Lines 378 to 407 in e23e13e

CREATE TABLE [tools] (
    [id] INTEGER PRIMARY KEY,
    [hash] TEXT,
    [name] TEXT,
    [description] TEXT,
    [input_schema] TEXT,
    [plugin] TEXT
);
CREATE TABLE [tool_responses] (
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [response_id] TEXT REFERENCES [responses]([id]),
    PRIMARY KEY ([tool_id], [response_id])
);
CREATE TABLE [tool_calls] (
    [id] INTEGER PRIMARY KEY,
    [response_id] TEXT REFERENCES [responses]([id]),
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [name] TEXT,
    [arguments] TEXT,
    [tool_call_id] TEXT
);
CREATE TABLE [tool_results] (
    [id] INTEGER PRIMARY KEY,
    [response_id] TEXT REFERENCES [responses]([id]),
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [name] TEXT,
    [output] TEXT,
    [tool_call_id] TEXT
);

simonw commented May 26, 2025

The main impact this has is that llm logs -T Datasette might not work. A hack would be to teach it to look for Datasette_* in the name column of the tools table, which would actually work because the naming convention for toolbox classes is not to include underscores in them.
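A sketch of that hack against a throwaway in-memory table (the table contents here are invented; only the `tools.name` column matches the real schema). The ESCAPE clause keeps the underscore literal rather than letting SQLite treat it as a single-character wildcard:

```python
import sqlite3

# Hypothetical: match tool method rows for a toolbox named "Datasette"
# via the "Classname_method" naming convention in the tools table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tools (name) VALUES (?)",
    [("Datasette_execute",), ("Datasette_describe",), ("llm_version",)],
)
# Escape the underscore so "Datasette_" matches literally.
rows = conn.execute(
    "SELECT name FROM tools WHERE name LIKE ? ESCAPE '\\'",
    ("Datasette\\_%",),
).fetchall()
```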

simonw commented May 26, 2025

o4-mini's reply made me think that maybe I could have this table:

CREATE TABLE [toolbox_config] (
    [id] INTEGER PRIMARY KEY,
    [hash] TEXT, -- sha256 hash of name and config and plugin
    [name] TEXT, -- e.g. "Datasette"
    [config] TEXT, -- serialized JSON
    [plugin] TEXT
);

Then individual tool_results could record a toolbox_config_id referencing that table if they used a toolbox that had been configured (or any toolbox at all, since non-configured ones can still record {} in config).

Would this give me what I want? Could I use this to later re-inflate a Toolbox on llm -c ?
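Re-inflation could then look something like this sketch, assuming config is serialized as a JSON object of keyword arguments and there is some registry mapping names to toolbox classes (the `TOOLBOX_CLASSES` dict and the stub `Datasette` class here are stand-ins, not real llm API):

```python
import json

# Stand-in for the real llm_tools_datasette.Datasette class.
class Datasette:
    def __init__(self, url):
        self.url = url

# In reality this mapping would come from the plugin registry.
TOOLBOX_CLASSES = {"Datasette": Datasette}

# A toolbox_config row as the proposed table would store it.
row = {"name": "Datasette", "config": '{"url": "https://datasette.io/content"}'}

# Re-inflate: look up the class, apply the stored keyword arguments.
toolbox = TOOLBOX_CLASSES[row["name"]](**json.loads(row["config"]))
```

This only works if the serialized config maps cleanly onto constructor parameters, which argues for storing keyword arguments rather than a positional tuple.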

simonw commented May 26, 2025

Maybe llm.Toolbox grows a serializable = True option which users can flip to False for tools that they know cannot be serialized - and these tools will then silently not be reinflated by llm -c - you have to pass llm -c --tool X instead.

Or throw an error so it's not an invisible failure.
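A minimal sketch of the error-throwing variant, assuming a class-level `serializable` flag (the `reinflatable()` helper and `PythonInterpreter` class are hypothetical, invented for illustration):

```python
# Hypothetical: a class-level flag lets llm -c refuse to re-inflate
# toolboxes that cannot round-trip through JSON.
class Toolbox:
    serializable = True

class PythonInterpreter(Toolbox):
    serializable = False  # holds live interpreter state

def reinflatable(toolboxes):
    # Raise loudly rather than silently dropping tools on llm -c.
    unsupported = [t for t in toolboxes if not t.serializable]
    if unsupported:
        names = ", ".join(type(t).__name__ for t in unsupported)
        raise ValueError(f"Cannot re-inflate: {names}; pass --tool explicitly")
    return toolboxes
```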

I'm having to do quite a lot of work to support the edge-case of llm -c against a conversation including configured toolbox tools!

simonw commented May 26, 2025

I think I'm going to implement logging before I ship a stable release, but maybe I'll punt on supporting this until a future release.

simonw commented May 26, 2025

I'm going to prototype up the tools_config table.

simonw commented May 26, 2025

Actually, no, that is not recording enough information.

The key thing that we need to record is whether a bunch of tool calls all acted on the same instance - a shared Python interpreter or JavaScript session, for example.

That is important context for understanding exactly what happened later from the logs, and is a slightly separate concern from remembering what the initial configuration arguments were.

simonw commented May 26, 2025

So maybe a tool_instances table, created with a ULID and referenced by individual tool call results?

That table could also record the serialized constructor arguments.

simonw commented May 26, 2025

New SQL (new tool_instances table and tool_results.instance_id column):

CREATE TABLE [tool_instances] (
   [id] TEXT PRIMARY KEY, -- I'll use ULID for this so databases can be combined later
   [plugin] TEXT,
   [name] TEXT,
   [arguments] TEXT
);
CREATE TABLE "tool_results" (
   [id] INTEGER PRIMARY KEY,
   [response_id] TEXT REFERENCES [responses]([id]),
   [tool_id] INTEGER REFERENCES [tools]([id]),
   [name] TEXT,
   [output] TEXT,
   [tool_call_id] TEXT,
   [instance_id] TEXT REFERENCES [tool_instances]([id])
);

simonw commented May 26, 2025

I'm not 100% sure the ULID is necessary here - database combinations could be done without it, and tool_results is already using integer primary keys.

simonw commented May 26, 2025

Am I OK with spinning up a brand new tool_instances row every time a tool instance is used, even if that means ending up with thousands of rows with duplicate names, plugins and arguments?

Yeah, I think so. I could de-dupe the (plugin, name, arguments) tuple into yet another table. I can change that some time in the future if I really want to.

simonw commented May 26, 2025

... but in that case I won't use ULID, it's just extra storage space for no reason.

simonw commented May 26, 2025

Had to perform heart surgery on my database! I updated the new migration and did:

llm logs backup /tmp/logs.db

sqlite-utils "$(llm logs path)" "delete from _llm_migrations where name = 'm018_tool_instances'"
sqlite-utils transform "$(llm logs path)" tool_results --drop instance_id
sqlite-utils drop-table "$(llm logs path)" tool_instances

Then this to re-run the migration:

llm logs -c

And now:

sqlite-utils schema "$(llm logs path)"

Returns:

CREATE TABLE [tool_instances] (
   [id] INTEGER PRIMARY KEY,
   [plugin] TEXT,
   [name] TEXT,
   [arguments] TEXT
);
CREATE TABLE "tool_results" (
   [id] INTEGER PRIMARY KEY,
   [response_id] TEXT REFERENCES [responses]([id]),
   [tool_id] INTEGER REFERENCES [tools]([id]),
   [name] TEXT,
   [output] TEXT,
   [tool_call_id] TEXT,
   [instance_id] INTEGER REFERENCES [tool_instances]([id])
);

simonw commented May 26, 2025

Now I need a mechanism whereby llm.Toolbox has its initialization automatically captured when it's created, such that it can be logged later on.

Maybe the llm.ToolResult objects that get captured need to reference their instance?

simonw commented May 26, 2025

Here's where llm.ToolResult() is created during a sync model execution of tools:

llm/llm/models.py

Lines 907 to 926 in e23e13e

try:
    if asyncio.iscoroutinefunction(tool.implementation):
        result = asyncio.run(tool.implementation(**tool_call.arguments))
    else:
        result = tool.implementation(**tool_call.arguments)
    if not isinstance(result, str):
        result = json.dumps(result, default=repr)
except Exception as ex:
    result = f"Error: {ex}"
tool_result_obj = ToolResult(
    name=tool_call.name,
    output=result,
    tool_call_id=tool_call.tool_call_id,
)
if after_call:
    cb_result = after_call(tool, tool_call, tool_result_obj)
    if inspect.isawaitable(cb_result):

And here's where that happens for async models - that is in two places. This is for async methods:

llm/llm/models.py

Lines 1067 to 1081 in e23e13e

try:
    result = await tool.implementation(**tc.arguments)
    output = (
        result
        if isinstance(result, str)
        else json.dumps(result, default=repr)
    )
except Exception as ex:
    output = f"Error: {ex}"
tr = ToolResult(
    name=tc.name,
    output=output,
    tool_call_id=tc.tool_call_id,
)

And for sync methods:

llm/llm/models.py

Lines 1100 to 1114 in e23e13e

try:
    res = tool.implementation(**tc.arguments)
    if inspect.isawaitable(res):
        res = await res
    output = (
        res if isinstance(res, str) else json.dumps(res, default=repr)
    )
except Exception as ex:
    output = f"Error: {ex}"
tr = ToolResult(
    name=tc.name,
    output=output,
    tool_call_id=tc.tool_call_id,
)

simonw commented May 26, 2025

All of those places have access to tool.implementation, so they should be able to pick up the necessary information from that somehow.
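One standard way to do that: for a toolbox method, tool.implementation is a bound method, and a bound method's `__self__` attribute points back at the instance it is bound to. A sketch, using a stand-in `Datasette` class (not the real plugin) with a `_config` attribute recorded at construction time:

```python
# Stand-in toolbox class that records its configuration on __init__.
class Datasette:
    def __init__(self, url):
        self._config = {"url": url}

    def execute(self, sql):
        return sql

toolbox = Datasette("https://datasette.io/content")
implementation = toolbox.execute  # what llm holds as tool.implementation

# A bound method's __self__ recovers the instance; plain functions
# lack the attribute, so getattr with a default handles both cases.
instance = getattr(implementation, "__self__", None)
```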

simonw commented May 26, 2025

Might be as simple as adding an instance property like this:

from dataclasses import dataclass
from typing import Optional

import llm

@dataclass
class ToolResult:
    name: str
    output: str
    tool_call_id: Optional[str] = None
    instance: Optional[llm.Toolbox] = None  # New

It's up to the things that create ToolResult() to populate instance if possible; then the code that writes ToolResult to the database can optionally serialize that instance as well, maybe setting a _db_id property on it if it does not yet have one.

llm/llm/models.py

Lines 836 to 845 in e23e13e

for tool_result in self.prompt.tool_results:
    db["tool_results"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_result.name) or None,
            "name": tool_result.name,
            "output": tool_result.output,
            "tool_call_id": tool_result.tool_call_id,
        }
    )
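The _db_id idea could look roughly like this sketch, which serializes each distinct instance once and caches the row id on the instance so repeated tool results reuse it (the `ensure_tool_instance()` helper and `Fake` class are hypothetical; a plain list stands in for the sqlite table):

```python
import json

def ensure_tool_instance(db_rows, instance):
    # Insert a tool_instances row the first time we see this instance,
    # caching the id on the instance itself for later results.
    if getattr(instance, "_db_id", None) is None:
        db_rows.append(
            {"id": len(db_rows) + 1, "arguments": json.dumps(instance._config)}
        )
        instance._db_id = db_rows[-1]["id"]
    return instance._db_id

class Fake:
    def __init__(self):
        self._config = {"url": "https://datasette.io/content"}

rows = []
inst = Fake()
first = ensure_tool_instance(rows, inst)
second = ensure_tool_instance(rows, inst)  # cached - no new row inserted
```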

simonw commented May 27, 2025

Got this from Claude Opus 4: https://claude.ai/share/d7b4b6a0-9904-4663-a3fc-fdbf9c4afd1b

class Toolbox:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        
        original_init = cls.__init__
        
        def wrapped_init(self, *args, **kwargs):
            self._config = {
                'args': args,
                'kwargs': kwargs,
                'class': cls.__name__
            }
            original_init(self, *args, **kwargs)
        
        cls.__init__ = wrapped_init

simonw commented May 27, 2025

That seems to work nicely:

>>> from llm_tools_datasette import Datasette
>>> d = Datasette('https://datasette.io/content')
>>> d
<llm_tools_datasette.Datasette object at 0x107528a50>
>>> d._config
{'args': ('https://datasette.io/content',), 'kwargs': {}, 'class': 'Datasette'}

simonw commented May 27, 2025

Got a better version: https://claude.ai/share/e5439534-28b3-46b8-9aed-d76ee5a1985b

Now it gives me back {"url": "https://datasette.io/content"} for that example.
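The likely ingredient in that improvement is inspect.signature, which can bind positional arguments back to their parameter names. A standalone sketch (the `capture_config()` helper and the stub `Datasette` class are illustrative, not the shipped implementation):

```python
import inspect

# Stand-in for the real plugin class.
class Datasette:
    def __init__(self, url):
        self.url = url

def capture_config(cls, *args, **kwargs):
    # Bind the constructor arguments to parameter names, so
    # Datasette("https://...") serializes as {"url": "https://..."}.
    bound = inspect.signature(cls.__init__).bind(None, *args, **kwargs)
    bound.apply_defaults()
    config = dict(bound.arguments)
    config.pop("self")  # drop the placeholder passed for self
    return config

config = capture_config(Datasette, "https://datasette.io/content")
```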

simonw commented May 27, 2025

Got this working:

Image

Image

simonw commented May 27, 2025

I'll finish this in a PR.

@simonw simonw closed this as completed in e4ecb86 May 27, 2025
@simonw simonw unpinned this issue May 27, 2025
simonw added a commit that referenced this issue May 27, 2025