Toolbox instantiation should be logged to DB when logging toolbox methods #1089

Closed
simonw opened this issue May 26, 2025 · 28 comments

simonw commented May 26, 2025

simonw commented May 26, 2025

The key information that needs to be stored in the database here is the configuration that was used to instantiate the Toolbox (for tools that take configuration).

I should probably enforce somewhere that this configuration must be JSON serializable - but is that strictly necessary for people who are using tools exclusively in Python and not via the CLI?

What should the database logging code do if a user attempts to log a response that used a toolbox instance with non-serializable configuration? The cheapest trick would be json.dumps(..., default=repr), and maybe that's enough.
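A minimal sketch of that fallback, using a hypothetical non-serializable config object (the `Config` class here is invented for illustration):

```python
import json

class Config:
    """A deliberately non-JSON-serializable configuration object."""
    def __repr__(self):
        return "Config(api_key=...)"

# json.dumps raises TypeError on unknown types by default, but
# default=repr falls back to each object's repr() string instead.
serialized = json.dumps(
    {"url": "https://datasette.io/content", "auth": Config()},
    default=repr,
)
```

The repr() string survives in the log, though it can't be round-tripped back into a live object.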

Then later what happens if you try to re-inflate a conversation from the database that has invalid serialized configuration? I guess you get an error.

simonw commented May 26, 2025

LLM may need to know if a Toolbox has stayed resident in memory or if it's been discarded and re-inflated. That's material to how tool calling works; otherwise you confuse the models by occasionally clearing state on them (though I've seen Code Interpreter and similar tools handle this OK in the past: they generally decide to try to reconstruct their state).

simonw commented May 26, 2025

Let's use https://github.com/simonw/llm-tools-datasette as an example. It currently has one argument:

llm --tool 'Datasette("https://datasette.io/content")' "Show tables" --td

I'm contemplating adding a second optional one for authentication (a key to look up via `llm.get_key()`).

If someone runs the above llm --tool prompt what would I need to store in the DB and where would I put it?

If the tool class is instantiated but no methods on it are called, should we still record that we instantiated it? The logs need to capture that "the model had the option to use these tools and chose not to". I think the configuration of the class IS relevant to that, since maybe the model decided not to use tools based on how they had been configured (but can it see that?)

simonw commented May 26, 2025

Part of the challenge here is where this happens.

CLI users have their toolboxes instantiated for them here:

llm/llm/cli.py

Line 807 in e23e13e

tool_implementations = _gather_tools(tools, python_tools)

Or here in llm chat:

llm/llm/cli.py

Line 1091 in e23e13e

tool_functions = _gather_tools(tools, python_tools)

But Python API users instantiate tools directly themselves:

import llm
from llm_tools_datasette import Datasette

model = llm.get_model("gpt-4.1-mini")

result = model.chain(
    "Show most interesting tables",
    tools=[Datasette("https://datasette.io/content")],
).text()

How do we record the "this class was instantiated at this moment and with these arguments" information such that, later on, this code in Response.log_to_db() can see that information and put it somewhere useful?

llm/llm/models.py

Lines 815 to 845 in e23e13e

# Persist any tools, tool calls and tool results
tool_ids_by_name = {}
for tool in self.prompt.tools:
    tool_id = ensure_tool(db, tool)
    tool_ids_by_name[tool.name] = tool_id
    db["tool_responses"].insert(
        {
            "tool_id": tool_id,
            "response_id": response_id,
        }
    )
for tool_call in self.tool_calls():  # TODO Should be _or_raise()
    db["tool_calls"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_call.name) or None,
            "name": tool_call.name,
            "arguments": json.dumps(tool_call.arguments),
            "tool_call_id": tool_call.tool_call_id,
        }
    )
for tool_result in self.prompt.tool_results:
    db["tool_results"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_result.name) or None,
            "name": tool_result.name,
            "output": tool_result.output,
            "tool_call_id": tool_result.tool_call_id,
        }
    )

simonw commented May 26, 2025

I'm inclined to have the llm.Toolbox subclass do some magic here, somehow recording its moment of instantiation such that later on code with access to self.prompt.tools can see what it recorded.

simonw commented May 26, 2025

llm -m claude-4-opus -f issue:https://github.com/simonw/llm/issues/1089 \
  -f llm/models.py -f llm/cli.py \
  -s 'Suggest potential code approaches for this issue'

https://gist.github.com/simonw/7b42c18a87f7caed5d5016a14f4e7c99#response - not hugely useful. 87.1755 cents.

Ran it again against o4-mini and gemini-2.5-pro-preview-05-06 just out of curiosity.

@simonw simonw changed the title Toolbox methods should be logged to DB differently and shown differently in llm logs Toolbox instantiation should be logged to DB when logging toolbox methods May 26, 2025
simonw commented May 26, 2025

Notable that there's currently no Toolbox record in the database either - only records for individual tool method calls. Datasette doesn't get its own record, just Datasette_execute.

llm/docs/logging.md

Lines 378 to 407 in e23e13e

CREATE TABLE [tools] (
    [id] INTEGER PRIMARY KEY,
    [hash] TEXT,
    [name] TEXT,
    [description] TEXT,
    [input_schema] TEXT,
    [plugin] TEXT
);
CREATE TABLE [tool_responses] (
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [response_id] TEXT REFERENCES [responses]([id]),
    PRIMARY KEY ([tool_id], [response_id])
);
CREATE TABLE [tool_calls] (
    [id] INTEGER PRIMARY KEY,
    [response_id] TEXT REFERENCES [responses]([id]),
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [name] TEXT,
    [arguments] TEXT,
    [tool_call_id] TEXT
);
CREATE TABLE [tool_results] (
    [id] INTEGER PRIMARY KEY,
    [response_id] TEXT REFERENCES [responses]([id]),
    [tool_id] INTEGER REFERENCES [tools]([id]),
    [name] TEXT,
    [output] TEXT,
    [tool_call_id] TEXT
);

simonw commented May 26, 2025

The main impact this has is that llm logs -T Datasette might not work. A hack would be to teach it to look for Datasette_* in the name column of the tools table, which would actually work because the naming convention for toolbox classes is not to include underscores in them.
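A sketch of that hack against a throwaway in-memory table (the table contents here are invented; only the `tools.name` column matches the real schema). The ESCAPE clause keeps the underscore literal rather than letting SQLite treat it as a single-character wildcard:

```python
import sqlite3

# Hypothetical: match tool method rows for a toolbox named "Datasette"
# via the "Classname_method" naming convention in the tools table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tools (name) VALUES (?)",
    [("Datasette_execute",), ("Datasette_describe",), ("llm_version",)],
)
# Escape the underscore so "Datasette_" matches literally.
rows = conn.execute(
    "SELECT name FROM tools WHERE name LIKE ? ESCAPE '\\'",
    ("Datasette\\_%",),
).fetchall()
```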

simonw commented May 26, 2025

o4-mini's reply made me think that maybe I could have this table:

CREATE TABLE [toolbox_config] (
    [id] INTEGER PRIMARY KEY,
    [hash] TEXT, -- sha256 hash of name and config and plugin
    [name] TEXT, -- e.g. "Datasette"
    [config] TEXT, -- serialized JSON
    [plugin] TEXT
);

Then individual tool_results could record a toolbox_config_id referencing that table if they used a toolbox that had been configured (or any toolbox at all, since non-configured ones can still record {} in config).

Would this give me what I want? Could I use this to later re-inflate a Toolbox on llm -c ?
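Re-inflation could then look something like this sketch, assuming config is serialized as a JSON object of keyword arguments and there is some registry mapping names to toolbox classes (the `TOOLBOX_CLASSES` dict and the stub `Datasette` class here are stand-ins, not real llm API):

```python
import json

# Stand-in for the real llm_tools_datasette.Datasette class.
class Datasette:
    def __init__(self, url):
        self.url = url

# In reality this mapping would come from the plugin registry.
TOOLBOX_CLASSES = {"Datasette": Datasette}

# A toolbox_config row as the proposed table would store it.
row = {"name": "Datasette", "config": '{"url": "https://datasette.io/content"}'}

# Re-inflate: look up the class, apply the stored keyword arguments.
toolbox = TOOLBOX_CLASSES[row["name"]](**json.loads(row["config"]))
```

This only works if the serialized config maps cleanly onto constructor parameters, which argues for storing keyword arguments rather than a positional tuple.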

simonw commented May 26, 2025

Maybe llm.Toolbox grows a serializable = True option which users can flip to False for tools that they know cannot be serialized - and these tools will then silently not be reinflated by llm -c - you have to pass llm -c --tool X instead.

Or throw an error so it's not an invisible failure.
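A minimal sketch of the error-throwing variant, assuming a class-level `serializable` flag (the `reinflatable()` helper and `PythonInterpreter` class are hypothetical, invented for illustration):

```python
# Hypothetical: a class-level flag lets llm -c refuse to re-inflate
# toolboxes that cannot round-trip through JSON.
class Toolbox:
    serializable = True

class PythonInterpreter(Toolbox):
    serializable = False  # holds live interpreter state

def reinflatable(toolboxes):
    # Raise loudly rather than silently dropping tools on llm -c.
    unsupported = [t for t in toolboxes if not t.serializable]
    if unsupported:
        names = ", ".join(type(t).__name__ for t in unsupported)
        raise ValueError(f"Cannot re-inflate: {names}; pass --tool explicitly")
    return toolboxes
```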

I'm having to do quite a lot of work to support the edge-case of llm -c against a conversation including configured toolbox tools!

simonw commented May 26, 2025

I think I'm going to implement logging before I ship a stable release, but maybe I'll punt on supporting this until a future release.

simonw commented May 26, 2025

I'm going to prototype up the tools_config table.

simonw commented May 26, 2025

Actually, no, that is not recording enough information.

The key thing that we need to record is whether a bunch of tool calls all acted on the same instance - a shared Python interpreter or JavaScript session, for example.

That is important context for understanding exactly what happened later from the logs, and is a slightly separate concern from remembering what the initial configuration arguments were.

simonw commented May 26, 2025

So maybe a tool_instances table, created with a ULID and referenced by individual tool call results?

That table could also record the serialized constructor arguments.

simonw commented May 26, 2025

New SQL (new tool_instances table and tool_results.instance_id column):

CREATE TABLE [tool_instances] (
   [id] TEXT PRIMARY KEY, -- I'll use ULID for this so databases can be combined later
   [plugin] TEXT,
   [name] TEXT,
   [arguments] TEXT
);
CREATE TABLE "tool_results" (
   [id] INTEGER PRIMARY KEY,
   [response_id] TEXT REFERENCES [responses]([id]),
   [tool_id] INTEGER REFERENCES [tools]([id]),
   [name] TEXT,
   [output] TEXT,
   [tool_call_id] TEXT,
   [instance_id] TEXT REFERENCES [tool_instances]([id])
);

simonw commented May 26, 2025

I'm not 100% sure the ULID is necessary here - database combinations could be done without it, and tool_results is already using integer primary keys.

simonw commented May 26, 2025

Am I OK with spinning up a brand new tool_instances row every time a tool instance is used, even if that means ending up with thousands of rows with duplicate names, plugins and arguments?

Yeah, I think so. I could de-dupe the (plugin, name, arguments) tuple into yet another table. I can change that some time in the future if I really want to.

simonw commented May 26, 2025

... but in that case I won't use ULID, it's just extra storage space for no reason.

simonw commented May 26, 2025

Had to perform heart surgery on my database! I updated the new migration and did:

llm logs backup /tmp/logs.db

sqlite-utils "$(llm logs path)" "delete from _llm_migrations where name = 'm018_tool_instances'"
sqlite-utils transform "$(llm logs path)" tool_results --drop instance_id
sqlite-utils drop-table "$(llm logs path)" tool_instances

Then this to re-run the migration:

llm logs -c

And now:

sqlite-utils schema "$(llm logs path)"

Returns:

CREATE TABLE [tool_instances] (
   [id] INTEGER PRIMARY KEY,
   [plugin] TEXT,
   [name] TEXT,
   [arguments] TEXT
);
CREATE TABLE "tool_results" (
   [id] INTEGER PRIMARY KEY,
   [response_id] TEXT REFERENCES [responses]([id]),
   [tool_id] INTEGER REFERENCES [tools]([id]),
   [name] TEXT,
   [output] TEXT,
   [tool_call_id] TEXT,
   [instance_id] INTEGER REFERENCES [tool_instances]([id])
);

simonw commented May 26, 2025

Now I need a mechanism whereby llm.Toolbox has its initialization automatically captured when it's created, such that it can be logged later on.

Maybe the llm.ToolResult objects that get captured need to reference their instance?

simonw commented May 26, 2025

Here's where llm.ToolResult() is created during a sync model execution of tools:

llm/llm/models.py

Lines 907 to 926 in e23e13e

try:
    if asyncio.iscoroutinefunction(tool.implementation):
        result = asyncio.run(tool.implementation(**tool_call.arguments))
    else:
        result = tool.implementation(**tool_call.arguments)
    if not isinstance(result, str):
        result = json.dumps(result, default=repr)
except Exception as ex:
    result = f"Error: {ex}"
tool_result_obj = ToolResult(
    name=tool_call.name,
    output=result,
    tool_call_id=tool_call.tool_call_id,
)
if after_call:
    cb_result = after_call(tool, tool_call, tool_result_obj)
    if inspect.isawaitable(cb_result):

And here's where that happens for async models - that is in two places. This is for async methods:

llm/llm/models.py

Lines 1067 to 1081 in e23e13e

try:
    result = await tool.implementation(**tc.arguments)
    output = (
        result
        if isinstance(result, str)
        else json.dumps(result, default=repr)
    )
except Exception as ex:
    output = f"Error: {ex}"
tr = ToolResult(
    name=tc.name,
    output=output,
    tool_call_id=tc.tool_call_id,
)

And for sync methods:

llm/llm/models.py

Lines 1100 to 1114 in e23e13e

try:
    res = tool.implementation(**tc.arguments)
    if inspect.isawaitable(res):
        res = await res
    output = (
        res if isinstance(res, str) else json.dumps(res, default=repr)
    )
except Exception as ex:
    output = f"Error: {ex}"
tr = ToolResult(
    name=tc.name,
    output=output,
    tool_call_id=tc.tool_call_id,
)

simonw commented May 26, 2025

All of those places have access to tool.implementation, so they should be able to pick up the necessary information from that somehow.
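One standard way to do that: for a toolbox method, tool.implementation is a bound method, and a bound method's `__self__` attribute points back at the instance it is bound to. A sketch, using a stand-in `Datasette` class (not the real plugin) with a `_config` attribute recorded at construction time:

```python
# Stand-in toolbox class that records its configuration on __init__.
class Datasette:
    def __init__(self, url):
        self._config = {"url": url}

    def execute(self, sql):
        return sql

toolbox = Datasette("https://datasette.io/content")
implementation = toolbox.execute  # what llm holds as tool.implementation

# A bound method's __self__ recovers the instance; plain functions
# lack the attribute, so getattr with a default handles both cases.
instance = getattr(implementation, "__self__", None)
```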

simonw commented May 26, 2025

Might be as simple as adding an instance property like this:

from dataclasses import dataclass
from typing import Optional

import llm

@dataclass
class ToolResult:
    name: str
    output: str
    tool_call_id: Optional[str] = None
    instance: Optional[llm.Toolbox] = None  # New

It's up to the things that create ToolResult() to populate instance if possible; then the code that writes ToolResult to the database can optionally serialize that instance as well, maybe setting a _db_id property on it if it does not yet have one.

llm/llm/models.py

Lines 836 to 845 in e23e13e

for tool_result in self.prompt.tool_results:
    db["tool_results"].insert(
        {
            "response_id": response_id,
            "tool_id": tool_ids_by_name.get(tool_result.name) or None,
            "name": tool_result.name,
            "output": tool_result.output,
            "tool_call_id": tool_result.tool_call_id,
        }
    )
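The _db_id idea could look roughly like this sketch, which serializes each distinct instance once and caches the row id on the instance so repeated tool results reuse it (the `ensure_tool_instance()` helper and `Fake` class are hypothetical; a plain list stands in for the sqlite table):

```python
import json

def ensure_tool_instance(db_rows, instance):
    # Insert a tool_instances row the first time we see this instance,
    # caching the id on the instance itself for later results.
    if getattr(instance, "_db_id", None) is None:
        db_rows.append(
            {"id": len(db_rows) + 1, "arguments": json.dumps(instance._config)}
        )
        instance._db_id = db_rows[-1]["id"]
    return instance._db_id

class Fake:
    def __init__(self):
        self._config = {"url": "https://datasette.io/content"}

rows = []
inst = Fake()
first = ensure_tool_instance(rows, inst)
second = ensure_tool_instance(rows, inst)  # cached - no new row inserted
```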

simonw commented May 27, 2025

Got this from Claude Opus 4: https://claude.ai/share/d7b4b6a0-9904-4663-a3fc-fdbf9c4afd1b

class Toolbox:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        
        original_init = cls.__init__
        
        def wrapped_init(self, *args, **kwargs):
            self._config = {
                'args': args,
                'kwargs': kwargs,
                'class': cls.__name__
            }
            original_init(self, *args, **kwargs)
        
        cls.__init__ = wrapped_init

simonw commented May 27, 2025

That seems to work nicely:

>>> from llm_tools_datasette import Datasette
>>> d = Datasette('https://datasette.io/content')
>>> d
<llm_tools_datasette.Datasette object at 0x107528a50>
>>> d._config
{'args': ('https://datasette.io/content',), 'kwargs': {}, 'class': 'Datasette'}

simonw commented May 27, 2025

Got a better version: https://claude.ai/share/e5439534-28b3-46b8-9aed-d76ee5a1985b

Now it gives me back {"url": "https://datasette.io/content"} for that example.
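The likely ingredient in that improvement is inspect.signature, which can bind positional arguments back to their parameter names. A standalone sketch (the `capture_config()` helper and the stub `Datasette` class are illustrative, not the shipped implementation):

```python
import inspect

# Stand-in for the real plugin class.
class Datasette:
    def __init__(self, url):
        self.url = url

def capture_config(cls, *args, **kwargs):
    # Bind the constructor arguments to parameter names, so
    # Datasette("https://...") serializes as {"url": "https://..."}.
    bound = inspect.signature(cls.__init__).bind(None, *args, **kwargs)
    bound.apply_defaults()
    config = dict(bound.arguments)
    config.pop("self")  # drop the placeholder passed for self
    return config

config = capture_config(Datasette, "https://datasette.io/content")
```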

simonw commented May 27, 2025

Got this working:

Image

Image

simonw commented May 27, 2025

I'll finish this in a PR.

@simonw simonw closed this as completed in e4ecb86 May 27, 2025
@simonw simonw unpinned this issue May 27, 2025
simonw added a commit that referenced this issue May 27, 2025