
Feature: Hooks #2057

Draft · wants to merge 14 commits into main

Conversation

@craigwalton-dsit (Collaborator) commented Jun 26, 2025:

This functionality will support integrating observability/logging frameworks like Weights & Biases (#2046).

Requiring careful review:

  • Where the emit_... functions are called from. There's quite a bit of nesting, recursion, and try/except handling within _eval_async_inner() and task_run_sample(), so the call sites could do with a double check.

Design decisions:

  • Why don't we require env vars to be set (like INSPECT_API_KEY_OVERRIDE) to prevent malicious packages from exfiltrating data? There are many other ways a malicious package can exfiltrate data, and it could simply set the env var itself. We'll print out which hooks are registered at startup (though a malicious hook could defer its registration).
  • Why is override_api_key not async (when all other hooks are)? Because it is called from ModelAPI.__init__(); making it async would require a bit of a rework.
  • Why are there a bunch of dataclasses rather than parameters on the hook functions? To protect ourselves against breaking changes when we inevitably want to pass additional data to a hook (see the sketch after this list).
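
For illustration, here's a minimal sketch of that dataclass pattern. Only the on_run_start(data: RunStart) signature comes from this PR; the RunStart fields and the WandBHooks subclass are assumptions for the example:

```python
# Minimal sketch of the dataclass-parameter pattern. Only the
# on_run_start(data: RunStart) signature is taken from this PR; the
# RunStart fields and WandBHooks subclass are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStart:
    run_id: str  # assumed field
    # New fields can be appended here later without breaking existing
    # hook implementations, since each hook receives a single dataclass
    # rather than individual positional parameters.


class Hooks:
    async def on_run_start(self, data: RunStart) -> None:
        """Called on each invocation of `eval()` or `eval_retry()`."""


class WandBHooks(Hooks):
    async def on_run_start(self, data: RunStart) -> None:
        print(f"run started: {data.run_id}")
```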

Future work:

  • Add hooks for events (e.g. tool calls, messages, etc.) - anything that shows up in the transcript.
  • Alex suggested a hook which runs after all solvers but before scoring.

```diff
@@ -28,7 +27,6 @@ def init_eval_context(
     init_logger(log_level, log_level_transcript)
     init_concurrency()
     init_max_subprocesses(max_subprocesses)
-    init_hooks()
```
@craigwalton-dsit:

We now init hooks in platform_init().

```diff
@@ -20,6 +20,10 @@ def running_in_notebook() -> bool:


 def platform_init() -> None:
+    from inspect_ai.hooks._startup import init_hooks
+
+    init_hooks()
```
@craigwalton-dsit:

It is safe to call this multiple times:

  • for "new" hooks, there is a _registry_hooks_loaded global (see the sketch below)
  • for "legacy" hooks, we only attempt to load them (and display a message if loaded) if they're not already loaded

```python
    async def on_run_start(self, data: RunStart) -> None:
        """On run start.

        A "run" is a single invocation of `eval()` or `eval_retry()` which may contain
```
@craigwalton-dsit:

Some of these docstrings might be worth double checking, as I've made some claims about when these hooks will be called (users will no doubt have questions otherwise).

```python
print(
    f"[blue][bold]inspect_ai v{version}[/bold][/blue]\n"
    f"[bright_black]{all_messages}[/bright_black]\n"
)
```
@craigwalton-dsit:

[Screenshot: resulting startup output, 2025-06-27]

```python
        Args:
            data: Sample end data.
        """
        pass
```
@craigwalton-dsit:

I'm unsure whether I've picked the ideal semantics for the on_sample_* hooks:

  • on_sample_start: on every try (including retries)
  • on_sample_end: on a successful completion
  • on_sample_abort: when a sample errors and has no retries remaining

It feels weird that there could be a mismatch between the number of on_sample_start and on_sample_end events (illustrated in the sketch below).
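
To make the potential mismatch concrete, here's a hypothetical counting hook (building on the Hooks sketch above; the SampleStart/SampleEnd dataclasses and their fields are assumptions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleStart:
    sample_id: str  # assumed field


@dataclass(frozen=True)
class SampleEnd:
    sample_id: str  # assumed field


class SampleCounter(Hooks):  # Hooks as in the earlier sketch
    """With one retry before success, a sample yields two
    on_sample_start calls but only one on_sample_end call."""

    def __init__(self) -> None:
        self.starts = 0
        self.ends = 0

    async def on_sample_start(self, data: SampleStart) -> None:
        self.starts += 1  # every try, including retries

    async def on_sample_end(self, data: SampleEnd) -> None:
        self.ends += 1  # only on successful completion
```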
