
Feature: Hooks #2057

Draft · wants to merge 14 commits into main

Conversation

@craigwalton-dsit (Collaborator) commented Jun 26, 2025:

This functionality will support integrating observability/logging frameworks like Weights & Biases (#2046).

Requiring careful review:

  • Where the emit_... functions are called from. There's quite a bit of nesting, recursion, and try/except handling within _eval_async_inner() and task_run_sample(), so the call sites could do with a double check.

Design decisions:

  • Why don't we require env vars to be set (like INSPECT_API_KEY_OVERRIDE) to prevent malicious packages from exfiltrating data? There are many other ways a malicious package can exfiltrate data, and it could simply set the env var itself. We'll print out which hooks are registered at startup (though a malicious hook could defer its registration).
  • Why is override_api_key not async (when all other hooks are)? Because it is called from ModelAPI.__init__(); making it async would require a bit of a rework.
  • Why are there a bunch of dataclasses rather than parameters on the hook functions? To protect ourselves against breaking changes when we inevitably want to pass additional data to a hook (see the sketch after this list).
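
For illustration, here's a minimal sketch of that dataclass pattern. Only the on_run_start(data: RunStart) signature comes from this PR; the RunStart fields and the WandBHooks subclass are assumptions for the example:

```python
# Minimal sketch of the dataclass-parameter pattern. Only the
# on_run_start(data: RunStart) signature is taken from this PR; the
# RunStart fields and WandBHooks subclass are illustrative.
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStart:
    run_id: str  # assumed field
    # New fields can be appended here later without breaking existing
    # hook implementations, since each hook receives a single dataclass
    # rather than individual positional parameters.


class Hooks:
    async def on_run_start(self, data: RunStart) -> None:
        """Called on each invocation of `eval()` or `eval_retry()`."""


class WandBHooks(Hooks):
    async def on_run_start(self, data: RunStart) -> None:
        print(f"run started: {data.run_id}")
```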

Future work:

  • Add hooks for events (e.g. tool calls, messages, etc.) - anything that shows up in the transcript.
  • Alex suggested a hook which runs after all solvers but before scoring.

```diff
@@ -28,7 +27,6 @@ def init_eval_context(
     init_logger(log_level, log_level_transcript)
     init_concurrency()
     init_max_subprocesses(max_subprocesses)
-    init_hooks()
```
@craigwalton-dsit:

We now init hooks in platform_init().

```diff
@@ -20,6 +20,10 @@ def running_in_notebook() -> bool:


 def platform_init() -> None:
+    from inspect_ai.hooks._startup import init_hooks
+
+    init_hooks()
```
@craigwalton-dsit:

It is safe to call this multiple times:

  • for "new" hooks, there is a _registry_hooks_loaded global (see the sketch below)
  • for "legacy" hooks, we only attempt to load them (and display a message if loaded) if they're not already loaded

```python
    async def on_run_start(self, data: RunStart) -> None:
        """On run start.

        A "run" is a single invocation of `eval()` or `eval_retry()` which may contain
```
@craigwalton-dsit:

Some of these docstrings might be worth double checking, as I've made some claims about when these hooks will be called (users will no doubt have questions otherwise).

```python
print(
    f"[blue][bold]inspect_ai v{version}[/bold][/blue]\n"
    f"[bright_black]{all_messages}[/bright_black]\n"
)
```
@craigwalton-dsit:

[Screenshot: resulting startup output, 2025-06-27]

```python
        Args:
            data: Sample end data.
        """
        pass
```
@craigwalton-dsit:

I'm unsure whether I've picked the ideal semantics for the on_sample_* hooks:

  • on_sample_start: on every try (including retries)
  • on_sample_end: on a successful completion
  • on_sample_abort: when a sample errors and has no retries remaining

It feels weird that there could be a mismatch between the number of on_sample_start and on_sample_end events (illustrated in the sketch below).
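
To make the potential mismatch concrete, here's a hypothetical counting hook (building on the Hooks sketch above; the SampleStart/SampleEnd dataclasses and their fields are assumptions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleStart:
    sample_id: str  # assumed field


@dataclass(frozen=True)
class SampleEnd:
    sample_id: str  # assumed field


class SampleCounter(Hooks):  # Hooks as in the earlier sketch
    """With one retry before success, a sample yields two
    on_sample_start calls but only one on_sample_end call."""

    def __init__(self) -> None:
        self.starts = 0
        self.ends = 0

    async def on_sample_start(self, data: SampleStart) -> None:
        self.starts += 1  # every try, including retries

    async def on_sample_end(self, data: SampleEnd) -> None:
        self.ends += 1  # only on successful completion
```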
