Add Task Submission / Return Hooks #1287
❌ Changes requested. Reviewed everything up to 2e31305 in 3 minutes and 1 second
More details
- Looked at 647 lines of code in 9 files
- Skipped 0 files when reviewing
- Skipped posting 19 drafted comments based on config settings
1. hamilton/lifecycle/api.py:376
- Draft comment:
This class is functionally identical to BasePreTaskSubmission - it just adds an unnecessary layer of indirection. Please use BasePreTaskSubmission directly instead.
- class BasePreTaskSubmission (base.py)
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50%
The TaskSubmissionHook class:
- Renames pre_task_submission to run_before_task_submission
- Provides a concrete implementation of pre_task_submission that delegates to run_before_task_submission
- Adds docstrings
- Otherwise has identical functionality to BasePreTaskSubmission
This pattern is consistent with other hooks in the file that provide a more user-friendly API over the base classes.
The comment ignores that this pattern of concrete delegation methods with renamed abstract methods is used consistently throughout the codebase for all hooks. The pattern likely serves a purpose in providing a more intuitive API.
The consistent pattern across the codebase suggests this is an intentional design choice to provide a more user-friendly API layer over the base hooks, not unnecessary indirection.
The comment should be deleted as it mischaracterizes an intentional API design pattern as unnecessary indirection.
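The delegation pattern discussed above - a concrete framework-facing method forwarding to a renamed, user-facing abstract method - can be sketched generically. All names and signatures here are illustrative, not Hamilton's exact API:

```python
import abc


class BasePreTaskSubmission(abc.ABC):
    """Low-level lifecycle base class (illustrative stand-in)."""

    @abc.abstractmethod
    def pre_task_submission(self, *, run_id: str, task_id: str):
        """Called by the framework just before a task is submitted."""


class TaskSubmissionHook(BasePreTaskSubmission, abc.ABC):
    """User-facing adapter: delegates the framework hook to a friendlier name."""

    def pre_task_submission(self, *, run_id: str, task_id: str):
        # Concrete implementation that forwards to the renamed abstract method
        return self.run_before_task_submission(run_id=run_id, task_id=task_id)

    @abc.abstractmethod
    def run_before_task_submission(self, *, run_id: str, task_id: str):
        """Users override this method instead of the base-class hook."""


class PrintingHook(TaskSubmissionHook):
    """Example user hook built on the adapter."""

    def run_before_task_submission(self, *, run_id: str, task_id: str):
        return f"submitting {task_id} (run {run_id})"


hook = PrintingHook()
print(hook.pre_task_submission(run_id="r1", task_id="t1"))
# → submitting t1 (run r1)
```

The indirection is what lets the framework call one stable method name while users implement a more descriptive one.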
2. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:85
- Draft comment:
Avoid using hardcoded magic numbers (here 14) for the expected number of hook calls. Consider adding inline comments or deriving the expected count from the task structure so tests are less brittle. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
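The suggestion above - deriving expected hook-call counts from the task structure instead of hardcoding a literal like 14 - might look like this sketch. The task structure, node counts, and `simulate_run` helper are all made up for illustration:

```python
# Hypothetical task structure the test would execute; expected counts
# derive from this structure rather than from a bare magic number.
TASKS = ["load", "transform", "save"]
NODES_PER_TASK = {"load": 2, "transform": 10, "save": 2}

# 2 + 10 + 2 == 14, but the arithmetic is now self-documenting
EXPECTED_PRE_NODE_EXECUTE_CALLS = sum(NODES_PER_TASK[t] for t in TASKS)


def simulate_run():
    """Stub mimicking one pre_node_execute call per node of each task."""
    return [
        f"pre_node_execute:{task}:{i}"
        for task in TASKS
        for i in range(NODES_PER_TASK[task])
    ]


def test_pre_node_execute_call_count():
    calls = simulate_run()  # stand-in for a real run with a tracking adapter
    assert len(calls) == EXPECTED_PRE_NODE_EXECUTE_CALLS


test_pre_node_execute_call_count()
```

If the DAG changes, the expectation changes with the structure instead of silently drifting from a stale literal.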
3. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:121
- Draft comment:
The run_id length assertion compares against the length of a TODO string. Replace it with a meaningful fixed threshold or check for valid UUID format. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
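A UUID-format check like the one suggested can be done with the standard library, for example:

```python
import uuid


def is_valid_uuid(value: str) -> bool:
    """Return True if value parses as a UUID (any version)."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


# A run_id assertion can then check format rather than string length:
run_id = str(uuid.uuid4())
assert is_valid_uuid(run_id)
assert not is_valid_uuid("not-a-uuid")
```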
4. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:211
- Draft comment:
Magic arithmetic for expected hook call counts (2 + 5 * 2 + 1 + 1) is unclear. Consider declaring expected counts as clearly named constants or adding clarifying comments. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:448
- Draft comment:
In test_multi_hook, the expected hookCounts dictionary uses many literal numbers. Document the reasoning behind each count to improve test readability and maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
6. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:250
- Draft comment:
The use of counters to check spawning_task_ids and purposes is fine, but consider adding inline comments explaining how the expected distribution (e.g. {"expand-parallel_over": 5, None: 5}) corresponds to the execution plan. - Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
7. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:80
- Draft comment:
Add inline comments explaining how the expected count of 14 for 'pre_node_execute' is derived from the DAG structure. This will aid future maintainers in understanding the arithmetic behind the hook call counts. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:211
- Draft comment:
The final result is asserted to be 426; please add a clarifying comment explaining the arithmetic behind this expected output so that future readers can follow the transformation logic. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
9. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:249
- Draft comment:
Document the rationale behind the expected counts for spawning task IDs and purpose counters (e.g., why 5 for 'expand-parallel_over' and 3 for EXECUTE_SINGLE). A brief comment here would improve clarity. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50%
While documentation could help explain the test's expectations, this is a test file and the assertions are verifying implementation details. The values come from the specific execution flow being tested - they're not arbitrary magic numbers but rather the expected behavior of the system. The test name and context make it clear what's being tested. Adding comments about implementation details could make the tests more brittle if the underlying implementation changes.
The comment raises a valid point that these numbers may seem magical to someone unfamiliar with the system. Understanding why we expect these specific counts could help maintainers debug test failures.
However, tests should be self-documenting through clear names and structure. These counts are implementation details that may change if the execution flow changes. Adding documentation about specific counts makes tests more fragile and harder to maintain.
The comment should be deleted. The test is clear about what it's testing through its name and structure. Documenting specific assertion values would create maintenance burden and make tests more brittle.
10. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:242
- Draft comment:
The test for the post task expand hook asserts that the parameters are exactly ["0", "1", "2", "3", "4"]. Please confirm that it is intentional for these values to be strings rather than integers, and add a comment if the conversion is deliberate. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
11. tests/lifecycle/test_lifecycle_adapters_end_to_end_task_based.py:444
- Draft comment:
In the 'test_multi_hook' test, the hook count assertion uses a Counter over function names. Consider refactoring this check or adding detailed inline comments that document the expected call counts for each hook for enhanced clarity and future maintainability. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
12. tests/lifecycle/lifecycle_adapters_for_testing.py:54
- Draft comment:
The ExtendToTrackCalls wrapper iterates over lifecycle hook and method attributes to wrap them for call tracking. It would be helpful to add inline documentation here describing the wrapping mechanism and its intended behavior for future maintainers. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
13. hamilton/execution/executors.py:98
- Draft comment:
Typo: In the docstring for base_execute_task, 'diciontary' should be corrected to 'dictionary'. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
14. hamilton/execution/executors.py:102
- Draft comment:
Typo: In the comment within base_execute_task (around line 102), consider changing "its an implementation detail" to "it's an implementation detail" for clarity. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
15. hamilton/execution/executors.py:74
- Draft comment:
Typo: In the _modify_callable function definition, consider renaming the parameter 'callabl' to 'callable' for clarity, unless this naming is intentional to avoid conflict with the built-in 'callable' function. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
16. hamilton/execution/grouping.py:203
- Draft comment:
Typo found: 'Fina' should be 'Find' in the docstring. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
17. hamilton/execution/grouping.py:233
- Draft comment:
Typo found: 'thie' should be 'this' in the comment. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
18. hamilton/execution/grouping.py:393
- Draft comment:
Typo found: 'guarenteed' should be 'guaranteed' in the comment. - Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.
19. hamilton/lifecycle/api.py:767
- Draft comment:
Typo in the docstring for run_after_task_grouping: consider removing 'the' to change 'information about the which tasks were created' to 'information about which tasks were created'. - Reason this comment was not posted:
Comment looked like it was already resolved.
Ahh - I used
Regarding the failing doc test - possibly related to #1288
What's the impact here on the Hamilton Tracker? Should we implement these new functions for it?
I am not that familiar with the
Offhand, do you know if the UI has any issues tracking out-of-process execution? I see that
Edit: Sorry, that's probably not super helpful.
👍
Yes, that's in the client that sends the events - need to go one level deeper in the tracker to the client it uses.
To chime in -- I don't think this should impact the Hamilton tracker. That said, we may want to consider adding more to it to leverage these -- we could get a more fine-grained view / expose more descriptive states.
Cool. Does it make sense to provide some implementation of these hooks in this PR too? e.g. some example using it?
@skrawcz - I was actually planning to create a second PR with a dedicated logging adapter (based on a side conversation with @elijahbenizzy) using these new hooks. I can add that here if you'd like.
Cool. Yeah I say having an implementation would be good because having something implemented helps prove / ground the API :) It can be in a PR off of this branch or together.
So, the logging adapter is a bit more expansive than I originally thought. We will probably want to have a discussion. I will open a separate PR based off this branch - hopefully this
Edit: Changed ETA. Sorry! 😞
Looking good -- a few minor comments
Summary of the latest changes (I have been sitting on them)...
Hmm ... tests are failing because of
Looks good! A few small changes but I think it's almost there.
Thanks! Sorry this took a bit to review. Looks good, just change the hook's name and ping me, I'll merge.
@elijahbenizzy I changed the name of the hook and merged in main (probably should have rebased).
Current pre/post task execution hooks may run on out-of-process executors, potentially causing issues with certain logging and stateful adapters. This PR adds additional hooks that run before a task is submitted to an executor and after a task future is resolved - both on the main process.
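For illustration, a hook pair along these lines could track in-flight tasks entirely on the main process. This is a sketch with invented names mirroring the hooks described here, not Hamilton's exact API:

```python
import threading


class InFlightTaskTracker:
    """Sketch: counts tasks between submission and resolution.

    Because both hooks fire on the main process, the tracker can hold
    mutable state (a set and a lock) without cross-process concerns.
    Method names are illustrative only.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight: set[str] = set()

    def run_before_task_submission(self, *, task_id: str, run_id: str):
        # Fires on the main process, before the task reaches the executor
        with self._lock:
            self.in_flight.add(task_id)

    def run_after_task_resolution(self, *, task_id: str, run_id: str, error=None):
        # Fires on the main process, after the task future resolves
        with self._lock:
            self.in_flight.discard(task_id)


tracker = InFlightTaskTracker()
tracker.run_before_task_submission(task_id="t1", run_id="r1")
assert tracker.in_flight == {"t1"}
tracker.run_after_task_resolution(task_id="t1", run_id="r1")
assert tracker.in_flight == set()
```

An adapter like this would be unreliable with the existing pre/post task execution hooks, since those may fire in a worker process with its own copy of the state.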
Changes

I added task submission / resolution hooks (`BasePreTaskSubmission` and `BasePostTaskResolution`) and adapters (`TaskSubmissionHook` and `TaskResolutionHook`) to the codebase, and updated the main task execution function `run_graph_to_completion` to call the new hooks. In support of this I made `TaskImplementation` hashable via its `task_id` in order to avoid a parallel structure - @elijahbenizzy perhaps in the future this could be used for a queue-based task processor?

How I tested this

I added individual tests for both `TaskSubmissionHook` and `TaskResolutionHook` and added them to `test_multi_hook`.

Notes

- `TaskSubmissionHook` and `TaskResolutionHook`, as well as some other task execution hooks, have been updated. I think we could add some more details - maybe a flow chart of when the individual hooks are called? I can add that if the current changes seem reasonable.
- `TaskResolutionHook` currently has an optional `error` attribute. I am on the fence as to whether or not this is needed. My thinking is that since `TaskFuture` essentially wraps `concurrent.futures.Future` we could forward the `exception` call. Thoughts? Currently `error` is always `None`.

Checklist