Skip to content

Add new hook spec for better integration model for plugins with ParallelRunners #4769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented May 23, 2025

Description

Related to: #4692

Kedro-Viz PR: kedro-org/kedro-viz#2336

This PR introduces enhancements to the ParallelRunner, to provide a better mechanism for plugins and hooks that need to operate in a multi-process environment. These changes are aim to enabling plugins like kedro-viz to correctly collect data (e.g., dataset statistics) when pipelines are executed in parallel.

Overview:

  • Enable Shared State for Hooks: Hooks should now maintain and update shared state across multiple processes spawned by ParallelRunner.
  • Better Hook Execution in Parallel: Provides a better mechanism for executing hooks in subprocesses by ensuring they are picklable and properly initialised.
  • Centralised Control: ParallelRunner now has more control over the hook environment in its subprocesses.
  • Plugin Extensibility: Offers a better integration path for plugin developers whose tools need to interact with ParallelRunner.

Development notes

Introduced two new hook specifications:

  • on_parallel_runner_start(manager: SyncManager, catalog: CatalogProtocol): Allows hooks to be notified when ParallelRunner initialises. It provides access to the runner's multiprocessing.SyncManager (for creating shared state like dictionaries or lists that are safe for inter-process communication) and the main DataCatalog.
  • get_picklable_hook_implementations_for_subprocess(): Enables hooks to provide a "picklable" version of themselves that ParallelRunner can safely send to and register within its worker subprocesses. This is crucial because PluginManager instances and many complex hook objects are not inherently picklable.

ParallelRunner Enhancements:

  • When _run is called, ParallelRunner now triggers the on_parallel_runner_start hook, making its SyncManager and catalog available to interested plugins. This allows plugins to set up any necessary shared data structures before task execution begins.

_prepare_subprocess_hook_manager, has been added. This method:

  • Calls the get_picklable_hook_implementations_for_subprocess hook on the main process's PluginManager.
  • Collects any picklable hook instances provided by plugins.
  • Creates a new, simplified PluginManager (with tracing disabled to ensure picklability) for use in subprocesses and registers these picklable hooks.
  • This prepared PluginManager (or a _NullPluginManager if no picklable hooks are found) is then passed to each Task executed by the ParallelRunner.

This replaces the previous mechanism where Task instances attempted to re-initialise a PluginManager and re-register hooks independently within each subprocess.

Task Simplification (kedro/runner/task.py):

  • Removed the _run_node_synchronization and _bootstrap_subprocess static methods, along with the parallel attribute and associated logic in execute().

The responsibility of providing a hook manager to tasks running in parallel now lies solely in ParallelRunner.
If a Task receives no hook_manager (which should mainly happen if ParallelRunner provides a _NullPluginManager), it logs a warning and proceeds with a _NullPluginManager, so node execution is not blocked.

Hook Manager Creation (kedro/framework/hooks/manager.py):

  • _create_hook_manager now accepts an enable_tracing argument. This is used by ParallelRunner to create a picklable hook manager for subprocesses by disabling tracing.
  • RunnerSpecs is now added to the list of specifications when a hook manager is created.

Minor Changes:

The __del__ method now checks if _manager exists before calling shutdown.

Impact:

  • This is a feature enhancement and there is no regressions for runners like SequentialRunner.
  • Plugins wishing to use shared state with ParallelRunner will need to implement the new RunnerSpecs hooks.
  • The way hooks are managed within ParallelRunner subprocesses is fundamentally changed, moving away from per-task re-initialisation to a centrally prepared, picklable set of hooks.

How to test:

  1. Check out the Kedro develop branch with these PR changes and check out the corresponding kedro-viz branch that uses these new Kedro hooks (the one where DatasetStatsHook implements on_parallel_runner_start and get_picklable_hook_implementations_for_subprocess) and install them.
  2. Use a Kedro project (like the spaceflights starter)
  3. Ensure kedro-viz is installed and its DatasetStatsHook is active.
  4. Run the pipeline using ParallelRunner: kedro run --runner=ParallelRunner
  5. The pipeline should complete without errors related to hook management, SyncManager, or pickling of hooks/managers.
  6. After the run, check the stats.json is filled for datasets processed by different parallel workers.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

@astrojuanlu
Copy link
Member

Since this has been mentioned in the context of kedro-org/kedro-viz#2310, would like to flag a couple of things:

@/dAnjou: can plugins be executed concurrently somehow?
@/RonnyPfannschmidt (pluggy author): Currently it's not sanely possible

And there haven't been any significant updates. Maybe the trigger of the hooks should be at a higher level so that we never have to even distribute the plugin or hook manager to the subprocesses or threads?

  • Besides from my question above, I know this is a draft but how would users with custom runners make theirs compatible with our on_parallel_runner_start ? For example, let's say that they're not using our kedro.runner.parallel_runner.ParallelRunner, but something else that happens to work in a similar fashion.

@SajidAlamQB
Copy link
Contributor Author

  • Besides from my question above, I know this is a draft but how would users with custom runners make theirs compatible with our on_parallel_runner_start ? For example, let's say that they're not using our kedro.runner.parallel_runner.ParallelRunner, but something else that happens to work in a similar fashion.

The current design avoids concurrent use of a single PluginManager instance across processes. Instead, ParallelRunner equips subprocesses with new, PluginManager instances containing only picklable hook implementations. These specific hook implementations (like DatasetStatsHook) are then responsible for using process safe mechanisms for any state that needs to be coordinated. The core pluggy PluginManager in the main process does not have its hooks called concurrently by different processes.

@SajidAlamQB SajidAlamQB marked this pull request as ready for review May 29, 2025 10:16
@SajidAlamQB SajidAlamQB requested a review from merelcht as a code owner May 29, 2025 10:16
Signed-off-by: Sajid Alam <[email protected]>
Copy link
Contributor

@rashidakanchwala rashidakanchwala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code looks great. this will hopefully solve the problem we have on 'Run Status with Parallel Runners' -- do we have any tests for this; it would be nice to see some intergration tests.

@astrojuanlu
Copy link
Member

@SajidAlamQB addressed my first question above, but not the second

Besides from my question above, I know this is a draft but how would users with custom runners make theirs compatible with our on_parallel_runner_start ? For example, let's say that they're not using our kedro.runner.parallel_runner.ParallelRunner, but something else that happens to work in a similar fashion.

@SajidAlamQB
Copy link
Contributor Author

Besides from my question above, I know this is a draft but how would users with custom runners make theirs compatible with our on_parallel_runner_start ? For example, let's say that they're not using our kedro.runner.parallel_runner.ParallelRunner, but something else that happens to work in a similar fashion.

These are good points. Firstly, plugins wishing to support parallel execution using shared state (like the updated DatasetStatsHook) will now rely on the runner triggering these new RunnerSpecs hooks (on_parallel_runner_start and get_picklable_hook_implementations_for_subprocess).

For custom parallel runners (not inheriting from kedro.runner.parallel_runner.ParallelRunner) to achieve full compatibility with these advanced plugins, they would need to adopt a similar pattern to our ParallelRunner:

  • Manage Shared State
  • Trigger on_parallel_runner_start: Call this hook on their main process's PluginManager.
  • Handle Picklable Hooks for Workers: Call get_picklable_hook_implementations_for_subprocess on the main PluginManager, and then equip each worker unit with its own PluginManager instance populated with these picklable hook implementations.

Essentially, the new RunnerSpecs and ParallelRunner's implementation give a opinionated interface for this kind of shared state hook integration. While custom runners aren't forced to implement this, plugins designed for this pattern won't have their full parallel functionality enabled with those runners if this interface isn't supported.

I think Documenting this pattern for users developing custom runners who want compatibility with pluings is definitely a good idea.

@DimedS
Copy link
Member

DimedS commented Jun 4, 2025

Thanks for the great proposal, @SajidAlamQB - this is definitely a complex and important problem.

In my opinion, it should be possible to reach a state where hooks "just work" across all runners, without requiring any special handling from plugin authors. If we agree that's achievable, it would be better to aim for that direction, even if it requires more significant restructuring.

Overall, I think this is a great topic for a deeper discussion in a dedicated Tech Design session.

@merelcht
Copy link
Member

merelcht commented Jun 5, 2025

This is definitely not a trivial change and the amount of code still needed on the Viz side (https://github.com/kedro-org/kedro-viz/pull/2336/files) to make this work for just one hook (DatasetStatsHook), makes me wonder if this is the way to go. I totally agree with what @DimedS is saying:

In my opinion, it should be possible to reach a state where hooks "just work" across all runners, without requiring any special handling from plugin authors.

If every hook someone implements needs to have all this code to handle the case for execution with the ParallelRunner that's not a good experience. And I'd guess also too complicated for most people.

Also, following on what @rashidakanchwala said, from what I understand this wouldn't "just" make the run status code work as is right? We'd also need to update those hooks to pass state?

Can you schedule a discussion for this asap? Then we can decide on whether we'll give it another try for 1.0.0 or leave it as is and accept that hooks don't work with the ParallelRunner.

@SajidAlamQB
Copy link
Contributor Author

Following our technical design session today, I'm closing this PR based on the team's decision to pursue a research-first approach.

The team consensus was:

For 1.0: Document current limitations clearly
Post-1.0: Research simpler solutions that don't require plugin developers to have deep multiprocessing experience.

Read TD outcomes here: kedro-org/kedro-viz#1801 (comment)

@SajidAlamQB SajidAlamQB deleted the dev/add-new-hook-spec-for-parallelunner-develop branch June 11, 2025 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants