-
Notifications
You must be signed in to change notification settings - Fork 954
Cache dictionary inputs processing in Node class #4688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cache dictionary inputs processing in Node class #4688
Conversation
Signed-off-by: Arnout Verboven <[email protected]>
First of all: thanks a lot for this PR @ArnoutVerboven ! 🙌🏼 Comment from the peanut gallery: could we just use |
kedro/pipeline/node.py
Outdated
self._inputs = inputs | ||
# Cache the list version of inputs dictionary to avoid recomputing in the inputs property | ||
self._dict_inputs_as_list = None | ||
if isinstance(self._inputs, dict): | ||
self._dict_inputs_as_list = _dict_inputs_to_list(self._func, self._inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._inputs = inputs | |
# Cache the list version of inputs dictionary to avoid recomputing in the inputs property | |
self._dict_inputs_as_list = None | |
if isinstance(self._inputs, dict): | |
self._dict_inputs_as_list = _dict_inputs_to_list(self._func, self._inputs) | |
self._inputs = _dict_inputs_to_list(self._func, self._inputs) if isinstance(inputs, dict) else inputs |
Any reason this wouldn't just work?
@astrojuanlu This seems fine, but I wonder if there's any benefit of doing this over just doing the conversion when setting |
Signed-off-by: Arnout Verboven <[email protected]>
I adjusted the code according to @astrojuanlu 's suggestion of using cached_property.
I see there is a setter for @deepyaman I went with this approach as it still achieves the desired performance improvement with minimal changes to the other logic, but if you prefer updating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there is a setter for
func
so I'm clearing theinputs
cache in the setter.@deepyaman I went with this approach as it still achieves the desired performance improvement with minimal changes to the other logic, but if you prefer updating
_inputs
inNode.__init__()
directly, feel free to open it as a separate PR and close this one!
No super-strong preference. :) Congrats on your first contribution!
CI failure is unrelated, addressing in #4695 |
Thanks @ArnoutVerboven ! Did you test this code with your use case to double check that the performance is still good? |
Yes @astrojuanlu ! Still achieving same speed-up using cached_property (130s -> 3s spent on 'inputs' in my pipeline) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArnoutVerboven this looks good to me! Loved reading your blog post too 😄
Can you add a short sentence of this change to the release notes and add yourself as contributor?
Signed-off-by: Arnout Verboven <[email protected]>
Done :) |
Description
This PR addresses a performance issue in the Kedro framework related to the Node class. When nodes are provided with dictionary inputs, the inputs property recalculates _dict_inputs_to_list on every access, which becomes inefficient when running pipelines with many nodes. This optimization caches the conversion result to avoid redundant computation.
Development notes
I've made the following changes:
1. Added a _dict_inputs_as_list instance variable in Node.init to cache the result of _dict_inputs_to_list for dictionary inputs2. Modified the inputs property to use this cached value instead of recomputing it on every access
3. Updated the func setter to recalculate the cached value when the function is changed
EDIT:
@property
to@cached_property
for theinputs
method to cache the resultfunc
setter to clear the cached property with a simpledel self.inputs
when the function changesThese changes are minimal and tested giving a significant performance improvement on a case with ~1k nodes.
Checklist
RELEASE.md
file