Let users disable datasets renaming for namespaced pipelines #4700

ankatiyar · 2025-05-06T10:59:53Z

Description

Fix #4566 This PR is a prototype to demo the solution

Development notes

Add a prefix_namespace argument to pipeline() which defaults to True but can be set to False instead of automatic renaming of the datasets

What it would look like for users:

from kedro.pipeline import Pipeline, node, pipeline
from kedro_example.pipelines.data_science.nodes import train_model
from .nodes import create_model_input_table, preprocess_companies, preprocess_shuttles


def create_pipeline(**kwargs) -> Pipeline:
    return pipeline(
        [
            node(
                func=preprocess_companies,
                inputs="companies",
                outputs="preprocessed_companies",
                name="preprocess_companies_node",
            ),
            node(
                func=preprocess_shuttles,
                inputs="shuttles",
                outputs="preprocessed_shuttles",
                name="preprocess_shuttles_node",
            ),
            node(
                func=create_model_input_table,
                inputs=["preprocessed_shuttles", "preprocessed_companies", "reviews"],
                outputs="model_input_table",
                name="create_model_input_table_node",
            ),            
        ],
        namespace="data_processing",
        prefix_namespace=False,
    )

TODOs if we decide to go for this solution:

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

Read the contributing guidelines
Signed off each commit with a Developer Certificate of Origin (DCO)
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change in the RELEASE.md file
Added tests to cover my changes
Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <[email protected]>

ElenaKhaustova

The suggested solution looks good to me!

I think one important thing not mentioned here is what will be the API for users to enable/disable this behaviour.

kedro/pipeline/pipeline.py

ElenaKhaustova

Thank you for adding extra explanation!

For me rename flag seems a big confusing. Maybe we can consider some other names like namespace_entries/ namespace_datasets. Not a blocker as solution itself looks good.

DimedS

Thanks for the PR, @ankatiyar!

I fully support the solution - adding an option to avoid prefixing of inputs, outputs, parameters and confirms when creating a new pipeline with a namespace makes perfect sense.

A few concerns to consider in the current PR:

Naming of the option ("rename")
The current name does not seem clear enough. Imagine seeing Pipeline(..., rename=False) — it is not obvious what is not being renamed. Is it the pipeline itself, the nodes, the datasets?
Given that this is a niche option, the name should be more descriptive and self-explanatory.
Placement of the logic in _rename()
Currently, the logic to conditionally skip renaming is embedded in the _rename() function. However, based on the name, this function should only be responsible for renaming. It is confusing to pass an option to a renaming function that effectively tells it not to rename.
A cleaner approach would be to make this decision at a higher level, such as in _process_dataset_names(). That function can decide:
- If renaming is needed → call _rename()
- If renaming is not needed → skip _rename() entirely.

merelcht · 2025-05-07T12:56:58Z

I like the idea of adding this argument, but similar to what @ElenaKhaustova and @DimedS have mentioned, it's not clear to me what "rename" actually means. If rename is False, I would assume no renaming is needed, but it's the other way around 😅 Namespaces is already a difficult feature to understand, so I think we need to be more verbose with the name for this argument to make it clear what the purpose is.

Does this go in main or develop?

While the way this is implemented is a non-breaking change, I'd suggest adding it to 1.0.0 to beef up the new feature set 🙂

Signed-off-by: Ankita Katiyar <[email protected]>

astrojuanlu · 2025-05-13T14:45:14Z

Instead of rename, what about the ability of setting a custom prefix? And then users could do override_prefix="" or similar ❓

Signed-off-by: Ankita Katiyar <[email protected]>

ankatiyar · 2025-05-19T10:26:36Z

I changed the name to prefix_namespace but I'm also not sure if it is unambiguous enough. Can I get some votes/ideas?
Some ideas/options:

rename
rename_datasets
prefix_namespace
auto_prefix_namespace
auto_prefix
override_prefix

@merelcht @astrojuanlu @ElenaKhaustova @lrcouto @DimedS

DimedS · 2025-05-19T12:59:12Z

I changed the name to prefix_namespace but I'm also not sure if it is unambiguous enough. Can I get some votes/ideas? Some ideas/options:

rename

rename_datasets

prefix_namespace

auto_prefix_namespace

auto_prefix

override_prefix

@merelcht @astrojuanlu @ElenaKhaustova @lrcouto @DimedS

thanks @ankatiyar , I like prefix_datasets_with_namespace, for me prefix_namespace sounds like we plan to prefix namespace itself

merelcht · 2025-05-20T14:32:40Z

Initially I thought prefix_namespace is a logical name, but exploring this a bit more it seems like Dmitry's suggestion prefix_datasets_with_namespace is more accurate.

After testing, the namespace is still applied to the node name itself, so when running viz the suggestions to run a node show something like kedro run --to-nodes='data_science.evaluate_model([regressor;X_test;y_test]) -> None'

And running kedro registry describe will show something like:

Nodes:
- data_science.preprocess_data (preprocess_data)
- preprocess_data (preprocess_data)
- preprocess_shuttles (preprocess_shuttles)
- create_model_input_table (create_model_input_table)
- data_science.split_data (split_data)
- data_science.train_model (train_model)
- data_science.evaluate_model (evaluate_model)

ankatiyar · 2025-05-20T14:59:22Z

@merelcht The idea for this feature is to not have the users rename entries in the catalog with prefixes or have to define a mapping dictionary for the inputs/outputs. I think prefixing the name of the node is still valuable because we're still namespacing the pipeline and it should show on viz that way

merelcht · 2025-05-20T15:30:20Z

@merelcht The idea for this feature is to not have the users rename entries in the catalog with prefixes or have to define a mapping dictionary for the inputs/outputs. I think prefixing the name of the node is still valuable because we're still namespacing the pipeline and it should show on viz that way

I agree! But that's why I think Dmitry's suggestion for the argument name is the better option.

ankatiyar · 2025-05-20T15:45:35Z

@merelcht oh yeah, I agree! Will update the PR with the suggestion!

Signed-off-by: Ankita Katiyar <[email protected]>

DimedS

great work, @ankatiyar !

ankatiyar · 2025-05-21T14:28:04Z

Placement of the logic in _rename() Currently, the logic to conditionally skip renaming is embedded in the _rename() function. However, based on the name, this function should only be responsible for renaming. It is confusing to pass an option to a renaming function that effectively tells it not to rename. A cleaner approach would be to make this decision at a higher level, such as in _process_dataset_names(). That function can decide:

If renaming is needed → call _rename()

If renaming is not needed → skip _rename() entirely.

@DimedS I think the placement is not ideal but the argument goes in too deep and I think _rename is still the right place for the logic because there's other rules also that apply to the dataset names
__init__->_map_nodes->_copy_node->_process_dataset_names->_rename

ankatiyar · 2025-05-21T14:29:08Z

~~PS will do docs in a separate PR it's too spread across~~
Made some docs updates in this PR itself

Signed-off-by: Ankita Katiyar <[email protected]>

merelcht

Thanks @ankatiyar this looks great 👍

Makes sense to do the docs separately. That way you can also make sure they are done with the new formatting etc. (cc: @Huongg )

Update pipeline with rename option

79ad024

Signed-off-by: Ankita Katiyar <[email protected]>

ankatiyar mentioned this pull request May 6, 2025

Can we add fallback behaviour for automatically pre-fixed namespace datasets? #4566

Closed

ankatiyar requested review from DimedS, astrojuanlu and ElenaKhaustova May 6, 2025 11:28

ElenaKhaustova reviewed May 6, 2025

View reviewed changes

kedro/pipeline/pipeline.py Outdated Show resolved Hide resolved

ElenaKhaustova approved these changes May 7, 2025

View reviewed changes

DimedS reviewed May 7, 2025

View reviewed changes

Change rename to prefix_namesapce

609b667

Signed-off-by: Ankita Katiyar <[email protected]>

ankatiyar changed the base branch from main to develop May 19, 2025 09:24

ankatiyar and others added 2 commits May 19, 2025 11:24

Merge branch 'develop' into prototype-namespace-renaming

8168b31

Add test, release notes

c01028b

Signed-off-by: Ankita Katiyar <[email protected]>

ankatiyar changed the title ~~[prototype] Let users disable datasets renaming for namespaced pipelines~~ Let users disable datasets renaming for namespaced pipelines May 19, 2025

ankatiyar marked this pull request as ready for review May 19, 2025 10:24

ankatiyar requested a review from merelcht as a code owner May 19, 2025 10:24

ankatiyar requested a review from DimedS May 19, 2025 10:26

ankatiyar and others added 2 commits May 21, 2025 11:54

rename argument

104bb72

Signed-off-by: Ankita Katiyar <[email protected]>

Merge branch 'develop' into prototype-namespace-renaming

cff71f6

Signed-off-by: Ankita Katiyar <[email protected]>

DimedS approved these changes May 21, 2025

View reviewed changes

Add docs

17620a8

Signed-off-by: Ankita Katiyar <[email protected]>

ankatiyar requested a review from yetudada as a code owner May 21, 2025 14:55

merelcht approved these changes May 21, 2025

View reviewed changes

ankatiyar merged commit 4570923 into develop May 21, 2025
40 of 41 checks passed

ankatiyar deleted the prototype-namespace-renaming branch May 21, 2025 15:12

Let users disable datasets renaming for namespaced pipelines #4700

Let users disable datasets renaming for namespaced pipelines #4700

Uh oh!

Conversation

ankatiyar commented May 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Development notes

Developer Certificate of Origin

Checklist

Uh oh!

ElenaKhaustova left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ElenaKhaustova left a comment

Choose a reason for hiding this comment

Uh oh!

DimedS left a comment

Choose a reason for hiding this comment

Uh oh!

merelcht commented May 7, 2025

Uh oh!

astrojuanlu commented May 13, 2025

Uh oh!

ankatiyar commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

DimedS commented May 19, 2025

Uh oh!

merelcht commented May 20, 2025

Uh oh!

ankatiyar commented May 20, 2025

Uh oh!

merelcht commented May 20, 2025

Uh oh!

ankatiyar commented May 20, 2025

Uh oh!

DimedS left a comment

Choose a reason for hiding this comment

Uh oh!

ankatiyar commented May 21, 2025

Uh oh!

ankatiyar commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

merelcht left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ankatiyar commented May 6, 2025 •

edited

Loading

ankatiyar commented May 19, 2025 •

edited

Loading

ankatiyar commented May 21, 2025 •

edited

Loading