Skip to content

fix(join): error in more places on colliding column names #10778

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 20, 2025

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Feb 3, 2025

I'm adding this PR mostly as a demo of what I think might be undesirable behavior. Should this instead error, because the two input columns are getting mapped to the same output column?

@cpcloud curious what you think here

@github-actions github-actions bot added the tests Issues or PRs related to tests label Feb 3, 2025
@NickCrews NickCrews marked this pull request as ready for review February 3, 2025 05:06
@cpcloud
Copy link
Member

cpcloud commented Feb 4, 2025

Yep, this isn't entirely correct.

The name is only allowed to be merged into a single output column in the case of inner joins (with equality predicates only, which is implicit in this test).

…detection

Before, .count and .schema() would not trigger the ._finish()
method, so we wouldn't detect that there were duplicate
fields in the output, which meant that these methods
returned the incorrect result.
@NickCrews NickCrews force-pushed the join-rename-collisions branch from 42c6645 to 30e1b6b Compare February 20, 2025 01:50
@NickCrews NickCrews changed the title test(join): add test that colliding column names will get merged into one fix(join): error in more places on colliding column names Feb 20, 2025
@NickCrews
Copy link
Contributor Author

NickCrews commented Feb 20, 2025

OK, I found the fix.

In general, I think it is very fragile and bad to need to explicitly wrap every single public method of Table with the finished() decorator in the Join class. It also is some spooky-action-at-a-distance behavior here, where things will only blow up at some later function call, not close to where the source of the problem is. This is different from the rest of ibis, which usually tries to error as early/close to the original cause/function call as possible. I would much prefer it if we forced the user, at the time they call .join() to tell us how to resolve these collisions. But this is almost definitely a breaking API change, and not in scope for this PR. But just writing it down somewhere as I'm thinking of it.

@cpcloud
Copy link
Member

cpcloud commented Feb 20, 2025

In general, I think it is very fragile and bad to need to explicitly wrap every single public method of Table with the finished() decorator in the Join class. It also is some spooky-action-at-a-distance behavior here, where things will only blow up at some later function call, not close to where the source of the problem is

This is the trade-off with the redesign of the join implementation. There was no obvious alternative that ALSO allowed users to refer to more than just the immediate child table.

This convenience was something that was requested over and over again by a variety of users, and @kszucs came up with a design to support it.

This is different from the rest of ibis, which usually tries to error as early/close to the original cause/function call as possible.

Yes, again, by design.

I would much prefer it if we forced the user, at the time they call .join() to tell us how to resolve these collisions.

We used to have this behavior, and people really disliked it. It doesn't really make sense to relitigate this.

@cpcloud cpcloud merged commit ec06e1e into ibis-project:main Feb 20, 2025
90 checks passed
@kszucs
Copy link
Member

kszucs commented Feb 21, 2025

We could automatically add the methods/properties after the class definition to make it more future-proof to the ir.Table changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Issues or PRs related to tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants