Skip to content

Add --substrait-round-trip option in sqllogictests #16183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gabotechs
Copy link
Contributor

Which issue does this PR close?

It probably does not fully closes it but it partially addresses:

Rationale for this change

Being able to execute the full sqllogictest suite adding a Substrait roundtrip. This means:

  • Parsing the test statement
  • Building a logical plan out of it
  • Converting the logical plan to Substrait
  • Converting the Substrait plan back to logical
  • Execute it

What changes are included in this PR?

Adds a new DataFusionSubstraitRoundTrip sqllogictest runner implementation that will do a Substrait roundtrip where it makes sense, and a new --substrait-round-trip option to the sqllogictest CLI.

Example:

Runs the full aggregate.slt file doing a Substrait roundtrip on each individual statement

cargo test --test sqllogictests -- --substrait-round-trip aggregate 

Runs an individual failing test reproducing one specific issue with the current Substrait conversions

cargo test --test sqllogictests -- --substrait-round-trip aggregate:1522

This is still not enforced in the CI, and I would say the expectations are that it's just another tool to catch existing bugs in the current Substrait conversion code and better communicate them so that contributors have a clear path forward for contributing fixes.

Are these changes tested?

N/A

Are there any user-facing changes?

No

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 24, 2025
@gabotechs gabotechs force-pushed the add-substrait-rountrip-option-in-sqllogictests branch 4 times, most recently from dd41d5b to 2cc9244 Compare May 30, 2025 08:56
@gabotechs gabotechs force-pushed the add-substrait-rountrip-option-in-sqllogictests branch from 2cc9244 to e481a15 Compare May 30, 2025 09:16
@gabotechs
Copy link
Contributor Author

This one is good for a review! cc @alamb @Blizzara

@Blizzara
Copy link
Contributor

Blizzara commented Jun 2, 2025

Code looks fine to me but I'm not familiar with the sqllogictests code so will leave it to Andrew to review properly.

I really like this additional testing for Substrait, thanks!

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I suggest adding a CI test that exercises this option with a passing .slt file (or create a minimal one). Otherwise this LGTM.

@github-actions github-actions bot added the development-process Related to development process of DataFusion label Jun 4, 2025
@gabotechs
Copy link
Contributor Author

I suggest adding a CI test that exercises this option with a passing .slt file

Something like this 1a05ba2?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @gabotechs -- thank you

I also tried it locally with tests I know fail and got an error as expected:

cargo test --test sqllogictests -- --substrait-round-trip agg

...

10. query failed: DataFusion error: This feature is not implemented: Producing a row from empty relation is unsupported
[SQL] with data as (
  select 1 as f,       4 as b
  union all
  select null as f,   99 as b
  union all
  select 2 as f,       5 as b
  union all
  select 98 as f,   null as b
  union all
  select 3 as f,       6 as b
  union all
  select null as f, null as b
)
select corr(f, b)
from data
at /Users/andrewlamb/Software/datafusion2/datafusion/sqllogictest/test_files/aggregate.slt:539

@@ -476,6 +476,28 @@ jobs:
POSTGRES_HOST: postgres
POSTGRES_PORT: ${{ job.services.postgres.ports[5432] }}

sqllogictest-substrait:
name: "Run sqllogictest in Substrait round-trip mode"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb
Copy link
Contributor

alamb commented Jun 5, 2025

Thank you for the review @2010YOUY01

@alamb alamb merged commit eeee6b0 into apache:main Jun 5, 2025
30 checks passed
kosiew pushed a commit to kosiew/datafusion that referenced this pull request Jun 9, 2025
* Add substrait roundtrip option in sqllogictests

* Fix doc link and missing license header

* Add README.md entry for the Substrait round-trip mode

* Link tracking issue in README.md

* Use clap's `conflicts_with` instead of manually checking flag compatibility

* Add sqllogictest-substrait job to the CI

* Revert committed formatting changes to README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development-process Related to development process of DataFusion sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants