Skip to content

perf(duckdb): optimize generated SQL for ArrayConcat #10999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 24, 2025

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented Mar 13, 2025

Taking this ibis code:

import ibis

e = ibis.array([1]).concat([None], [2, 3], [4, 5, 6])
ibis.to_sql(e)

before:

SELECT
  CASE
    WHEN CASE
      WHEN CASE
        WHEN CAST([1] AS TINYINT[]) IS NULL OR [NULL] IS NULL
        THEN NULL
        ELSE LIST_CONCAT(CAST([1] AS TINYINT[]), [NULL])
      END IS NULL
      OR [2, 3] IS NULL
      THEN NULL
      ELSE LIST_CONCAT(
        CASE
          WHEN CAST([1] AS TINYINT[]) IS NULL OR [NULL] IS NULL
          THEN NULL
          ELSE LIST_CONCAT(CAST([1] AS TINYINT[]), [NULL])
        END,
        [2, 3]
      )
    END IS NULL
    OR [4, 5, 6] IS NULL
    THEN NULL
    ELSE LIST_CONCAT(
      CASE
        WHEN CASE
          WHEN CAST([1] AS TINYINT[]) IS NULL OR [NULL] IS NULL
          THEN NULL
          ELSE LIST_CONCAT(CAST([1] AS TINYINT[]), [NULL])
        END IS NULL
        OR [2, 3] IS NULL
        THEN NULL
        ELSE LIST_CONCAT(
          CASE
            WHEN CAST([1] AS TINYINT[]) IS NULL OR [NULL] IS NULL
            THEN NULL
            ELSE LIST_CONCAT(CAST([1] AS TINYINT[]), [NULL])
          END,
          [2, 3]
        )
      END,
      [4, 5, 6]
    )
  END AS "ArrayConcat((Array((1,)), (None,), (2, 3), (4, 5, 6)))"

after:

SELECT
  CASE
    WHEN CAST([1] AS TINYINT[]) IS NULL
    OR [NULL] IS NULL
    OR [2, 3] IS NULL
    OR [4, 5, 6] IS NULL
    THEN NULL
    ELSE LIST_CONCAT(LIST_CONCAT(LIST_CONCAT(CAST([1] AS TINYINT[]), [NULL]), [2, 3]), [4, 5, 6])
  END AS "ArrayConcat((Array((1,)), (None,), (2, 3), (4, 5, 6)))"

For a more real-world example, on the example function I give in #10996, before this change I get 8400 lines of SQL, afterwards it is 3400!

@github-actions github-actions bot added the sql Backends that generate SQL label Mar 13, 2025
@NickCrews NickCrews changed the title perf: optimize generated SQL for ArrayConcat perf(duckdb): optimize generated SQL for ArrayConcat Mar 13, 2025
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice find, thanks!

@cpcloud cpcloud added performance Issues related to ibis's performance duckdb The DuckDB backend labels Mar 24, 2025
@cpcloud cpcloud merged commit d38db7e into ibis-project:main Mar 24, 2025
115 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duckdb The DuckDB backend performance Issues related to ibis's performance sql Backends that generate SQL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants