feat(pyspark): support partitioning in PySpark backend file writes #10850

jakepenzak · 2025-02-15T05:01:55Z

Description of changes

Enables partitioning in the create_table and to_parquet methods of the PySpark backend (partitioning was already supported by to_delta).
- Added a partition_by argument to the create_table method of the PySpark backend
- Overrode the to_parquet method in the PySpark backend so that it writes through pyspark.sql.DataFrameWriter, following the same pattern as the existing to_delta override; this lets the writer's partitioning kwargs be passed through
- Added tests to verify that partitioning behaves as expected
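For context on what a partitioned write produces: pyspark.sql.DataFrameWriter's partitionBy writes Hive-style directories, one path segment per partition column (col=value), and the partition values are stored in the directory names rather than inside the data files. The sketch below models that layout in plain Python so it can run without Spark. hive_partition_paths is a hypothetical helper for illustration only, not part of ibis or PySpark, and it ignores details such as escaping special characters and null partition values.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_partition_paths(rows, partition_cols, root="out"):
    """Hypothetical helper: group rows into the directories a Hive-style
    partitioned write would create.

    Returns {directory: [rows with the partition columns removed]}.
    """
    groups = defaultdict(list)
    for row in rows:
        # One path segment per partition column, in the order given: col=value
        segments = [f"{col}={row[col]}" for col in partition_cols]
        directory = str(PurePosixPath(root, *segments))
        # Partition values live in the path, so they are dropped from the file contents
        remaining = {k: v for k, v in row.items() if k not in partition_cols}
        groups[directory].append(remaining)
    return dict(groups)


rows = [
    {"year": 2024, "city": "NYC", "sales": 10},
    {"year": 2024, "city": "SF", "sales": 7},
    {"year": 2025, "city": "NYC", "sales": 3},
]
by_year = hive_partition_paths(rows, ["year"])
# by_year has two keys: "out/year=2024" (two rows) and "out/year=2025" (one row)
```

Going by the description above, the PySpark backend's to_parquet now passes partitioning kwargs through to DataFrameWriter, so a partitioned write would presumably look like con.to_parquet(expr, "out", partitionBy="year"). That exact keyword, and the partition_by argument of create_table, are assumptions here and should be checked against the ibis documentation.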

Issues closed

Resolves feat(pyspark): support partition_by key for PySpark file writes #8900

cpcloud

LGTM, thanks!

🚢 it!

cpcloud · 2025-02-19T11:11:36Z

I'll fix up the remote test failures. Those tests should probably be skipped, since the write location for a remote Spark instance isn't well-defined (unless it's a bucket, but the tests deal only in local file paths).
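The reasoning in the comment above can be expressed as a small predicate: a local path is only meaningful when Spark runs on the same machine as the test process, while a remote cluster needs a location it can resolve on its own, such as a bucket URI. This is a minimal sketch under stated assumptions; write_target_is_well_defined and REMOTE_SAFE_SCHEMES are hypothetical names, and the chosen URI schemes are illustrative, not taken from the ibis test suite.

```python
from urllib.parse import urlparse

# Hypothetical, illustrative set of URI schemes a remote Spark cluster
# could resolve without access to the test machine's filesystem.
REMOTE_SAFE_SCHEMES = {"s3", "s3a", "gs", "abfss", "hdfs"}


def write_target_is_well_defined(path: str, spark_is_remote: bool) -> bool:
    """Return True if `path` names a location the Spark executors can reach."""
    if not spark_is_remote:
        # A local session shares the test process's filesystem, so any path works.
        return True
    scheme = urlparse(path).scheme.lower()
    # A remote cluster sees its own filesystem, not the test machine's,
    # so only object-store or distributed-filesystem URIs are unambiguous.
    return scheme in REMOTE_SAFE_SCHEMES
```

In a pytest suite, a predicate like this could back a pytest.mark.skipif condition on the file-write tests when they are run against a remote Spark session.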


github-actions bot added the tests and pyspark labels on Feb 15, 2025

jakepenzak mentioned this pull request on Feb 15, 2025 in feat(pyspark): support partition_by key for PySpark file writes #8900 (Closed)

jakepenzak changed the title from "feat(pyspark): support partition_by in create_table method" to "feat(pyspark): support partitioning in PySpark backend file writes" on Feb 15, 2025

cpcloud approved these changes Feb 19, 2025


cpcloud force-pushed the main branch from 0aba430 to b124e53 on February 19, 2025 11:19

feat(pyspark): add partitionBy argument to create_table

833d895

Adds the partitionBy argument to the create_table method in the pyspark backend to enable partitioned table creation. Fixes ibis-project#8900

cpcloud force-pushed the main branch from b124e53 to 833d895 on February 19, 2025 11:19

cpcloud enabled auto-merge (rebase) February 19, 2025 11:20

cpcloud merged commit c99cc23 into ibis-project:main Feb 19, 2025
88 of 89 checks passed

jakepenzak mentioned this pull request on Apr 9, 2025 in feat(pyspark): expose merge_schema option in create_table #11071 (Open)
