feat(iceberg): introduce small file compaction #22527

Open
wants to merge 11 commits into
base: main

Li0k
Contributor

@Li0k Li0k commented Jul 7, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR introduces Small Data File Compaction and makes it the default compaction mode for the append-only Iceberg sink. The previous default, Full Compaction, is replaced in order to reduce write amplification and space amplification.
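
As a rough illustration of the idea (not the code in this PR), a small-file compaction planner can select only the files below a size threshold and pack them into rewrite groups, leaving larger files untouched. The `DataFile` type, the function name, and both size parameters below are hypothetical and chosen only for this sketch.

```rust
/// Hypothetical description of a data file; not a type from this PR.
#[derive(Debug, Clone, PartialEq)]
pub struct DataFile {
    pub path: String,
    pub size_bytes: u64,
}

/// Select files smaller than `small_threshold` and pack them greedily, in
/// input order, into groups whose total size stays at or below `target_size`.
/// Files at or above the threshold are never rewritten, which is where the
/// savings over a full compaction come from.
pub fn plan_small_file_compaction(
    files: &[DataFile],
    small_threshold: u64,
    target_size: u64,
) -> Vec<Vec<DataFile>> {
    let mut groups: Vec<Vec<DataFile>> = Vec::new();
    let mut current: Vec<DataFile> = Vec::new();
    let mut current_size: u64 = 0;

    for file in files.iter().filter(|f| f.size_bytes < small_threshold) {
        // Close the current group if adding this file would exceed the target.
        if !current.is_empty() && current_size + file.size_bytes > target_size {
            groups.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += file.size_bytes;
        current.push(file.clone());
    }
    if !current.is_empty() {
        groups.push(current);
    }

    // Rewriting a group that holds a single file would not reduce file count.
    groups.retain(|g| g.len() > 1);
    groups
}
```

With file sizes of 10, 500, 20, 30, and 40 bytes, a threshold of 100, and a target of 60, this sketch plans a single rewrite group containing the 10, 20, and 30 byte files. The 500 byte file is skipped as already large, and the 40 byte file is left alone because it would form a group of one.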

Checklist

  • I have written necessary rustdoc comments.
  • I have added necessary unit tests and integration tests.
  • I have added test labels as necessary.
  • I have added fuzzing tests or opened an issue to track them.
  • My PR contains breaking changes.
  • My PR changes performance-critical code, so I will run (micro) benchmarks and present the results.
  • I have checked the Release Timeline and Currently Supported Versions to determine which release branches I need to cherry-pick this PR into.

Documentation

  • My PR needs documentation updates.
Release note

@hzxa21 hzxa21 changed the title feat: introduce small file compaction feat(iceberg): introduce small file compaction Jul 8, 2025
@Li0k Li0k requested review from chenzl25 and xxhZs July 10, 2025 11:09
Contributor

@chenzl25 chenzl25 left a comment

LGTM. Let's merge this PR quickly, because a missing register_jvm_builder would cause the Iceberg compactor to fail.

Contributor

@chenzl25 chenzl25 left a comment

Rest LGTM

Comment on lines +1696 to +1701

    if table.append_only {
        sink_with.insert("type".to_owned(), "append-only".to_owned());
    } else {
        sink_with.insert("type".to_owned(), "upsert".to_owned());
    }
Contributor

For an append-only Iceberg table, we should remove the sink_with.insert("primary_key".to_owned(), pks.join(",")); call.
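
One way to read this suggestion is to set `primary_key` only in the upsert branch. The sketch below is an assumption about how that could look, not the code that was merged: `build_sink_with` is a hypothetical helper, and `sink_with` is assumed to be a string-to-string map.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper illustrating the review suggestion; the surrounding
/// function in the PR is not shown in this conversation.
pub fn build_sink_with(append_only: bool, pks: &[String]) -> BTreeMap<String, String> {
    let mut sink_with = BTreeMap::new();
    if append_only {
        // Append-only sinks do not get a primary key option.
        sink_with.insert("type".to_owned(), "append-only".to_owned());
    } else {
        sink_with.insert("type".to_owned(), "upsert".to_owned());
        // Only the upsert sink carries the primary key columns.
        sink_with.insert("primary_key".to_owned(), pks.join(","));
    }
    sink_with
}
```

Keeping the `primary_key` insert inside the `else` branch means the append-only path cannot pick it up by accident if the code around it is reordered later.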

2 participants