Skip to content

archival: Unify segment upload and reupload code paths #25731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: dev
Choose a base branch
from

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Apr 5, 2025

This PR changes the segment upload path.
Previously, we used segment_collector to reupload data (compacted reupload or adjacent segment merging). To upload new segments archival_policy was used. So the code is scattered across two different places. This PR adds new collection mode to the segment_collector. This mode allows segment collector to upload new segments. It works similarly to archival_policy, tries to upload whole segments if possible. Does the scanning if needed. The goal here is to replace implementation of the segment_collector with the one that uses log_reader interface.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@Lazin Lazin force-pushed the feature/disagregate-storage-and-cloud branch from 6b95872 to 4dd2795 Compare April 11, 2025 15:40
@Lazin Lazin requested review from a team, r-vasquez and kbatuigas as code owners April 11, 2025 15:40
@Lazin Lazin requested review from andrewhsu and removed request for a team April 11, 2025 15:40
Lazin added 5 commits April 11, 2025 11:48
from the 'segment_collector' intance

Signed-off-by: Evgeny Lazin <[email protected]>
...make_puload_candidate_stream method

Signed-off-by: Evgeny Lazin <[email protected]>
"make_upload_candidate_stream" can skip offset now. The behavior is
similar to how ntp_archiver works in this case.

Signed-off-by: Evgeny Lazin <[email protected]>
Adds new collector mode which can be used to upload new segments. This
makes 'segment_collector' suitable to replace some logic inside the
'archival_policy'. The long term goal here is to make 'segment_reupload'
the only code path which is used to find upload candidates. Then the
'ntp_archiver' should start using method of the segment collector which
returns a stream instead of the set of segments. When its done the
internals of the 'segment_collector' could be updated to use
'log_reader' instead of iterating through the segments.

Signed-off-by: Evgeny Lazin <[email protected]>
@Lazin Lazin force-pushed the feature/disagregate-storage-and-cloud branch from 4dd2795 to 1a722c1 Compare April 11, 2025 15:48
@Lazin Lazin requested review from oleiman and removed request for a team, r-vasquez, kbatuigas and andrewhsu April 11, 2025 15:50
@Lazin Lazin force-pushed the feature/disagregate-storage-and-cloud branch from 1a722c1 to 9b26977 Compare April 14, 2025 12:09
@Lazin Lazin requested a review from oleiman April 14, 2025 12:09
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 14, 2025

Retry command for Build#64471

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"check_manifest_and_segment_metadata","cloud_storage_type":1}
tests/rptest/tests/read_replica_e2e_test.py::TestReadReplicaService.test_simple_end_to_end@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"partition_count":10}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"no_check","cloud_storage_type":2}

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 14, 2025

CI test results

test results on build#64471
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/64471#01963435-2d52-4b38-abe3-33ab4c056344 FLAKY 1/2
rptest.tests.archival_test.ArchivalTest.test_isolate.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FLAKY 1/2
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FLAKY 1/2
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_decommissioning_finishes_after_manual_cancellation.delete_topic=True ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FLAKY 1/2
rptest.tests.rack_aware_replica_placement_test.RackAwarePlacementTest.test_replica_placement.rack_layout_str=ooooFF.num_partitions=50.replication_factor=3.num_topics=2 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FLAKY 1/2
rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_simple_end_to_end.partition_count=10.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FLAKY 1/2
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast3.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FLAKY 1/17
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast3.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FLAKY 1/7
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_and_segment_metadata ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af63-4659-b5ed-e06cea40adac FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64471#01963477-af61-4db9-930a-f1c814a468a4 FAIL 0/20
test results on build#64541
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FLAKY 1/2
rptest.tests.read_replica_e2e_test.TestReadReplicaService.test_simple_end_to_end.partition_count=10.cloud_storage_type_and_url_style=.CloudStorageType.S3.1.virtual_host ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast3.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FLAKY 1/4
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast3.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FLAKY 1/4
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.ABS.check_mode=no_check ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_and_segment_metadata ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f260-471e-82cf-055c0dcf33a4 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64541#019638df-f25c-4399-b2b9-bb6031d46145 FAIL 0/20
test results on build#64561
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FLAKY 1/2
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_admin_api_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast1.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_fast3.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FLAKY 1/3
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_many_partitions.cloud_storage_type=CloudStorageType.S3.check_mode=check_manifest_and_segment_metadata ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_prevent_recovery.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FAIL 0/20
rptest.tests.topic_recovery_test.TopicRecoveryTest.test_time_based_retention.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64561#01963a3c-0508-4677-a01b-6cfa0e335820 FAIL 0/20
test results on build#64583
test_id test_kind job_url test_status passed
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_conflicting_names ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_conflicting_names ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_creating_and_listing_migrations ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_creating_and_listing_migrations_wo_license ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_higher_level_migration_api ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_list_mountable_topics ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.None.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FLAKY 1/14
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FLAKY 1/6
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executed.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FLAKY 1/5
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FLAKY 1/8
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.executing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FLAKY 1/8
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.prepared.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FLAKY 1/3
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.prepared.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FLAKY 1/5
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FLAKY 1/14
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FLAKY 1/12
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=False.params=.cancellation.dir.out.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FLAKY 1/8
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.None.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.None.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FLAKY 1/3
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executed.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FLAKY 1/4
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executed.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.executing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.prepared.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.prepared.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-435f-96ca-468a8ba593c7 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.in.stage.preparing.use_alias.True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a42-4242-b1a7-361923114193 FAIL 0/20
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.dir.out.stage.executed.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FAIL 0/20
rptest.tests.datalake.mount_unmount_test.MountUnmountIcebergTest.test_simple_unmount.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FLAKY 1/2
rptest.tests.datalake.rest_catalog_connection_test.RestCatalogConnectionTest.test_redpanda_connection_to_rest_catalog.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FLAKY 1/2
rptest.tests.nodes_decommissioning_test.NodesDecommissioningTest.test_recycle_all_nodes.auto_assign_node_id=False ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FAIL 0/1
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a44-4491-8b05-be958c404ed9 FLAKY 1/2
rptest.tests.shadow_indexing_compacted_topic_test.TSWithAlreadyCompactedTopic.test_initial_upload ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FAIL 0/20
rptest.tests.tiered_storage_pause_test.TestTieredStoragePause.test_safe_pause_resume.allow_gaps_topic_level=True.allow_gaps_cluster_level=True ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963bf0-1b9a-4d2c-8756-5e67c9429950 FAIL 0/20
rptest.tests.topic_delete_test.TopicDeleteCloudStorageTest.drop_lifecycle_marker_test.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/64583#01963c04-0a43-4cae-8966-c29c0a10281f FAIL 0/1

@oleiman
Copy link
Member

oleiman commented Apr 14, 2025

If I'm not mistaken, we also need to reproduce this logic somewhere:

if (!closed && !below_flush_offset) {
auto kafka_start_offset = log->from_log_offset(start_offset);
auto kafka_lso = log->from_log_offset(model::next_offset(adjusted_lso));
if (kafka_start_offset >= kafka_lso) {
// If timeboxed uploads are enabled and there is no producer
// activity, we can get into a nasty loop where we upload a segment,
// add an archival metadata batch, upload a segment containing that
// batch, add another archival metadata batch, etc. This leads to
// lots of small segments that don't contain data being uploaded. To
// avoid it, we check that kafka (translated) offset increases.
vlog(
archival_log.debug,
"Upload policy for {}: can't find candidate, only non-data "
"batches to upload (kafka start_offset: {}, kafka "
"last_stable_offset: {})",
_ntp,
kafka_start_offset,
kafka_lso);
return {};

BTW @Lazin, I'm hacking on this diff a bit, locally, so I'm happy to push it over the line if you're working on other stuff. Up to you.

@oleiman
Copy link
Member

oleiman commented Apr 15, 2025

something like this?

diff --git a/src/v/cluster/archival/archival_policy.cc b/src/v/cluster/archival/archival_policy.cc
index 8255690645..c368507b51 100644
--- a/src/v/cluster/archival/archival_policy.cc
+++ b/src/v/cluster/archival/archival_policy.cc
@@ -203,7 +203,8 @@ ss::future<candidate_creation_result> archival_policy::get_next_segment(

     segment_collector.collect_segments(
       segment_collector_mode::new_non_compacted);
-    if (!segment_collector.segment_ready_for_upload()) {
+    if (!segment_collector.segment_ready_for_upload(
+          begin_inclusive, end_exclusive, flush_offset)) {
         co_return candidate_creation_error::no_segments_collected;
     }

diff --git a/src/v/cluster/archival/segment_reupload.cc b/src/v/cluster/archival/segment_reupload.cc
index c6ad537c0b..de6dccb659 100644
--- a/src/v/cluster/archival/segment_reupload.cc
+++ b/src/v/cluster/archival/segment_reupload.cc
@@ -534,8 +534,47 @@ bool segment_collector::should_replace_manifest_segment() const {
     return _can_replace_manifest_segment && _begin_inclusive < _end_inclusive;
 }

-bool segment_collector::segment_ready_for_upload() const {
-    return _begin_inclusive <= _end_inclusive && !_segments.empty();
+bool segment_collector::segment_ready_for_upload(
+  model::offset start_offset,
+  model::offset end_exclusive,
+  std::optional<model::offset> flush_offset) const {
+    if (_begin_inclusive > _end_inclusive || _segments.empty()) {
+        return false;
+    }
+    const auto* segment = _segments.front().get();
+    if (segment == nullptr) {
+        return false;
+    }
+
+    auto below_flush_offset = flush_offset.has_value()
+                              && segment->offsets().get_base_offset()
+                                   <= flush_offset.value();
+    auto closed = !segment->has_appender();
+
+    if (!closed && !below_flush_offset) {
+        auto kafka_start_offset = _log.from_log_offset(start_offset);
+        auto kafka_lso = _log.from_log_offset(end_exclusive);
+
+        if (kafka_start_offset >= kafka_lso) {
+            // If timeboxed uploads are enabled and there is no producer
+            // activity, we can get into a nasty loop where we upload a segment,
+            // add an archival metadata batch, upload a segment containing that
+            // batch, add another archival metadata batch, etc. This leads to
+            // lots of small segments that don't contain data being uploaded. To
+            // avoid it, we check that kafka (translated) offset increases.
+            vlog(
+              archival_log.debug,
+              "Segment collector for {}: can't find candidate, only non-data "
+              "batches to upload (kafka start_offset: {}, kafka "
+              "last_stable_offset: {})",
+              _manifest.get_ntp(),
+              kafka_start_offset,
+              kafka_lso);
+            return false;
+        }
+    }
+
+    return true;
 }

 cloud_storage::segment_name segment_collector::adjust_segment_name() const {
@@ -633,10 +672,11 @@ ss::future<candidate_creation_result> segment_collector::make_upload_candidate(
             for (const auto& s : _segments) {
                 fmt::print(
                   seg,
-                  "{}-{}/{}; ",
+                  "{}-{}/{} [closed? {}]; ",
                   s->offsets().get_base_offset(),
                   s->offsets().get_committed_offset(),
-                  s->size_bytes());
+                  s->size_bytes(),
+                  !s->has_appender());
             }
             vlog(archival_log.debug, "Collected segments: {}", seg.str());
         }
@@ -793,6 +833,8 @@ ss::future<candidate_creation_result> segment_collector::make_upload_candidate(
         }
     }

+    std::cerr << "create upload candidate" << std::endl;
+
     co_return upload_candidate_with_locks{
       upload_candidate{
         .exposed_name = adjust_segment_name(),
diff --git a/src/v/cluster/archival/segment_reupload.h b/src/v/cluster/archival/segment_reupload.h
index fb1ff454d3..f7a7fd140c 100644
--- a/src/v/cluster/archival/segment_reupload.h
+++ b/src/v/cluster/archival/segment_reupload.h
@@ -91,7 +91,10 @@ public:
     bool should_replace_manifest_segment() const;

     /// Segments are found and can be uploaded to the cloud.
-    bool segment_ready_for_upload() const;
+    bool segment_ready_for_upload(
+      model::offset start_offset,
+      model::offset end_exclusive,
+      std::optional<model::offset> flush_offset) const;

     /// The starting point for the collection, this may not coincide with the
     /// start of the first collected segment. It should be aligned

@Lazin
Copy link
Contributor Author

Lazin commented Apr 15, 2025

@oleiman IIUC the proposed change the upload candidate lookup is happening every time even if the only batch in the log is a config batch. I think it could be done without scanning the segments. I'll try to fix this today. If the PR is unchanged within few hours from the moment I posted this message feel free to push into my branch.

@Lazin Lazin force-pushed the feature/disagregate-storage-and-cloud branch from 9b26977 to b9f8508 Compare April 15, 2025 08:44
@Lazin
Copy link
Contributor Author

Lazin commented Apr 15, 2025

I made the change archival_policy. So now it checks that the upload will move last uploaded kafka offset forward before looking at segments. Let's see how it will work.

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 15, 2025

Retry command for Build#64541

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"check_manifest_and_segment_metadata","cloud_storage_type":1}
tests/rptest/tests/read_replica_e2e_test.py::TestReadReplicaService.test_simple_end_to_end@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"partition_count":10}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"no_check","cloud_storage_type":2}

Lazin added 2 commits April 15, 2025 11:03
Previously, the search of segments that can be uploaded to the cloud
storage was implemented inside the archival_policy. For non-compacted
segments the archival_policy used its internal implementation but for
compacted reuploads it was using segment_collector.

This commit introduces new segment serach logic that uses
segment_collector so both compacted and non-compacted uploads are going
through the collector now.

Signed-off-by: Evgeny Lazin <[email protected]>
Signed-off-by: Evgeny Lazin <[email protected]>
@Lazin Lazin force-pushed the feature/disagregate-storage-and-cloud branch from b9f8508 to b670448 Compare April 15, 2025 15:03
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 15, 2025

Retry command for Build#64561

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":1}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"check_manifest_and_segment_metadata","cloud_storage_type":1}
tests/rptest/tests/read_replica_e2e_test.py::TestReadReplicaService.test_simple_end_to_end@{"cloud_storage_type_and_url_style":[1,"virtual_host"],"partition_count":10}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_admin_api_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_fast1@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_prevent_recovery@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_time_based_retention@{"cloud_storage_type":2}
tests/rptest/tests/topic_recovery_test.py::TopicRecoveryTest.test_many_partitions@{"check_mode":"no_check","cloud_storage_type":2}

auto kafka_begin_inclusive = log->from_log_offset(begin_inclusive);
auto kafka_end_exclusive = log->from_log_offset(
model::next_offset(end_inclusive.value()));
if (kafka_begin_inclusive > kafka_end_exclusive) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (kafka_begin_inclusive > kafka_end_exclusive) {
if (kafka_begin_inclusive >= kafka_end_exclusive) {

seems to do it 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite :\

@oleiman oleiman self-requested a review April 16, 2025 00:41
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 16, 2025

Retry command for Build#64583

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"executed"},"use_alias":true},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"prepared"},"use_alias":true},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_conflicting_names
tests/rptest/tests/nodes_decommissioning_test.py::NodesDecommissioningTest.test_recycle_all_nodes@{"auto_assign_node_id":false}
tests/rptest/tests/tiered_storage_pause_test.py::TestTieredStoragePause.test_safe_pause_resume@{"allow_gaps_cluster_level":true,"allow_gaps_topic_level":true}
tests/rptest/tests/topic_delete_test.py::TopicDeleteCloudStorageTest.drop_lifecycle_marker_test@{"cloud_storage_type":1}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"prepared"},"use_alias":false},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"out","stage":"executed"},"use_alias":false},"transfer_leadership":true}
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::TSWithAlreadyCompactedTopic.test_initial_upload
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_higher_level_migration_api
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"executing"},"use_alias":false},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":null,"use_alias":false},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"preparing"},"use_alias":false},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_list_mountable_topics
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_creating_and_listing_migrations
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"executing"},"use_alias":true},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":null,"use_alias":true},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_migrated_topic_data_integrity@{"params":{"cancellation":{"dir":"in","stage":"preparing"},"use_alias":true},"transfer_leadership":true}
tests/rptest/tests/data_migrations_api_test.py::DataMigrationsApiTest.test_creating_and_listing_migrations_wo_license

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants