implement gc data copy #303
base: main
Conversation
Codecov Report
@@ Coverage Diff @@
## main #303 +/- ##
==========================================
- Coverage 63.15% 62.96% -0.19%
==========================================
Files 32 35 +3
Lines 1900 3786 +1886
Branches 204 453 +249
==========================================
+ Hits 1200 2384 +1184
- Misses 600 1150 +550
- Partials 100 252 +152
Force-pushed from 98b208d to c3cc258
@@ -212,7 +222,7 @@ GCManager::pdev_gc_actor::pdev_gc_actor(uint32_t pdev_id, std::shared_ptr< HeapC
     std::shared_ptr< GCBlobIndexTable > index_table, HSHomeObject* homeobject) :
     m_pdev_id{pdev_id},
     m_chunk_selector{chunk_selector},
-    m_reserved_chunk_queue{RESERVED_CHUNK_NUM_PER_PDEV},
+    m_reserved_chunk_queue{HS_BACKEND_DYNAMIC_CONFIG(reserved_chunk_num_per_pdev)},
Is it hot-swappable?
No, it is not supported yet; it is finalized at the first bootstrap.
Then maybe let's put it into metablks? What would happen if this value changed after bootstrap?
Nothing will happen. ATM, all these parameters only take effect at the initial step and never change again after that. We can make it hot-swappable when we really need it.
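For reference, a minimal sketch of the "finalized at first bootstrap" semantics plus the metablk suggestion above; the `std::optional` is only a stand-in for real metablk persistence, and none of these names come from the PR:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical frozen copy of the gc sizing parameters, captured once at first bootstrap.
struct gc_actor_meta {
    uint32_t pdev_id;
    uint32_t reserved_chunk_num; // frozen value of reserved_chunk_num_per_pdev
};

// Stand-in for a persisted metablk; the real code would read/write this through the meta service.
std::optional< gc_actor_meta > g_persisted_meta;

uint32_t resolve_reserved_chunk_num(uint32_t pdev_id, uint32_t configured_value) {
    if (!g_persisted_meta) {
        // First bootstrap: capture the currently configured value and persist it.
        g_persisted_meta = gc_actor_meta{.pdev_id = pdev_id, .reserved_chunk_num = configured_value};
    }
    // Recovery: always use the frozen value, even if the dynamic config changed afterwards.
    return g_persisted_meta->reserved_chunk_num;
}
```

The actor would then be constructed with `resolve_reserved_chunk_num(pdev_id, HS_BACKEND_DYNAMIC_CONFIG(reserved_chunk_num_per_pdev))`, so a later config change can never silently disagree with the value the queue was sized with.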
Force-pushed from 5a8cdd5 to 16c6a02
Force-pushed from 16c6a02 to 718f081
data_sgs.iovs.emplace_back(
    iovec{.iov_base = iomanager.iobuf_alloc(blk_size, total_size), .iov_len = total_size});

futs.emplace_back(std::move(
I am really not sure about the disk behavior of reading a blob and then writing that blob elsewhere... will that generate lots of random IO? Hope we can verify the IO behavior and performance in further testing.
Yes, this will definitely generate some random IO, since they are probably in different HDD sectors. We can limit the impact with a rate limiter, which will constrain the traffic from gc.
If that is the case, it seems to make more sense to fetch a batch of blobs and then write a batch of blobs. Rate-limiting doesn't help an inefficient IO become efficient...
My point here is not to make an inefficient IO become efficient.
I mean that through the rate limiter we can constrain the amount of inefficient IO, so that it only takes a small part of total IO. Let me think about it a bit more; we can circle back on this later.
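A minimal sketch of the rate-limiting idea, assuming a simple token bucket over bytes copied per second; `GcRateLimiter` and its parameters are illustrative, not part of this PR:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>

// Token bucket over "bytes copied per second": gc acquires budget before each blob
// read/write, so gc copy traffic stays a bounded share of total io.
class GcRateLimiter {
    using Clock = std::chrono::steady_clock;

public:
    explicit GcRateLimiter(uint64_t bytes_per_sec) :
            m_bytes_per_sec{bytes_per_sec}, m_tokens{bytes_per_sec}, m_last_refill{Clock::now()} {}

    // Block until `bytes` of budget is available, then consume it.
    void acquire(uint64_t bytes) {
        bytes = std::min(bytes, m_bytes_per_sec); // a single request can never exceed bucket capacity
        std::unique_lock< std::mutex > lk{m_mtx};
        for (;;) {
            refill();
            if (m_tokens >= bytes) {
                m_tokens -= bytes;
                return;
            }
            lk.unlock();
            std::this_thread::sleep_for(std::chrono::milliseconds{10});
            lk.lock();
        }
    }

private:
    void refill() {
        const auto now = Clock::now();
        const auto ms = std::chrono::duration_cast< std::chrono::milliseconds >(now - m_last_refill).count();
        if (ms <= 0) { return; }
        m_tokens = std::min(m_bytes_per_sec, m_tokens + m_bytes_per_sec * static_cast< uint64_t >(ms) / 1000);
        m_last_refill = now;
    }

    uint64_t m_bytes_per_sec;
    uint64_t m_tokens;
    Clock::time_point m_last_refill;
    std::mutex m_mtx;
};
```

A gc task would call `acquire(total_size)` before each copy read/write; this caps gc's share of total IO without making the individual random IOs any cheaper, which is the trade-off discussed above.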
// time.
// TODO:: handle the error case here.
LOGERROR("Failed to query blobs in gc index table for ret={} move_to_chunk={}", ret, move_to_chunk);
m_reserved_chunk_queue.blockingWrite(move_to_chunk);
It is a bit weird to add the chunk back to the queue here, as the chunk was dequeued elsewhere, not in this function.
The concern here is why replacing the index fails.
1. If it fails before changing the pg index table, we can return false and put the move_to_chunk back to the queue safely.
2. If it fails (or crashes) while changing the pg index table, a partial update probably happened. In this case we cannot put it back to the queue (since some of the pg blob index entries have been updated and some pbas point to the move_to_chunk now) and let it be used by other gc tasks.
So I keep it like this to differentiate the safe case and the unsafe case. I can refine this by returning a status code or separating it into a new function, but let's circle back on this later after all the functionality works well.
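A small sketch of the status-code idea, assuming the reserved-chunk queue is a `folly::MPMCQueue` (as the `blockingWrite` call in the diff suggests); `index_replace_result`, `chunk_id_t` and `handle_replace_result` are hypothetical names, not the PR's code:

```cpp
#include <cstdint>

#include <folly/MPMCQueue.h>

using chunk_id_t = uint16_t; // stand-in for the PR's chunk id type

// Hypothetical outcome of replacing the pg blob index entries for a gc task.
enum class index_replace_result : uint8_t {
    success,           // every pg index entry now points into move_to_chunk
    failed_before_any, // nothing was touched; move_to_chunk is still unused
    failed_partially   // some entries already point into move_to_chunk
};

// Caller-side handling: requeue the reserved chunk only when it is provably unused.
void handle_replace_result(index_replace_result res, chunk_id_t move_to_chunk,
                           folly::MPMCQueue< chunk_id_t >& reserved_chunk_queue) {
    switch (res) {
    case index_replace_result::success:
        break; // move_from_chunk gets recycled separately
    case index_replace_result::failed_before_any:
        // Safe: no pba references move_to_chunk yet, hand it back to other gc tasks.
        reserved_chunk_queue.blockingWrite(move_to_chunk);
        break;
    case index_replace_result::failed_partially:
        // Unsafe: some blobs already live in move_to_chunk; keep it out of the queue
        // and let gc crash recovery finish or roll the replacement forward.
        break;
    }
}
```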
// considering the complexity of gc crash recovery for tombstone_blob_count, we get it directly from the
// index table, which is the most accurate.

// TODO:: do we need this as a durable entity? remove it and get all of this from the pg index in real time.
I think those counters are not meant to be 100% accurate; they are more for metrics. I agree that keeping these metrics durable introduces costs, but I also want to confirm how long it will take if we go through the index during boot up.
If memory serves, I had a benchmark for range query and it is fast. I will double-confirm it.
The test should be: with an SM filled up with 128K objects, how long does it take to go through the index during startup?
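A rough sketch of how that startup test could be timed; since the actual index query API isn't shown here, the range-query loop is passed in as a callable that invokes its callback once per index entry:

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Time a full boot-up scan over the pg blob index, as would be needed if
// tombstone_blob_count were rebuilt from the index instead of kept durable.
template < typename ForEachEntryFn >
void benchmark_boot_scan(ForEachEntryFn&& for_each_entry) {
    const auto start = std::chrono::steady_clock::now();
    uint64_t total{0}, tombstones{0};
    for_each_entry([&](bool is_tombstone) {
        ++total;
        if (is_tombstone) { ++tombstones; }
    });
    const auto ms = std::chrono::duration_cast< std::chrono::milliseconds >(
                        std::chrono::steady_clock::now() - start)
                        .count();
    std::cout << "scanned " << total << " index entries (" << tombstones << " tombstones) in " << ms << " ms\n";
}
```

Running this against an SM filled with ~128K objects would answer the startup-cost question directly.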
    if (!is_emergent || !is_last_shard) continue;
}

// prepare a shard header for this shard in move_to_chunk
Sorry, I failed to follow this part.
1. If you mean the shard header part:
For a sealed shard, the state in its shard header blk is open, and the state in its shard footer blk is sealed. Now we always generate the shard metablk from the shard's in-memory metadata, not by reading it from disk, since we do not maintain the pba of the shard header and footer in indexsvc. As a result, for a sealed shard, after we build the shard metablk in memory, its state is sealed, so we need to change it to open before writing it to disk, since the state in a shard header should always be open.
However, in emergent gc, the last shard of the emergent chunk is open, not sealed, so we can use it directly.
2. If you mean this part:
if (valid_blob_indexes.empty()) {
LOGDEBUG("empty shard found in move_from_chunk, chunk_id={}, shard_id={}", move_from_chunk, shard_id);
// TODO::send a delete shard request to raft channel. there is a case that when we are doing gc, the
// shard becomes empty, need to handle this case
// we should always write a shard header for the last shard of emergent gc.
if (!is_emergent || !is_last_shard) continue;
}
For an empty shard (no valid blobs in this shard), we can skip handling it. But if it is the last shard of the emergent chunk in emergent gc, we should write a shard header for it even if it is empty. This covers the case where the last shard in a chunk has only a shard header and triggers emergent gc.
> For a sealed shard, the state in its shard header blk is open, and the state in its shard footer blk is sealed. Now we always generate the shard metablk from the shard's in-memory metadata, not by reading it from disk, since we do not maintain the pba of the shard header and footer in indexsvc. As a result, for a sealed shard, after we build the shard metablk in memory, its state is sealed, so we need to change it to open before writing it to disk, since the state in a shard header should always be open.
I doubt we need to do that; actually, the state in the shard header is not reliable (and maybe we should move the state out of it). As you point out, we generate the header based on the metablk, so BR/incremental resync already sets header::open to false.
Yes, I am also not quite sure when we will use it (for metrics or scrubbing?), so I try to write the shard header as it used to be.
How about we make the change and see?
What changes do you want to make? I can try:
1. move the shard state out of the shard header blk, or
2. when copying data, do not change the state to open in the shard header blk.
For myself, I tend to keep the copied shard header the same as its source, which means the state in the shard header blk should always be open.
I would propose to try 1, and then 2 comes for free.
Actually, for the shard header blk, we only write it once and never read it (we do not know where it is; we don't record it in the index table), so the content in it (including the shard state) does not matter, right?
If we want to move the state out of the shard header blk, we need to adjust the allocated memory size carefully, since we allocate the buffer size according to the size of shard_info_superblk, which includes the shard state.
sisl::io_blob_safe sb_blob(sisl::round_up(sizeof(shard_info_superblk), repl_dev->get_blk_size()), io_align);
shard_info_superblk* sb = new (sb_blob.bytes()) shard_info_superblk();
sb->type = DataHeader::data_type_t::SHARD_INFO;
sb->info = ShardInfo{.id = new_shard_id,
.placement_group = pg_owner,
.state = ShardInfo::State::OPEN,
.lsn = 0,
.created_time = create_time,
.last_modified_time = create_time,
.available_capacity_bytes = size_bytes,
.total_capacity_bytes = size_bytes};
What if I comment out this part of the code, so that we do not care about the shard state in the shard header blk during data copy? After data copy, we don't know where it is and never read it:
if (!is_last_shard_in_emergent_chunk) {
// for the sealed shard, the shard state in header should also be open.now, the written header is the same
// as footer except the shard state, so we lost the original header.
r_cast< HSHomeObject::shard_info_superblk* >(header_sgs.iovs[0].iov_base)->info.state =
ShardInfo::State::OPEN;
// TODO:: get the original header from the move_from_chunk and change the following part if needed.
/*
uint64_t created_time;
uint64_t last_modified_time;
uint64_t available_capacity_bytes;
uint64_t total_capacity_bytes;
*/
}
LOGDEBUG("all valid blobs are copied from move_from_chunk={} to move_to_chunk={}", move_from_chunk, move_to_chunk);

// we need to commit_blk for the move_to_chunk to make sure the last offset of append_blk_allocator is updated.
If we refactor this function a bit:
- sync write the header via async_alloc_write().get(),
- submit all data blobs in parallel, returning the PBA from async_alloc_write().thenValue(),
- wait for all data writes to finish and get the max PBA,
- (except for the last shard) sync write the footer.
This function would be more readable, and we can get the PBA we need to commit.
I made this fully async, and if any op fails, we fail the whole copy_shard, so there is no need to check all kinds of failure cases.
Is it simpler to commit the last blk? I prefer this way.
For the last shard in an emergent chunk, if all the blobs of this shard are written concurrently, we cannot be sure which blk is written last. Do you mean comparing the blks one by one?
Yes, a for loop serves the purpose.
Making an async function complicated to 100+ lines and then following it with a get() is not that "async". Personally I would prefer to async only the part that needs it (data copy) and leave the other parts (header/footer) sync, as I failed to follow the complicated then-value structure.
But anyway, it is your choice.
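A sketch of the proposed flow (sync header, parallel data writes, max PBA, sync footer), under the assumption that the header/footer/blob writers are thin wrappers over the PR's `async_alloc_write` calls; all names and signatures here are illustrative:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

#include <folly/futures/Future.h>

using pba_t = uint64_t; // stand-in for the real on-disk block-id type

// Copy one shard: sync header, all blob writes in parallel, wait and take the largest
// written pba (the value commit_blk needs), then sync footer.
//   WriteHeaderFn : () -> bool                          (sync header write)
//   WriteBlobFn   : (blob_id) -> folly::Future< pba_t > (async data write, resolves to its pba)
//   WriteFooterFn : () -> bool                          (sync footer write)
template < typename WriteHeaderFn, typename WriteBlobFn, typename WriteFooterFn >
std::optional< pba_t > copy_shard(const std::vector< uint64_t >& valid_blob_ids,
                                  bool is_last_shard_in_emergent_chunk, WriteHeaderFn&& write_header_sync,
                                  WriteBlobFn&& write_blob_async, WriteFooterFn&& write_footer_sync) {
    if (!write_header_sync()) { return std::nullopt; }

    std::vector< folly::Future< pba_t > > futs;
    futs.reserve(valid_blob_ids.size());
    for (const auto blob_id : valid_blob_ids) {
        futs.emplace_back(write_blob_async(blob_id));
    }

    // Wait for every data write and remember the max pba; a plain for loop is enough here.
    pba_t max_pba{0};
    for (auto& t : folly::collectAll(futs).get()) {
        if (t.hasException()) { return std::nullopt; } // any failure fails the whole shard copy
        max_pba = std::max(max_pba, t.value());
    }

    // The still-open last shard of an emergent chunk gets no footer.
    if (!is_last_shard_in_emergent_chunk && !write_footer_sync()) { return std::nullopt; }
    return max_pba; // caller commits up to this pba on move_to_chunk
}
```

Only the data copies run concurrently; header and footer stay synchronous, which matches the readability concern raised above.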
Force-pushed from bc3abf3 to 02f24d3
Crash recovery is also included in this PR.