Skip to content

Fix possibly failing to reserve slots due to stale issued number #1870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 14, 2025

Conversation

Sushisource
Copy link
Member

What was changed

Fix a bug where resource based slot supplier was looking at a stale issued slots number while trying to reserve

Why?

Could cause it to be stuck failing to reserve

Checklist

  1. Closes ResourceBasedSlotSupplier.ReleaseSlot not implemented #1868

  2. How was this tested:
    Added test

  3. Any docs updates needed?

@Sushisource Sushisource requested a review from a team as a code owner March 14, 2025 17:34
@@ -346,7 +346,7 @@ func (s slotReserveInfoImpl) WorkerIdentity() string {
}

func (s slotReserveInfoImpl) NumIssuedSlots() int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we document that NumIssuedSlots is the current number of slots issues and will get updated over time? I can see users getting confused thinking this is the number of reserved slots at the time of the call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@Quinn-With-Two-Ns
Copy link
Contributor

What is the reason for using the SlotInfo count instead of having ResourceBasedSlotSupplier keep track? It should be fine but it is hard to reason about since the SlotInfo count is updated outside the mutex of the ResourceBasedSlotSupplier?

@Quinn-With-Two-Ns
Copy link
Contributor

Also is there a reason this isn't implemented if max slots is provided? https://github.com/temporalio/sdk-go/blob/master/contrib/resourcetuner/resourcetuner.go#L205

@Sushisource
Copy link
Member Author

What is the reason for using the SlotInfo count instead of having ResourceBasedSlotSupplier keep track? It should be fine but it is hard to reason about since the SlotInfo count is updated outside the mutex of the ResourceBasedSlotSupplier?

Because internally we always need to know the issued count, so the tracking supplier does that so it's not duplicated in all implementations. The mutex in the resource based supplier is purely for the last issued time / PID decisions.

@Sushisource
Copy link
Member Author

Also is there a reason this isn't implemented if max slots is provided? https://github.com/temporalio/sdk-go/blob/master/contrib/resourcetuner/resourcetuner.go#L205

It's debatable. I left it unimplemented because it maps to the "available slots" metric, but, you can easily set a max here that isn't actually attainable in any real sense.

@Quinn-With-Two-Ns
Copy link
Contributor

Because internally we always need to know the issued count

To confrim we only update our internal count after the slot supplier ReleaseSlot returns right?

@Sushisource
Copy link
Member Author

Because internally we always need to know the issued count

To confrim we only update our internal count after the slot supplier ReleaseSlot returns right?

Yup, exactly

@Sushisource Sushisource enabled auto-merge (squash) March 14, 2025 18:46
@Sushisource Sushisource merged commit 2dcb1b9 into master Mar 14, 2025
15 checks passed
@Sushisource Sushisource deleted the fix-stuck-reservations branch March 14, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ResourceBasedSlotSupplier.ReleaseSlot not implemented
2 participants