[PD] Abort request if transfer fails #6504

Merged 5 commits on May 22, 2025

ByronHsu
Collaborator

Motivation

The current transfer failure handling is a placeholder. The correct approach is:

  1. Raise the real exception from sender/receiver in failure_exception()
  2. When a failure status is polled, abort the request immediately and log the error
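
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `PollStatus`, `FakeTransferHandle`, `Request`, and `process_polled` are hypothetical names, and only `failure_exception()` comes from the description. The assumption is that the sender/receiver exposes a `poll()` that returns a status and that `failure_exception()` raises the underlying error.

```python
import logging
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional

logger = logging.getLogger(__name__)


class PollStatus(Enum):  # hypothetical status values, for illustration only
    TRANSFERRING = auto()
    SUCCESS = auto()
    FAILED = auto()


class FakeTransferHandle:
    """Hypothetical stand-in for a KV sender/receiver."""

    def __init__(self, status: PollStatus, error: Optional[Exception] = None):
        self._status = status
        self._error = error

    def poll(self) -> PollStatus:
        return self._status

    def failure_exception(self) -> None:
        # Step 1: raise the real underlying error, not a generic placeholder.
        if self._error is not None:
            raise self._error
        raise RuntimeError("transfer failed for an unknown reason")


@dataclass
class Request:  # hypothetical request shape
    rid: str
    handle: FakeTransferHandle
    aborted: bool = False
    abort_message: Optional[str] = None


def process_polled(requests: List[Request]) -> List[Request]:
    """Return the requests that are still alive; abort the failed ones."""
    alive = []
    for req in requests:
        if req.handle.poll() is PollStatus.FAILED:
            # Step 2: abort immediately and log the real error.
            try:
                req.handle.failure_exception()
            except Exception as exc:
                logger.error("Transfer failed for request %s: %s", req.rid, exc)
                req.aborted = True
                req.abort_message = str(exc)
            continue
        alive.append(req)
    return alive
```

Surfacing the original exception in the abort message keeps the root cause (for example a connection reset) visible to whoever receives the aborted request, rather than only in the server log.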

In addition, I added a guardrail in the scheduler that aborts a request immediately if it does not have a bootstrap_id (usually because it was not sent from the LB).
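
A guardrail of this kind might look like the sketch below. It is an assumption-laden illustration rather than the scheduler code from this PR: `IncomingRequest` and `admit_or_abort` are hypothetical names, and the only detail taken from the description is that a request without a `bootstrap_id` is aborted right away.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IncomingRequest:  # hypothetical request shape
    rid: str
    bootstrap_id: Optional[int] = None


def admit_or_abort(
    reqs: List[IncomingRequest],
) -> Tuple[List[IncomingRequest], List[Tuple[str, str]]]:
    """Split requests into admitted ones and (rid, reason) abort records."""
    admitted: List[IncomingRequest] = []
    aborted: List[Tuple[str, str]] = []
    for req in reqs:
        if req.bootstrap_id is None:
            # Fail fast instead of letting the request wait on a transfer
            # that can never be bootstrapped.
            aborted.append((req.rid, "missing bootstrap_id"))
        else:
            admitted.append(req)
    return admitted, aborted
```

Checking `is None` rather than truthiness means a `bootstrap_id` of `0` would still be admitted in this sketch.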

Modifications

Checklist

Collaborator

@ShangmingCai ShangmingCai left a comment

LGTM. Will finish the failure_exception part of mooncake based on this PR tomorrow.

@ByronHsu ByronHsu force-pushed the byron/abort-if-transfer-failed branch from dc9f173 to b775fce Compare May 21, 2025 23:09
@zhyncs zhyncs merged commit 3bde101 into main May 22, 2025
9 of 38 checks passed
@zhyncs zhyncs deleted the byron/abort-if-transfer-failed branch May 22, 2025 04:44
lifuhuang pushed a commit to lifuhuang/sglang that referenced this pull request May 23, 2025
xutianyi1999 pushed a commit to gh-efforts/sglang that referenced this pull request May 23, 2025
xutianyi1999 pushed a commit to gh-efforts/sglang that referenced this pull request May 28, 2025
3 participants