Skip to content

Fix CPU spike caused by hanging connection after socket closure (#8806) #8810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 27, 2025
Merged

Fix CPU spike caused by hanging connection after socket closure (#8806) #8810

merged 1 commit into from
Jan 27, 2025

Conversation

goktug97
Copy link
Contributor

After the RerunServer shuts down, the broadcast_thread_func in crates/store/re_ws_comms/src/server.rs could not terminate due to the log_rx.recv function entering an infinite loop while attempting to receive data.

To resolve this, recv has been replaced with recv_timeout, allowing the thread to properly shut down and preventing excessive CPU usage.

Related

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Thanks for opening this pull request.

Because this is your first time contributing to this repository, make sure you've read our Contributor Guide and Code of Conduct.

Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution! However, I don't think that's the correct fix: Instead of making broadcast_thread_func a "slow busyloop", stop_listener should wakeup the broadcast thread by hanging up the ReceiveSet after setting the shutdown flag. ReceiveSet::recv method is documented to return if the set is empty, so calling clear on it after setting the shutdown flag should hopefully do it.

Edit: clear won't work, that just gets stuck in a lock. Needs some more prodding on how a ReceiveSet can be woken up, might be that we need to send a poisoned message of sort

@Wumpf
Copy link
Member

Wumpf commented Jan 27, 2025

log_rx.recv function entering an infinite loop while attempting to receive data.

That actually also sounds a bug in ReceiveSet then - I'd expect it at most to block indefinitely rather than busy-looping

@goktug97
Copy link
Contributor Author

goktug97 commented Jan 27, 2025

Makes sense, I checked the ReceiveSet and issue is happening because is_empty() check is done on Vec<Receiver>, not on the Receiver itself. Will revert the previous changes and push the fix.

EDIT: On second look, that might not be the desired behavior

@goktug97
Copy link
Contributor Author

The issue was due to these lines;

if let Ok(msg) = oper.recv(&rx[index].rx) {
    return Ok(msg);
}

It starts returning RecvError after disconnecting but that branch is not handled so loop enters infinite loop. Latest push solves the problem and serve_web works expected.

@goktug97 goktug97 requested a review from Wumpf January 27, 2025 10:19
Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch, thank you so much. I agree, that makes more sense now

@Wumpf Wumpf added 🪳 bug Something isn't working 🕸️ web regarding running the viewer in a browser 🪵 Log & send APIs Affects the user-facing API for all languages include in changelog labels Jan 27, 2025
After the `RerunServer` shuts down, the `broadcast_thread_func` in
`crates/store/re_ws_comms/src/server.rs#342` could not terminate due to
the `log_rx.recv` (ReceiveSet) function entering an infinite loop while
attempting to receive data.
@Wumpf
Copy link
Member

Wumpf commented Jan 27, 2025

Hmm strange, not seeing a report from the Rust CI. Maybe something is slightly broken in the gh workflows for non-maintainer contributions. Ah well, main will run it as well and I'm feeling confident enough

@Wumpf Wumpf merged commit 3fb2ac5 into rerun-io:main Jan 27, 2025
5 checks passed
@Wumpf Wumpf removed the 🪵 Log & send APIs Affects the user-facing API for all languages label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪳 bug Something isn't working include in changelog 🕸️ web regarding running the viewer in a browser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

serve_web followed by disconnect outside of main thread causes 100% CPU usage
2 participants