# Consider changing the vhost-user-backend API #279
## Comments
@mtjhrc sorry for my late reply. I think this work will be great! I was looking into supporting macOS/BSD, and our dependency on Linux-specific stuff like epoll/eventfd is the biggest thing to be solved. Your work may simplify that.
I think it's fine, but we should help our users (vhost-device, virtiofsd, etc.) adapt their code easily.
Sure, no objection on my side.
I think being able to unify out-of-process and in-process device implementations alone is already worth the effort. 👍
I don't have a macOS machine, but this is definitely something I am keeping in mind.
FYI I'll talk about it at FOSDEM next year: https://fosdem.org/2025/schedule/event/fosdem-2025-5100-can-qemu-and-vhost-user-devices-be-used-on-macos-and-bsd-/ I can mention your work ;-)
FWIW: libepoll-shim can be used on BSD and macOS. |
@alyssais Thanks, I was also thinking of https://github.com/smol-rs/polling
I have a working proof of concept of the new API with a vhost-user console (ported from libkrun) - this uses multiple worker threads - and the gpu device from vhost-device.

Regarding polling, for now I am still using the epoll and `EventFd` from vmm-sys-util; libkrun has a shim that implements them on macOS: https://github.com/containers/libkrun/tree/main/src/utils/src/macos. Regarding smol-rs/polling - this seems to use kqueue's

In the PoC, I am using a layer built on top of the vmm-sys-util `Epoll` called `EventManager` (this is different from the current `EventManager` in rust-vmm, which makes me think I should rename this). It is based on the crosvm implementation and allows you to identify the events using a Rust enum. (You can still use an int if you want, and you can even use ints inside an enum.)

```rust
#[derive(EventToken, Debug, Copy, Clone)]
enum WorkerToken {
    ControlRxq,
    ControlTxq,
    // You can also have an event like this (specify queue index inside):
    // Queue(u32),
    Quit,
}

fn run(mut self) -> anyhow::Result<()> {
    let event_manager: EventManager<WorkerToken> = EventManager::new()?;
    event_manager.add(&self.control_rxq.kick_event(), WorkerToken::ControlRxq)?;
    event_manager.add(&self.control_txq.kick_event(), WorkerToken::ControlTxq)?;
    event_manager.add(&self.stopfd, WorkerToken::Quit)?;
    // You can also just add any file descriptor using add_fd instead of add, or
    // implement the EventSource trait for your struct and use the add() method
    // like above.
    loop {
        for event in event_manager.wait()? {
            match event.token() {
                WorkerToken::ControlRxq => self.process_rx(),
                WorkerToken::ControlTxq => self.process_tx(),
                WorkerToken::Quit => return Ok(()),
            }
        }
    }
}
```

I'm not sure how to proceed now, though. I feel like the first step, if we want to go through with this, would be to merge the vm-virtio and vhost repositories? I'm not sure if we want a different repository for devices (both vhost-user and in-process), or whether to put these in the same repository too? After that I could create a draft PR?
FYI we are going to have a GSoC student working on that, in order to support the vhost-related crates on macOS/BSD:

So, if possible, I'd wait for the student and maybe help her/him to do it.
Thanks for sharing! Is it okay for you to wait for the student to work on this or is it too late?
I think we've already discussed this in private, but I think it's orthogonal, right?
It would be nice to have a repository with all the devices implemented as libraries so that they can be consumed both in-vmm and in vhost-user. This perhaps needs to be discussed with vhost-device maintainers as well, though.
mmm, for the new repository, I would say no, we are already fragmented enough. In my vision maybe we should have:
I might have missed something though, does that make sense to you, or were you thinking of something different?
Oh, I missed that, but to clarify, I haven't really worked on *BSD/macOS support, nor do I have plans to work on it. I have only read the
No, I'm not getting confused with the crates/repositories. It's just that reviewing and discussing big PRs that depend on each other, spanning 3 repositories (vhost-device, vhost and vm-virtio), doesn't seem efficient (because we want to merge at least some of the repositories anyway) - so why not try to merge the repositories first?

Let me describe how I imagine this working to give you a better idea (well, the PoC already works like this):
Yes, this is exactly what I want to achieve.
I would prefer to put the device libraries and vhost-device executables in the same repository, because they would be quite coupled and this would simplify development and testing of new features.
Yeah, it can happen later, but like I already mentioned, why not do it first? The
Kind of, the threading/event model would already be defined by the

Hopefully the explanation of how this would work makes some sense at least.
Okay, I was confused by you mentioning (maybe in a private discussion) replicating some trait/struct in a different repo. Also, I still think vhost-device should stay separate. They are binaries, and we saw we had a loooot of dependencies to keep updated, so IMHO it's better to have them in a different repo, but again we should discuss it with the vhost-device maintainers.
For example, for vhost-device-sound the biggest part is the support for the various backends (pipewire, alsa, gstreamer one day, etc.), but these are completely separate from the virtio-sound device emulation, and they carry a lot of dependencies, so it might make sense to keep them separate.
Because it will be so time-consuming. Also, the advantage of having separate things is that sometimes it's better to have separate PRs, so they can be reviewed in pieces and in a more humane way. As a maintainer, seeing a PR come in that touches 5 crates scares me to review. I understand that for a contributor it's the opposite, but in this case I'd advise you not to do a mega PR that changes everything, but more PRs that do it gradually if possible (which would be forced by separate repositories); otherwise we risk no one looking at it. Having said that, I'm not against having one repository; in fact, I think it has advantages even on the reviewer side, so if you want to try it, that's fine with me, but I'm afraid it's not a quick process.
Yes, thanks, now it's much clearer!
I feel like you still don't fully understand what I mean: of course we could also have libraries implementing only generic parts of the devices, but not full devices (like we already have with
Yes, I get that, and I am also worried about doing a mega PR for this exact reason, but here's the problem though:

So to summarize, I can split it like this:
Yes, I didn't understand, but I don't think this is the right path.
I think this is the right way, because a VMM could be interested only in the device and not in the backends (here with backends I mean for example UDS/hybrid-vsock for virtio-vsock, or pipewire for virtio-sound, or tcp for virtio-console, etc).
We've done things like this before; yes, I understand the problem, but opening a draft PR on another repo to show how the changes will be used is not that dramatic. That said, again, I'm not against having one repo, I'm just saying it's not going to be a quick thing.
Ack
I'm not sure what you mean by "hybrid" here, I'm not very familiar with the vsock in rust-vmm.
Actually this was relevant 😃, though possibly it's not needed now, since libkrun has a better way to forward pipewire over virtio-gpu/cross-domain. (But libkrun currently still contains a forked sound device from
Sure, but this is orthogonal to what I am proposing. In general, how generic the devices-as-libraries are over their specific backends would depend on the device maintainer, on what the consumers of the device need (like libkrun), and in general on how practical it is on a case-by-case basis. I don't currently plan to make changes here. In general, I'm not doing this to gain a specific device in libkrun right now (I could just port the device if we needed something), but to be able to easily use devices in the future, possibly still vendoring them or using them as crates. But even if we need to vendor them for some reason, this would greatly decrease divergence. Currently I'm also working on 2 diverging GPU devices, one in
I don't think it's orthogonal, but simply an extra step in the library direction; however, it's fine, as a first step there's no problem. I still don't understand how you're going to consume the device implemented in vhost-device-vsock in libkrun to implement TSI, but maybe I'll figure it out later.
Yeah, that's what I meant.
I don't plan on doing that for now. My current goal isn't to use everything from rust-vmm upstream. But in the future, who knows...
## Background
Hi, I have been working on a `vhost-user-gpu` with @dorindabassey and we ran into some complications. I think there are some things about how `VhostUserBackend` is designed, and generally about the API model, that are not great for the GPU device, and in general for other devices too - personally, I am also familiar with the `virtio-net` and `virtio-console` devices because I worked on them in `libkrun`. The goal would be to make writing new devices and maintaining existing ones easier by making the API more flexible.

## Current API can be clunky to use in some cases
Currently the `vhost-user-backend` crate handles spawning worker threads for the device and calls the `handle_event` callback on that thread. I don't think this model is good, because it leads to some difficulties when programming. Here are the main issues:
### 1. Where to store the data of a worker thread
Consider you have a worker thread that wants some variables available only to this single thread; this is currently difficult to do. Basically you have a couple of options:

- Put the data behind a `Mutex`/`RwLock` - for example virtio-sound does this. I feel like this adds unnecessary complexity, and unless I am missing something, we could get rid of some of the extra locking (by using the `GuestMemoryAtomic` we already have).
- If the data is `!Send` - such as the `Rutabaga`/`virglrenderer` context in the `vhost-user-gpu` - your only option is to lazily initialize the struct (putting something `!Send` inside a `Mutex` doesn't help) and use `thread_local!` or use `unsafe` to work around this.
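To illustrate the second point, here is a minimal sketch of the `thread_local!` workaround, assuming a hypothetical `RendererCtx` type and `with_renderer` helper standing in for the `!Send` renderer context (this is not the actual vhost-user-gpu code):

```rust
use std::cell::RefCell;

// Hypothetical stand-in for a !Send renderer context: Rutabaga/virglrenderer
// hold raw pointers, so they cannot be moved between threads.
struct RendererCtx;

impl RendererCtx {
    fn new() -> Self {
        RendererCtx
    }
}

thread_local! {
    // Lazily initialized per worker thread, because the backend struct
    // itself must be Send + Sync and therefore cannot own this directly.
    static RENDERER: RefCell<Option<RendererCtx>> = RefCell::new(None);
}

// Every access has to go through this indirection instead of a plain field.
fn with_renderer<R>(f: impl FnOnce(&mut RendererCtx) -> R) -> R {
    RENDERER.with(|slot| {
        let mut slot = slot.borrow_mut();
        let ctx = slot.get_or_insert_with(RendererCtx::new);
        f(ctx)
    })
}
```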
to work around this.2. Adding file descriptors for polling after starting the backend
In `vhost-user-gpu` we needed to add, from inside `handle_event`, a file descriptor that should also be signaled via `handle_event`, and this is very awkward to do. If you want to do this, you first need to pass the epoll handler created by the daemon to the device backend. In `handle_event` you now have access to the epoll, but you have to be careful, because you cannot use the `VhostUserBackendMut` trait and the convenient lock around the whole struct: if `handle_event` is called while the lock is held, you would deadlock yourself when `num_queues` is called here: `vhost/vhost-user-backend/src/event_loop.rs` line 117 (at 176f44a).
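To make the deadlock concrete, here is a minimal sketch of its shape, with simplified stand-in types rather than the actual `vhost-user-backend` internals:

```rust
use std::sync::{Arc, Mutex};

struct Backend;

impl Backend {
    fn num_queues(&self) -> usize {
        2
    }

    // Called by the event loop while the outer lock is already held.
    fn handle_event(&mut self, this: &Arc<Mutex<Backend>>) {
        // Reaching the backend through the shared handle again blocks
        // forever, because the event loop still holds the same lock.
        let _queues = this.lock().unwrap().num_queues();
    }
}

fn event_loop(backend: Arc<Mutex<Backend>>) {
    let mut guard = backend.lock().unwrap();
    guard.handle_event(&backend); // the lock is held across the callback
}
```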
### 3. `VhostUserBackend` vs `VhostUserBackendMut`

In general, the split into `VhostUserBackend` and `VhostUserBackendMut` traits doesn't seem like a good design choice to me. I feel like when you specify that you want 1 worker thread and you use the `VhostUserBackendMut` trait, it mostly works fine (except for the `!Send` problem), but if you want 2 worker threads or something more complex, you have to change over to `VhostUserBackend` and use manual locking. Another problem is that if you don't switch to the `VhostUserBackend` trait, you would have 2 worker threads, but `handle_event` could never run on both at the same time because it is behind a lock, leading to a non-obvious performance bug.
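To make the performance bug concrete, here is a simplified sketch (assumed shape, not the real daemon code) of what effectively happens with `VhostUserBackendMut` and two worker threads:

```rust
use std::sync::RwLock;

struct MyBackend;

impl MyBackend {
    fn handle_event(&mut self, _device_event: u16) {
        // process one queue
    }
}

// With VhostUserBackendMut, the daemon effectively wraps the backend in a
// single lock that all worker threads share.
fn worker(backend: &RwLock<MyBackend>, device_event: u16) {
    // Every handle_event call needs the write lock, so even with two worker
    // threads the two queues are processed strictly one at a time.
    backend.write().unwrap().handle_event(device_event);
}
```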
## More flexible event handling API

My main idea is to just let the device spawn its own worker threads and use epoll directly in them, instead of having the `handle_event` callback. This would basically change the event system from "push" to "poll". I think this would be a lot more flexible, because it would allow:

One of the problems with the current implementation is that since `handle_event` is a method and a callback, the type system basically forces `self` to be `Send` + `Sync`, which in practice forces you to lock everything, which could be avoided.
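As a rough sketch of the "poll" model (assumed API: the `start` entry point and `Renderer` type are hypothetical), the daemon would only ask the backend to start, and the backend would own its worker threads, their local state, and the polling loop:

```rust
use std::thread::{self, JoinHandle};

struct Renderer; // stand-in for worker-local (possibly !Send) state

struct GpuBackend; // only the shared, Send parts would live here

impl GpuBackend {
    // Hypothetical entry point replacing the handle_event callback.
    fn start(&self) -> JoinHandle<()> {
        thread::spawn(|| {
            // Worker-local state is created on the worker thread itself,
            // so it never has to be Send or Sync.
            let _renderer = Renderer;
            loop {
                // Poll the kick/control fds directly (epoll on Linux, a
                // kqueue shim elsewhere), process the queues with direct
                // access to `_renderer`, and break when a stop fd fires.
            }
        })
    }
}
```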
## Consider having devices also be usable as in-process devices by VMMs

Since I am proposing changing the API, it could be beneficial to make a few more breaking changes and also allow consuming the devices as in-process devices. Actually, we aren't that far from this, but it would of course require some more thought about the abstractions.
This could be useful for libkrun and other VMMs (anyone else interested?)
## Questions
I can create a more detailed proposal of the APIs, but first I wanted to see: