Description
Background
Hi, I have been working on a vhost-user-gpu device with @dorindabassey and we ran into some complications. I think there are some things about how `VhostUserBackend` is designed, and about the API model in general, that are not great for the GPU device and for other devices too - personally I am also familiar with the virtio-net and virtio-console devices because I worked on them in libkrun. The goal would be to make writing new devices and maintaining existing ones easier by making the API more flexible.
The current API can be clunky to use in some cases:
Currently vhost-user-backend handles spawning the worker threads for the device and calls the `handle_event` callback on those threads. I don't think this model is good, because it leads to several difficulties when programming.
Here are the main issues:
1. Where to store the data of a worker thread
Consider a worker thread that wants some variables available only to that single thread; this is currently difficult to do. Basically you have a couple of options:
- Store the data inside the backend struct and wrap it in a `Mutex`/`RwLock` - for example, virtio-sound does this. I feel like this adds unnecessary complexity, and unless I am missing something, we could get rid of some of the extra locking (by using the `GuestMemoryAtomic` we already have).
- If the thread-specific data is `!Send` - such as the `Rutabaga`/virglrenderer context in vhost-user-gpu - your only option is to lazily initialize it (putting something `!Send` inside a `Mutex` doesn't help) and use `thread_local!` or `unsafe` to work around this (see the sketch below).
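For illustration, here is a minimal std-only sketch of that `thread_local!` workaround. `VirglContext` is a made-up stand-in for the `!Send` renderer state, not the actual vhost-user-gpu code:

```rust
use std::cell::RefCell;

// Made-up stand-in for a !Send renderer context (e.g. virglrenderer/Rutabaga
// state); the raw pointer is what makes it !Send.
struct VirglContext {
    _inner: *mut std::ffi::c_void,
}

impl VirglContext {
    fn new() -> Self {
        Self { _inner: std::ptr::null_mut() }
    }
}

thread_local! {
    // The context cannot live inside the backend struct, because the backend
    // has to be Send + Sync, so it is lazily created on whichever worker
    // thread handle_event happens to run on.
    static VIRGL: RefCell<Option<VirglContext>> = RefCell::new(None);
}

// Roughly what the body of handle_event ends up doing today.
fn handle_gpu_event() {
    VIRGL.with(|cell| {
        let mut slot = cell.borrow_mut();
        let ctx = slot.get_or_insert_with(VirglContext::new);
        // ... use `ctx` to process the queue ...
        let _ = ctx;
    });
}

fn main() {
    handle_gpu_event();
}
```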
2. Adding file descriptors for polling after starting the backend
- In vhost-user-gpu we needed to add a file descriptor, to be handled by `handle_event`, from inside `handle_event` itself, and this is very awkward to do. To do it, you first need to pass the epoll handler created by the daemon to the device backend. In `handle_event` you then have access to the epoll, but you have to be careful: you cannot use the `VhostUserBackendMut` trait and the convenient lock around the whole struct, because if `handle_event` is called while the lock is held, you deadlock yourself when `num_queues` is called here: `vhost/vhost-user-backend/src/event_loop.rs` line 117 (at 176f44a).
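For context, here is a std-only sketch of the shape this workaround takes today. The types are stubs (the real crate's epoll handler and registration API are more involved), so treat it only as an illustration of the data flow described above:

```rust
use std::os::unix::io::RawFd;
use std::sync::{Arc, OnceLock};

// Stand-in for the vring epoll handler the daemon creates per worker thread.
struct VringEpollHandlerStub;

impl VringEpollHandlerStub {
    fn register_listener(&self, _fd: RawFd, _data: u64) {
        // ... add the fd to the worker's epoll set ...
    }
}

struct GpuBackend {
    // Filled in after the daemon (and its epoll handlers) have been created.
    epoll_handler: OnceLock<Arc<VringEpollHandlerStub>>,
}

impl GpuBackend {
    fn set_epoll_handler(&self, handler: Arc<VringEpollHandlerStub>) {
        let _ = self.epoll_handler.set(handler);
    }

    // Called by the event loop. With VhostUserBackendMut the whole backend
    // would be behind a lock here, which is what makes re-entering the
    // event-loop machinery (anything that ends up calling num_queues) risky.
    fn handle_event(&self, new_fd: RawFd) {
        if let Some(handler) = self.epoll_handler.get() {
            handler.register_listener(new_fd, 42);
        }
    }
}

fn main() {
    let backend = GpuBackend { epoll_handler: OnceLock::new() };
    backend.set_epoll_handler(Arc::new(VringEpollHandlerStub));
    backend.handle_event(0);
}
```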
3. `VhostUserBackend` vs `VhostUserBackendMut`
In general, the split into the `VhostUserBackend` and `VhostUserBackendMut` traits doesn't seem like a good design choice to me. When you specify that you want 1 worker thread and use the `VhostUserBackendMut` trait, it mostly works fine (except for the `!Send` problem), but if you want 2 worker threads or something more complex, you have to switch over to `VhostUserBackend` and use manual locking. Another problem is that if you don't switch to the `VhostUserBackend` trait, you get 2 worker threads, but `handle_event` can never run on both at the same time because it is behind a lock, which is a non-obvious performance bug (sketched below).
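To make that performance pitfall concrete, here is a std-only sketch of what the `VhostUserBackendMut` path effectively boils down to (the `MyBackendMut` type and the loop are made up for illustration): even with two worker threads, everything serializes on one lock, so `handle_event` never runs in parallel.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Rough model of what VhostUserBackendMut amounts to: the crate wraps the
// backend in a single lock and every worker thread goes through it.
struct MyBackendMut {
    processed: u64,
}

impl MyBackendMut {
    fn handle_event(&mut self, _device_event: u16) {
        // Pretend this is the actual vring processing.
        self.processed += 1;
    }
}

fn main() {
    let backend = Arc::new(Mutex::new(MyBackendMut { processed: 0 }));

    // Two "worker threads": even though they poll different vrings, they
    // serialize on the same lock, so handle_event never runs in parallel.
    let handles: Vec<_> = (0..2u16)
        .map(|queue| {
            let backend = Arc::clone(&backend);
            thread::spawn(move || {
                for _ in 0..1000 {
                    backend.lock().unwrap().handle_event(queue);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
}
```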
More flexible event handling API
My main idea is to let the device spawn its own worker threads and use epoll directly in them, instead of having the `handle_event` callback. This would basically change the event system from "push" to "poll". I think this would be a lot more flexible, because it would allow:
- any custom thread <-> vring mapping
- easily adding and removing file descriptors to watch
- detailed control over which thread has access to what, leading to more understandable and efficient code
One of the problems with the current implementation is that, since `handle_event` is both a method and a callback, the type system basically forces `self` to be `Send` + `Sync`, which in practice forces you to lock everything - and this could be avoided:
```rust
struct FooDevice { ... }

fn worker(mem: GuestMemoryAtomic<GuestMemoryMmap>, vring: VringRwLock, exit_event: EventFd) {
    // Epoll abstraction details are not that important for this proposal,
    // `Poller` is just a placeholder and could change.
    let mut poller = Poller::new();
    let vring_ready_event = vring.ready_event();
    poller.add(vring_ready_event);
    poller.add(exit_event);

    loop {
        let event = poller.poll();
        if event == exit_event {
            return;
        }
        if event == vring_ready_event {
            // Handle the vring event here.
            // You can even add/remove event file descriptors to/from the epoll.
        }
    }
}

impl VhostUserBackend for FooDevice {
    fn num_queues(&self) -> usize { ... }
    fn max_queue_size(&self) -> usize { ... }
    fn features(&self) -> u64 { ... }
    fn protocol_features(&self) -> VhostUserProtocolFeatures { ... }
    // Other methods stay as before, except `queues_per_thread(&self)`, which would be removed.

    // The entry point where we start the device. This would require the device
    // to start at least 1 worker thread, but we do that today anyway, so it
    // shouldn't be an issue.
    // (The exact mem and vring types should be more generic, this is just an example.)
    fn start(&mut self, vrings: Box<[VringRwLock]>, quit_event: EventFd) -> Result<()> {
        // Notice the thread doesn't have to have a reference to `self`.
        // If you need to share state between the worker threads, you can use
        // something like Arc<Mutex<YourSharedDataStruct>>, or even a channel.
        thread::spawn(move || worker(self.mem.clone(), vrings[0], quit_event));
        Ok(())
    }
}
```
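To expand on the comment about shared state in `start()`: here is a minimal sketch (all names made up) of two workers sharing only the state that actually needs sharing, via `Arc<Mutex<...>>` plus a channel, instead of locking the whole device.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical state shared between two queue workers; only this piece
// needs a lock, not the whole device.
#[derive(Default)]
struct SharedScanoutState {
    width: u32,
    height: u32,
}

fn main() {
    let shared = Arc::new(Mutex::new(SharedScanoutState::default()));
    let (tx, rx) = mpsc::channel::<String>();

    // Worker for queue 0: updates the shared state and notifies queue 1.
    let shared_q0 = Arc::clone(&shared);
    let q0 = thread::spawn(move || {
        let mut s = shared_q0.lock().unwrap();
        s.width = 1024;
        s.height = 768;
        drop(s);
        tx.send("scanout-updated".to_string()).unwrap();
    });

    // Worker for queue 1: reacts to the notification.
    let shared_q1 = Arc::clone(&shared);
    let q1 = thread::spawn(move || {
        let msg = rx.recv().unwrap();
        let s = shared_q1.lock().unwrap();
        println!("{msg}: {}x{}", s.width, s.height);
    });

    q0.join().unwrap();
    q1.join().unwrap();
}
```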
Consider having devices also be usable as in-process devices by VMMs
Since I am proposing changing the API anyway, it could be beneficial to make slightly bigger breaking changes and also allow consuming the devices as in-process devices. We actually aren't that far from this, but it would of course require some more thought about the abstractions.
This could be useful for libkrun and other VMMs (anyone else interested?).
Questions
I can create a more detailed proposal of the APIs, but first I wanted to see:
- Is there interest in doing smaller breaking changes to the API like I proposed?
- Is there interest in doing bigger API changes to also allow consuming the devices as in-process devices?