Commit e935f99

Define Linux Network Devices (#1271)
The proposed "netDevices" field provides a declarative way to specify which host network devices should be moved into a container's network namespace.

This approach is similar to the existing "devices" field used for block devices, but uses a dictionary keyed by the interface name instead.

The proposed scheme is based on the existing representation of network devices by the `struct net_device` (https://docs.kernel.org/networking/netdevices.html).

This proposal focuses solely on moving existing network devices into the container namespace. It does not cover the complexities of network configuration or network interface creation, emphasizing the separation of device management and network configuration.

Signed-off-by: Antonio Ojea <[email protected]>
1 parent ea38318 commit e935f99

10 files changed, +188 -0 lines changed

config-linux.md

+105
@@ -189,6 +189,107 @@ In addition to any devices configured with this setting, the runtime MUST also s
 * [`/dev/ptmx`][pts.4].
   A [bind-mount or symlink of the container's `/dev/pts/ptmx`][devpts].

+## <a name="configLinuxNetworkDevices" />Network Devices
+
+Linux network devices are entities that send and receive data packets. They are
+not represented as files in the `/dev` directory. Instead, they are represented
+by the [`net_device`][net_device] data structure in the Linux kernel. Network
+devices can belong to only one network namespace and use a set of operations
+distinct from regular file operations. Network devices can be categorized as
+**physical** or **virtual**:
+
+* **Physical network devices** correspond to hardware interfaces, such as
+  Ethernet cards (e.g., `eth0`, `enp0s3`). They are directly associated with
+  physical network hardware.
+* **Virtual network devices** are software-defined interfaces, such as loopback
+  devices (`lo`), virtual Ethernet pairs (`veth`), bridges (`br0`), VLANs, and
+  MACVLANs. They are created and managed by the kernel and do not correspond
+  to physical hardware.
+
+This schema focuses solely on moving existing network devices identified by name
+from the host network namespace into the container network namespace. It does
+not cover the complexities of network device creation or network configuration,
+such as IP address assignment, routing, and DNS setup.
+
+**`netDevices`** (object, OPTIONAL) - A set of network devices that MUST be made
+available in the container. The runtime is responsible for moving these devices;
+the underlying mechanism is implementation-defined.
+
+The name of the network device is the entry key. Entry values are objects with
+the following properties:
+
+* **`name`** *(string, OPTIONAL)* - the name of the network device inside the
+  container namespace. If not specified, the host name is used.
+
+The runtime MUST check if moving the network interface to the container
+namespace is possible. If a network device with the specified name already
+exists in the container namespace, the runtime MUST [generate an error](runtime.md#errors),
+unless the user has provided a template by appending
+`%d` to the new name. In that case, the runtime MUST allow the move, and the
+kernel will generate a unique name for the interface within the container's
+network namespace.
+
+The runtime MUST preserve existing network interface attributes, including all
+permanent IP addresses (IFA_F_PERMANENT flag) of any family with global scope
+(RT_SCOPE_UNIVERSE value) as defined in [`RFC 3549 Section 2.3.3.2`][rfc3549].
+This ensures that only addresses intended for persistent, external communication
+are transferred.
+
+The runtime MUST set the network device state to "up" after moving it to the
+network namespace to allow the container to send and receive network traffic
+through that device.
+
+### Namespace Lifecycle and Container Termination
+
+The runtime MUST NOT actively manage the interface's lifecycle and configuration
+*within* the container's network namespace. This is because network interfaces
+are inherently tied to the network namespace itself, and their lifecycle is
+therefore managed by the owner of the network namespace. Typically, this
+ownership and management are handled by higher-level container runtime
+orchestrators, rather than the processes running directly within the container.
+
+The runtime **MUST NOT** attempt to move the interface out of the namespace
+before deletion. This design decision is based on the following:
+
+* **Namespace Ownership:** Network interfaces are tied to the network namespace,
+  which may not always be directly managed by the runtime.
+* **Abrupt Termination:** Even when the runtime manages the namespace, it cannot
+  reliably participate in its deletion if the container's processes terminate
+  abruptly (e.g., due to a crash) or run until completion.
+
+During network namespace deletion, the kernel's built-in namespace cleanup
+mechanisms take over, as described in [network_namespaces(7)][net_namespaces.7]:
+"When a network namespace is freed (i.e., when the last process in the namespace
+terminates), its physical network devices are moved back to the initial network
+namespace." All physical network devices that can migrate between network
+namespaces are moved to the initial network namespace, while virtual devices
+(veth, macvlan, ...) are destroyed.
+
+If users require custom handling of interface lifecycle during namespace
+deletion, they can utilize existing features within the namespace orchestrator
+or employ post-stop hooks.
+
+**Physical Interface Renaming and Systemd**
+
+When a physical interface is renamed within a container and the container's
+network namespace is later deleted, the kernel will move the interface back to
+the root namespace with its renamed name. In case of a name conflict in the root
+namespace, the kernel will rename it to `dev%d`. To ensure predictable interface
+names in the root namespace, users can utilize systemd's `udevd` and `networkd`
+rules. Refer to [systemd Predictable Network Interface Names][predictable-network-interfaces-names]
+for more information on configuring predictable names.
+
+### Example
+
+#### Moving a device with a renamed interface inside the container:
+
+```json
+"netDevices": {
+    "eth0": {
+        "name": "container_eth0"
+    }
+}
+```
+
 ## <a name="configLinuxControlGroups" />Control groups
 
 Also known as cgroups, they are used to restrict resource usage for a container and handle device access.
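
The naming rule in the Network Devices section above (default to the host name, fail on a conflict, allow a trailing `%d` template) can be sketched in Go as follows. This is a minimal illustration, not the runtime's actual implementation: `containerName`, `isTemplate`, `checkMove`, and the `existing` map are hypothetical names, and a real runtime would query the kernel (for example over netlink) instead of consulting an in-memory set.

```go
package main

import (
	"fmt"
	"strings"
)

// containerName returns the name the device should get inside the container:
// the requested name if one was given, otherwise the host name.
func containerName(hostName, requested string) string {
	if requested == "" {
		return hostName
	}
	return requested
}

// isTemplate reports whether the name ends in "%d", the template form that
// lets the kernel choose a unique numeric suffix.
func isTemplate(name string) bool {
	return strings.HasSuffix(name, "%d")
}

// checkMove mimics the conflict rule: a clash with a name that already exists
// in the container namespace is an error unless the target name is a template.
// existing is a hypothetical snapshot of the names present in that namespace.
func checkMove(hostName, requested string, existing map[string]bool) error {
	target := containerName(hostName, requested)
	if isTemplate(target) {
		return nil
	}
	if existing[target] {
		return fmt.Errorf("network device %q already exists in the container namespace", target)
	}
	return nil
}

func main() {
	existing := map[string]bool{"lo": true, "eth0": true}
	fmt.Println(checkMove("eth0", "", existing))               // conflict with existing eth0
	fmt.Println(checkMove("eth0", "container_eth0", existing)) // no conflict
	fmt.Println(checkMove("eth0", "eth%d", existing))          // template, kernel picks the name
}
```

Keeping the check separate from the move itself makes it possible to report the error before any device has been touched.
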
@@ -975,6 +1076,10 @@ subset of the available options.
 [mknod.1]: https://man7.org/linux/man-pages/man1/mknod.1.html
 [mknod.2]: https://man7.org/linux/man-pages/man2/mknod.2.html
 [namespaces.7_2]: https://man7.org/linux/man-pages/man7/namespaces.7.html
+[net_device]: https://docs.kernel.org/networking/netdevices.html
+[net_namespaces.7]: https://man7.org/linux/man-pages/man7/network_namespaces.7.html
+[predictable-network-interfaces-names]: https://systemd.io/PREDICTABLE_INTERFACE_NAMES
+[rfc3549]: https://www.ietf.org/rfc/rfc3549.txt
 [null.4]: https://man7.org/linux/man-pages/man4/null.4.html
 [personality.2]: https://man7.org/linux/man-pages/man2/personality.2.html
 [pts.4]: https://man7.org/linux/man-pages/man4/pts.4.html
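
The address-preservation rule in `config-linux.md` (keep addresses that carry `IFA_F_PERMANENT` and have scope `RT_SCOPE_UNIVERSE`) amounts to a simple filter. The sketch below assumes a simplified, hypothetical `addr` type and a hypothetical `preserved` helper; the constant values mirror the Linux UAPI headers, and how a runtime actually reads and re-applies addresses is left open by the text.

```go
package main

import "fmt"

// Values mirror the Linux UAPI headers (linux/if_addr.h, linux/rtnetlink.h).
const (
	ifaFPermanent   = 0x80 // IFA_F_PERMANENT
	rtScopeUniverse = 0    // RT_SCOPE_UNIVERSE
	rtScopeLink     = 253  // RT_SCOPE_LINK
)

// addr is a hypothetical, simplified view of one interface address.
type addr struct {
	CIDR  string
	Flags uint32
	Scope uint8
}

// preserved returns the addresses the rule asks the runtime to keep:
// permanent addresses with global (universe) scope.
func preserved(addrs []addr) []addr {
	var keep []addr
	for _, a := range addrs {
		if a.Flags&ifaFPermanent != 0 && a.Scope == rtScopeUniverse {
			keep = append(keep, a)
		}
	}
	return keep
}

func main() {
	addrs := []addr{
		{"192.0.2.10/24", ifaFPermanent, rtScopeUniverse}, // kept
		{"fe80::1/64", ifaFPermanent, rtScopeLink},        // link scope, dropped
		{"2001:db8::5/64", 0, rtScopeUniverse},            // not permanent, dropped
	}
	for _, a := range preserved(addrs) {
		fmt.Println(a.CIDR)
	}
}
```
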

features-linux.md

+14
@@ -228,3 +228,17 @@ Irrelevant to the availability of Intel RDT on the host operating system.
     }
 }
 ```
+
+## <a name="linuxFeaturesNetDevices" />NetDevices
+
+**`netDevices`** (object, OPTIONAL) represents the runtime's implementation status of Linux network devices.
+
+* **`enabled`** (bool, OPTIONAL) represents whether the runtime supports the capability to move Linux network devices into the container's network namespace.
+
+### Example
+
+```json
+"netDevices": {
+    "enabled": true
+}
+```
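
Because `enabled` is optional, a consumer of the features document has three cases to handle: `true`, `false`, and absent. The sketch below shows one way to tell them apart with a pointer field. It assumes the `netDevices` object sits under a top-level `linux` key, and `netDevicesSupport` and the local struct types are hypothetical names introduced only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for the features types, trimmed to the new field.
type netDevices struct {
	Enabled *bool `json:"enabled,omitempty"`
}

type linuxFeatures struct {
	NetDevices *netDevices `json:"netDevices,omitempty"`
}

// netDevicesSupport classifies a features document as "supported",
// "unsupported", or "unknown" (field absent at any level).
func netDevicesSupport(doc []byte) (string, error) {
	var f struct {
		Linux *linuxFeatures `json:"linux,omitempty"`
	}
	if err := json.Unmarshal(doc, &f); err != nil {
		return "", err
	}
	if f.Linux == nil || f.Linux.NetDevices == nil || f.Linux.NetDevices.Enabled == nil {
		return "unknown", nil
	}
	if *f.Linux.NetDevices.Enabled {
		return "supported", nil
	}
	return "unsupported", nil
}

func main() {
	for _, doc := range []string{
		`{"linux": {"netDevices": {"enabled": true}}}`,
		`{"linux": {"netDevices": {"enabled": false}}}`,
		`{"linux": {}}`,
	} {
		status, err := netDevicesSupport([]byte(doc))
		fmt.Println(status, err)
	}
}
```
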

schema/config-linux.json

+6
@@ -9,6 +9,12 @@
                     "$ref": "defs-linux.json#/definitions/Device"
                 }
             },
+            "netDevices": {
+                "type": "object",
+                "additionalProperties": {
+                    "$ref": "defs-linux.json#/definitions/NetDevice"
+                }
+            },
             "uidMappings": {
                 "type": "array",
                 "items": {

schema/defs-linux.json

+8
@@ -189,6 +189,14 @@
                 }
             }
         },
+        "NetDevice": {
+            "type": "object",
+            "properties": {
+                "name": {
+                    "type": "string"
+                }
+            }
+        },
         "weight": {
             "$ref": "defs.json#/definitions/uint16"
         },

schema/features-linux.json

+8
@@ -110,6 +110,14 @@
                         }
                     }
                 }
+            },
+            "netDevices": {
+                "type": "object",
+                "properties": {
+                    "enabled": {
+                        "type": "boolean"
+                    }
+                }
             }
         }
     }
@@ -0,0 +1,13 @@
+{
+    "ociVersion": "1.0.0",
+    "root": {
+        "path": "rootfs"
+    },
+    "linux": {
+        "netDevices": {
+            "eth0": {
+                "name": 23
+            }
+        }
+    }
+}
@@ -0,0 +1,15 @@
+{
+    "ociVersion": "1.0.0",
+    "root": {
+        "path": "rootfs"
+    },
+    "linux": {
+        "netDevices": {
+            "eth0": {
+                "name": "container_eth0"
+            },
+            "ens4": {},
+            "ens5": {}
+        }
+    }
+}
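
The two fixtures above differ only in the type of `name`: a number in the first, a string in the second. As a rough illustration of why the first is rejected, the sketch below decodes equivalent documents into Go types shaped like the ones this commit adds. It relies on Go's `encoding/json` type checking rather than JSON Schema validation, so it is an analogy to the schema test, not a replacement for it; `spec`, `linuxNetDevice`, and `decode` are local, hypothetical names.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// linuxNetDevice is a local stand-in for the new per-device entry.
type linuxNetDevice struct {
	Name string `json:"name,omitempty"`
}

// spec keeps only the part of the config that matters here.
type spec struct {
	Linux struct {
		NetDevices map[string]linuxNetDevice `json:"netDevices,omitempty"`
	} `json:"linux"`
}

// decode parses a config document and returns its netDevices map.
func decode(doc string) (map[string]linuxNetDevice, error) {
	var s spec
	if err := json.Unmarshal([]byte(doc), &s); err != nil {
		return nil, err
	}
	return s.Linux.NetDevices, nil
}

func main() {
	good := `{"linux": {"netDevices": {"eth0": {"name": "container_eth0"}, "ens4": {}, "ens5": {}}}}`
	bad := `{"linux": {"netDevices": {"eth0": {"name": 23}}}}`

	devs, err := decode(good)
	fmt.Println(len(devs), devs["eth0"].Name, err)

	_, err = decode(bad)
	var typeErr *json.UnmarshalTypeError
	fmt.Println(errors.As(err, &typeErr)) // the numeric name does not fit a string field
}
```
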

schema/test/features/good/runc.json

+3
@@ -182,6 +182,9 @@
         },
         "selinux": {
             "enabled": true
+        },
+        "netDevices": {
+            "enabled": true
         }
     },
     "annotations": {

specs-go/config.go

+8
@@ -236,6 +236,8 @@ type Linux struct {
 	Namespaces []LinuxNamespace `json:"namespaces,omitempty"`
 	// Devices are a list of device nodes that are created for the container
 	Devices []LinuxDevice `json:"devices,omitempty"`
+	// NetDevices are key-value pairs, keyed by network device name on the host, moved to the container's network namespace.
+	NetDevices map[string]LinuxNetDevice `json:"netDevices,omitempty"`
 	// Seccomp specifies the seccomp security settings for the container.
 	Seccomp *LinuxSeccomp `json:"seccomp,omitempty"`
 	// RootfsPropagation is the rootfs mount propagation mode for the container.
@@ -491,6 +493,12 @@ type LinuxDevice struct {
 	GID *uint32 `json:"gid,omitempty"`
 }
 
+// LinuxNetDevice represents a single network device to be added to the container's network namespace
+type LinuxNetDevice struct {
+	// Name of the device in the container namespace
+	Name string `json:"name,omitempty"`
+}
+
 // LinuxDeviceCgroup represents a device rule for the devices specified to
 // the device controller
 type LinuxDeviceCgroup struct {
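
To show how the new `NetDevices` map serializes, the sketch below declares local copies of the two types (trimmed to the new field, so it does not import the real `specs-go` package) and marshals a small configuration. The `encode` helper is a hypothetical name. With `omitempty` on `Name`, a device that keeps its host name is written as an empty object, matching the `"ens4": {}` form used in the test fixture.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LinuxNetDevice is a local copy of the type added in this commit.
type LinuxNetDevice struct {
	// Name of the device in the container namespace
	Name string `json:"name,omitempty"`
}

// Linux is trimmed to the one field this example needs.
type Linux struct {
	NetDevices map[string]LinuxNetDevice `json:"netDevices,omitempty"`
}

// encode marshals the config to compact JSON.
func encode(l Linux) (string, error) {
	b, err := json.Marshal(l)
	return string(b), err
}

func main() {
	out, err := encode(Linux{NetDevices: map[string]LinuxNetDevice{
		"eth0": {Name: "container_eth0"}, // renamed inside the container
		"ens4": {},                       // keeps its host name
	}})
	if err != nil {
		panic(err)
	}
	// encoding/json writes map keys in sorted order.
	fmt.Println(out)
}
```
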

specs-go/features/features.go

+8
@@ -48,6 +48,7 @@ type Linux struct {
 	Selinux         *Selinux         `json:"selinux,omitempty"`
 	IntelRdt        *IntelRdt        `json:"intelRdt,omitempty"`
 	MountExtensions *MountExtensions `json:"mountExtensions,omitempty"`
+	NetDevices      *NetDevices      `json:"netDevices,omitempty"`
 }
 
 // Cgroup represents the "cgroup" field.
@@ -143,3 +144,10 @@ type IDMap struct {
 	// Nil value means "unknown", not "false".
 	Enabled *bool `json:"enabled,omitempty"`
 }
+
+// NetDevices represents the "netDevices" field.
+type NetDevices struct {
+	// Enabled is true if network devices support is compiled in.
+	// Nil value means "unknown", not "false".
+	Enabled *bool `json:"enabled,omitempty"`
+}
