-
Notifications
You must be signed in to change notification settings - Fork 1.2k
[smartswitch] Update IP assignment flow #1586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
@Pterosaur can you add description? |
I agree that we need to change the flow on the DPU side to configure the midplane interface before the DB container is running. However, I see no benefits in changing the flow for the NPU side to have a midplane bridge created with no interfaces before the DB starts. It doesn't affect the overall IP address assignment flow as it still requires a DHCP server container to start. Why do we need to introduce a new service on the NPU? |
Hi @oleksandrivantsiv , There is no hard requirement for this changing in NPU, but it will simplify the service graph of NPU. As you know, the database service is a very early booting process that is required by other services, like SWSS GNMI, and so on. In the NPU, a predictable scenario is the GNMI needs to W/R the DPU database instances in the NPU. So, these database services need to be started before GNMI. And there may be the same scenario for SWSS and other components. If we follow the current design, the midplane bridge is created and configured at interface.service, we have to redesign the service dependencies graph like, |
0c1b582
to
8b42fd3
Compare
… for NPU Signed-off-by: Ze Gan <[email protected]>
8b42fd3
to
b4d7707
Compare
Signed-off-by: Ze Gan <[email protected]>
3. Add new tests to cover the handling of MID_PLANE_BRIDGE and DPUS tables in sonic-cfggen utility. | ||
4. Add new tests to cover DHCP server configuration generation for the Smart Switch in sonic-cfggen utility. | ||
5. Extend interface-config.service tests to verify midplane bridge configuration generation. | ||
1. Extend existing YANG model tests to cover MGMT_INTERFACE table changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please review this test case also? There are no MGMT_INTERFACE changes required anymore.
"DHCP_SERVER_IPV4": { | ||
"bridge_midplane": { | ||
"bridge-midplane": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the bridge here doesn't match the name in platform.json
midplane_network table:
"midplane_bridge" vs "bridge-midplane"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Signed-off-by: Ze Gan <[email protected]>
5aca26d
to
8cd0d3d
Compare
@@ -149,14 +175,8 @@ Based on the preset `t1-smartswitch` default topology the configuration generate | |||
"subtype": "SmartSwitch", | |||
} | |||
}, | |||
"MID_PLANE_BRIDGE" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a concern with removing the MID_PLANE_BRIDGE table from the config DB. The DHCP server implementation for the Smart Switch depends on this table. Please check this PR: sonic-net/sonic-buildimage#17338
We still can generate the MID_PLANE_BRIDGE table configuration based on the information in the platform.json file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's strange if we still leave them in the config DB. Because in my current design, we will not support to change the midplane bridge from config DB anymore. If we want to change it, we need to edit the platform.json. A table of Config DB cannot be configured, it's confused.
But I agree to leave it in this PR and I can find folks to update their current implementation if this HLD is approved and merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have midplane bridge information in the config DB gives us a more convenient way to read it. However, the configuration should be treated as a read-only.
….service for NPU" This reverts commit b4d7707.
Signed-off-by: Ze Gan <[email protected]>
4cadc79
to
043929a
Compare
[Unit] | ||
... | ||
Requires=midplane-network.service | ||
After=midplane-network.service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not required to modify the database.service
.
To make sure that midplane-network.service
runs before database.service
it is possible to use the Before
section:
[Unit]
Description=Midplane network service
Requires=systemd-networkd.service
After=systemd-networkd.service
Before=database.service
[Service]
Type=oneshot
User=root
ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i bridge-midplane
This will allow us to make the midplane-network.service
atomic and disable or remove it without affecting the database.service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oleksandrivantsiv I think both midplane-network and database services can come up independently without any dependency. @Pterosaur Can you explain why we need explicit dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not required to modify the
database.service
. To make sure thatmidplane-network.service
runs beforedatabase.service
it is possible to use theBefore
section:[Unit] Description=Midplane network service Requires=systemd-networkd.service After=systemd-networkd.service Before=database.service [Service] Type=oneshot User=root ExecStart=/usr/lib/systemd/systemd-networkd-wait-online -i bridge-midplane
This will allow us to make the
midplane-network.service
atomic and disable or remove it without affecting thedatabase.service
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oleksandrivantsiv I think both midplane-network and database services can come up independently without any dependency. @Pterosaur Can you explain why we need explicit dependency?
- To the DPU side, I believe I don't need to explain much detail. Because we have some components, including database.service, need to leverage the midplane network IP address to identify its DPU ID.
- To the NPU side, I feel we need to bind the DPU Redis instance on the midplane bridge. If that IP is assigned after the Redis instance, I don't know whether the redis can delay bounding. Even if we suppose the redis support the delay bounding, but based on the current implementation, other components, like swsscommon, cannot support that. https://github.com/sonic-net/sonic-swss-common/blob/253ceb61916fabbf8f261aa9c7507e33d8628312/common/dbconnector.cpp#L536-L550 , So, other critical service may be crashed.
doc/smart-switch/ip-address-assigment/smart-switch-ip-address-assignment.md
Outdated
Show resolved
Hide resolved
Name=eth0-midplane | ||
|
||
[Network] | ||
DHCP=yes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Pterosaur are we using SONiC specific DHCP client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to my experiment and a link: https://askubuntu.com/a/1047654/1708404 .I feel the systemd-networkd will use an in-built DHCP client by default.
Do you have any concerns about this behavior? Why do we need to use the SONiC specific DHCP client for midplane network?
doc/smart-switch/ip-address-assigment/smart-switch-ip-address-assignment.md
Show resolved
Hide resolved
Signed-off-by: Ze Gan <[email protected]>
Signed-off-by: Ze Gan <[email protected]>
e4ba1d3
to
11b0514
Compare
…f Smart Switch (#18178) Why I did it Add systemd network service of Smart Switch for configuring the midplane network. How I did it Add related systemd services and networkd configuration According to paltform.json, the systemd-sonic-generator will install required services. The changes are based on the HLD: sonic-net/SONiC#1534 and sonic-net/SONiC#1586 How to verify it Check Azp that no failure to the existing testcases Check NPU scenario: 2.1 Start a KVM and add the NPU platform.json as following echo '{"DPUS":{"dpu0":{}, "dpu1":{}}}' | sudo tee /usr/share/sonic/device/x86_64-kvm_x86_64-r0/platform.json 2.2 You should get two specific database containers for DPU admin@vlab-01:~$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 4455f7ab611c docker-database:latest "/usr/local/bin/dock…" 5 minutes ago Up 5 minutes databasedpu0 8e983b5beb27 docker-database:latest "/usr/local/bin/dock…" 5 minutes ago Up 5 minutes databasedpu1 2.3 Check the expected service midplane-network-npu.service is enabled but the midplane-network-dpu.service is disabled admin@vlab-01:~$ sudo systemctl list-unit-files '*midplane*' UNIT FILE STATE PRESET midplane-network-dpu.service disabled enabled midplane-network-npu.service enabled-runtime enabled 2.4 Check the bridge-midplane has been created. admin@vlab-01:~$ sudo brctl show bridge name bridge id STP enabled interfaces ... bridge-midplane 8000.76c5a51a18f6 no dummy-midplane 2.5 If we create two mock dpu interfaces, they should be added into the bridge-midplane automatically admin@vlab-01:~$ sudo ip link add dpu0 type veth peer dpu1 admin@vlab-01:~$ sudo brctl show bridge name bridge id STP enabled interfaces ... bridge-midplane 8000.76c5a51a18f6 no dpu0 dpu1 dummy-midplane Check DPU scenario: 3.1 Start a KVM and add the DPU platform.json as following echo '{"DPU":{}}' | sudo tee /usr/share/sonic/device/x86_64-kvm_x86_64-r0/platform.json 3.2 Check the expected service midplane-network-dpu.service is enabled but the midplane-network-npu.service is disabled admin@vlab-01:~$ sudo systemctl list-unit-files '*midplane*' UNIT FILE STATE PRESET midplane-network-dpu.service disabled enabled midplane-network-npu.service enabled-runtime enabled 3.3 The database service should be started after(waiting for) midplane-network-dpu.service with a 10mins delay admin@vlab-01:~$ uptime 16:08:21 up 29 min, 1 user, load average: 0.01, 0.08, 0.12 admin@vlab-01:~$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES ... ae5195283cd6 docker-database:latest "/usr/local/bin/dock…" 35 minutes ago Up 18 minutes database 29min-18min≈10mins (Because in the KVM, we don't have an exact eth0-midplane interface and its DHCP provider) Signed-off-by: Ze Gan <[email protected]>
…f Smart Switch (#18178) Why I did it Add systemd network service of Smart Switch for configuring the midplane network. How I did it Add related systemd services and networkd configuration According to paltform.json, the systemd-sonic-generator will install required services. The changes are based on the HLD: sonic-net/SONiC#1534 and sonic-net/SONiC#1586 How to verify it Check Azp that no failure to the existing testcases Check NPU scenario: 2.1 Start a KVM and add the NPU platform.json as following echo '{"DPUS":{"dpu0":{}, "dpu1":{}}}' | sudo tee /usr/share/sonic/device/x86_64-kvm_x86_64-r0/platform.json 2.2 You should get two specific database containers for DPU admin@vlab-01:~$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 4455f7ab611c docker-database:latest "/usr/local/bin/dock…" 5 minutes ago Up 5 minutes databasedpu0 8e983b5beb27 docker-database:latest "/usr/local/bin/dock…" 5 minutes ago Up 5 minutes databasedpu1 2.3 Check the expected service midplane-network-npu.service is enabled but the midplane-network-dpu.service is disabled admin@vlab-01:~$ sudo systemctl list-unit-files '*midplane*' UNIT FILE STATE PRESET midplane-network-dpu.service disabled enabled midplane-network-npu.service enabled-runtime enabled 2.4 Check the bridge-midplane has been created. admin@vlab-01:~$ sudo brctl show bridge name bridge id STP enabled interfaces ... bridge-midplane 8000.76c5a51a18f6 no dummy-midplane 2.5 If we create two mock dpu interfaces, they should be added into the bridge-midplane automatically admin@vlab-01:~$ sudo ip link add dpu0 type veth peer dpu1 admin@vlab-01:~$ sudo brctl show bridge name bridge id STP enabled interfaces ... bridge-midplane 8000.76c5a51a18f6 no dpu0 dpu1 dummy-midplane Check DPU scenario: 3.1 Start a KVM and add the DPU platform.json as following echo '{"DPU":{}}' | sudo tee /usr/share/sonic/device/x86_64-kvm_x86_64-r0/platform.json 3.2 Check the expected service midplane-network-dpu.service is enabled but the midplane-network-npu.service is disabled admin@vlab-01:~$ sudo systemctl list-unit-files '*midplane*' UNIT FILE STATE PRESET midplane-network-dpu.service disabled enabled midplane-network-npu.service enabled-runtime enabled 3.3 The database service should be started after(waiting for) midplane-network-dpu.service with a 10mins delay admin@vlab-01:~$ uptime 16:08:21 up 29 min, 1 user, load average: 0.01, 0.08, 0.12 admin@vlab-01:~$ docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES ... ae5195283cd6 docker-database:latest "/usr/local/bin/dock…" 35 minutes ago Up 18 minutes database 29min-18min≈10mins (Because in the KVM, we don't have an exact eth0-midplane interface and its DHCP provider) Signed-off-by: Ze Gan <[email protected]>
Update the configuration workflow of the midplane network:
database.service
aftermidplane-netowrk.service
.