[doc]: Init Smartswitch database High Level Design #1534

Merged: 8 commits, Apr 24, 2024
@@ -0,0 +1,262 @@
# Smart Switch Database design #

## Table of Content

- [Smart Switch Database design](#smart-switch-database-design)
- [Table of Content](#table-of-content)
- [Revision](#revision)
- [Scope](#scope)
- [Definitions/Abbreviations](#definitionsabbreviations)
- [Overview](#overview)
- [Requirements](#requirements)
- [Architecture Design](#architecture-design)
- [Database services](#database-services)
- [Database flow](#database-flow)
- [High-Level Design](#high-level-design)
- [SAI API](#sai-api)
- [Configuration and management](#configuration-and-management)
- [CLI/YANG model Enhancements](#cliyang-model-enhancements)
- [Config DB Enhancements](#config-db-enhancements)
- [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
- [Memory Consumption](#memory-consumption)
- [DPU\_APPL\_DB](#dpu_appl_db)
- [DPU\_APPL\_STATE\_DB/DPU\_STATE\_DB](#dpu_appl_state_dbdpu_state_db)
- [Restrictions/Limitations](#restrictionslimitations)
- [Testing Requirements/Design](#testing-requirementsdesign)
- [Unit Test cases](#unit-test-cases)
- [System Test cases](#system-test-cases)
- [Open/Action items - if any](#openaction-items---if-any)

### Revision

| Rev | Date | Author | Change Description |
| :---: | :---: | :----: | -------------------------------- |
| 0.1 | | Ze Gan | Initial version. Database design |

### Scope

This document provides a high-level design for the Smart Switch database.

### Definitions/Abbreviations

| Term | Meaning |
| ---- | --------------------------------- |
| NPU | Network Processing Unit |
| DPU | Data Processing Unit |
| DB | Database |
| GNMI | gRPC Network Management Interface |

### Overview

The Smart Switch comprises two integral components: the Network Processing Unit (NPU) and the Data Processing Unit (DPU), both operating on the SONiC OS. The database stack encompasses the entire database infrastructure for both the NPU and DPU. However, due to memory limitations on the DPU, certain overlay objects, such as DASH objects, are stored in the NPU.

Contributor (qiluo-msft, Dec 4, 2023), on "overlayer objects":
What is definition of "overlayer objects"? #Closed

Contributor Author:
Done. Add it in the Definitions/Abbreviations

Contributor:
It is overlay objects. Please fix all places in doc

Contributor Author:
Done

In addition, dedicated database containers are maintained in the NPU for each DPU, serving the purpose of resource management within the Smart Switch architecture. This separation allows for efficient handling of database-related operations and ensures optimal utilization of resources across the entire Smart Switch.


### Requirements

- All databases, including those on both NPU and DPU, must be accessible through the GNMI server.
- Each DPU database instance on the NPU is associated with a unique TCP port and Unix domain socket path.
- All DPU database instances on the NPU will be bound to the IP address of the midplane bridge.

Contributor (qiluo-msft, Dec 4, 2023), on "midplane bridge":
What is the definition of "midplane bridge"? #Closed

Contributor Author:
Done. Add it in the Definitions/Abbreviations

- All database instances on the NPU share the same network namespace to facilitate seamless communication.
- DPUs can access their respective overlay database instances using the IP of the midplane bridge and a pre-assigned unique TCP port.

Contributor (qiluo-msft, Dec 4, 2023), on "overlay database":
What is "overlay database"? #Closed

Contributor Author:
It's used to store the overlay objects. The overlay objects has been defined in the section: Definitions/Abbreviations


### Architecture Design

![smart-switch-database-architecture](smart-switch-database-architecture.png)

#### Database services

In this section, the focus is on illustrating the maintenance of DPU overlay databases within the NPU. It's essential to note that the traditional database services of both NPU and DPU remain unchanged and do not necessitate further design modifications.

The management of DPU overlay databases within the NPU is orchestrated through existing SONiC database services. The daemon, named "featured," retains the responsibility for initiating, terminating, enabling, and disabling the DPU overlay database services. This interaction is facilitated using the systemctl tool.

To determine the number of DPUs, the "featured" daemon should eventually use the platform API. For simplicity, the initial implementation reads the number directly from the platform_env.conf file.

``` shell
cat /usr/share/sonic/device/$PLATFORM/platform_env.conf
NUM_DPU=2
```
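
As a rough illustration of the lookup described above, the following Python sketch parses `NUM_DPU` out of `platform_env.conf`-style text. The function name `read_num_dpu` and the fallback value of 0 are assumptions made for this example, not part of the design.

```python
def read_num_dpu(conf_text: str, default: int = 0) -> int:
    """Return the NUM_DPU value from KEY=VALUE text, or `default` if absent.

    Hypothetical helper: the design only states that NUM_DPU is read
    from platform_env.conf, not how the parsing is implemented.
    """
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and shell-style comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NUM_DPU":
            return int(value.strip())
    return default


if __name__ == "__main__":
    print(read_num_dpu("NUM_DPU=2\n"))
```

In a real deployment the text would come from `/usr/share/sonic/device/$PLATFORM/platform_env.conf`; passing the content as a string keeps the sketch independent of the filesystem.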

To align with the established multi-ASIC design in SONiC, a new field, `"has_per_dpu_scope": "True"`, is introduced in the database feature table within config_db.json. This field tells "featured" to start each DPU database instance within a dedicated database container. This design approach maintains consistency with SONiC's existing architecture while accommodating the specific requirements of DPU overlay databases.

Contributor (qiluo-msft, Dec 4, 2023), on "ensuring":
Could you explain the meaning of has_per_dpu_scope? and how does it ensure "each DPU database instance is initiated within a dedicated database container" ? #Closed

Contributor Author:
This is the same concept as the has_per_asic_scope. This field will be read by the featured to start database service container instances for each DPU.


``` json
# config_db.json

"database": {
    "auto_restart": "always_enabled",
    "delayed": "False",
    "has_global_scope": "True",
    "has_per_asic_scope": "True",
    "has_per_dpu_scope": "True", # New field for DPU database service
    "high_mem_alert": "disabled",
    "state": "always_enabled",
    "support_syslog_rate_limit": "true"
},
```
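
To make the role of `has_per_dpu_scope` more concrete, here is a minimal Python sketch of how a daemon could expand one feature entry into service instances. The function name, the `name@dpuN` instance naming, and the condition applied to `has_per_asic_scope` are all assumptions for illustration; the actual "featured" logic may differ.

```python
def expand_feature_instances(name, feature, num_asic=1, num_dpu=0):
    """Return the service instance names implied by a FEATURE table entry.

    Hypothetical sketch: the '@dpuN' suffix is an assumed naming scheme.
    """
    def enabled(field):
        # config_db stores booleans as strings such as "True" or "False".
        return str(feature.get(field, "False")).lower() == "true"

    instances = []
    if enabled("has_global_scope"):
        instances.append(name)
    # Assumption: per-ASIC instances only matter on multi-ASIC platforms.
    if enabled("has_per_asic_scope") and num_asic > 1:
        instances.extend(f"{name}@{i}" for i in range(num_asic))
    if enabled("has_per_dpu_scope"):
        instances.extend(f"{name}@dpu{i}" for i in range(num_dpu))
    return instances


if __name__ == "__main__":
    database_feature = {
        "has_global_scope": "True",
        "has_per_asic_scope": "True",
        "has_per_dpu_scope": "True",
    }
    print(expand_feature_instances("database", database_feature, num_dpu=2))
```

With two DPUs on a single-ASIC platform, the sketch yields the global instance plus one instance per DPU, which matches the intent that each DPU gets its own database container.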

Within the NPU, the management of DPU overlay databases involves specific configurations. Each DPU overlay database instance is bound to the IP address of the midplane bridge (169.254.200.254 by default). The TCP port assignment follows a predictable pattern, with each DPU ID associated with a unique port (6380 + DPU ID). Additionally, the Unix domain socket path is structured as /var/run/redisdpu{DPU_ID}.

Contributor (qiluo-msft, Dec 4, 2023), on "/var/run/redisdpu{DPU_ID}":
This rule should be defined in runtime config file database_global.json. #Closed

Contributor Author:
Done. Update a paragraph to describe the database_global.json.

Contributor:
var/run/redisdpu{DPU_ID} -> var/run/redisdpu{database_name}.
Maybe you should remove this part, because it is part of database_global.json

Contributor Author:
Done


Here is an example with two DPUs:

``` json
# DPU0: /var/run/redisdpu0/sonic-db/database_config.json
"redis": {
    "hostname": "169.254.200.254",
    "port": 6381,
    "unix_socket_path": "/var/run/redisdpu0/redis.sock",
    "persistence_for_warm_boot": "yes",
    "database_type": "dpudb"
}
# DPU1: /var/run/redisdpu1/sonic-db/database_config.json
"redis": {
    "hostname": "169.254.200.254",
    "port": 6382,
    "unix_socket_path": "/var/run/redisdpu1/redis.sock",
    "persistence_for_warm_boot": "yes",
    "database_type": "dpudb"
}
```
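
The per-DPU settings follow a regular pattern, which the Python sketch below generates. Note that the prose gives the port as 6380 + DPU ID while the example uses 6381 and 6382, so the base port is left as an explicit parameter here rather than hard-coded. The function name `dpu_redis_config` is hypothetical.

```python
def dpu_redis_config(dpu_id, base_port, hostname="169.254.200.254"):
    """Build the "redis" instance settings for one DPU overlay database.

    Hypothetical helper. `base_port` is supplied by the caller because
    the document's prose and example disagree on the starting port.
    """
    return {
        "hostname": hostname,  # midplane bridge IP (document default)
        "port": base_port + dpu_id,
        "unix_socket_path": f"/var/run/redisdpu{dpu_id}/redis.sock",
        "persistence_for_warm_boot": "yes",
        "database_type": "dpudb",
    }


if __name__ == "__main__":
    # base_port=6381 reproduces the two-DPU example above.
    for dpu_id in range(2):
        print(dpu_id, dpu_redis_config(dpu_id, base_port=6381))
```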

Four new databases are introduced for the DPU overlay database:

``` json
"DPU_APPL_DB": {
    "id": 15,
    "separator": ":",
    "instance": "redis",
    "format": "proto"
},
"DPU_APPL_STATE_DB": {
    "id": 16,
    "separator": "|",
    "instance": "redis"
},
"DPU_STATE_DB": {
    "id": 17,
    "separator": "|",
    "instance": "redis"
},
"DPU_COUNTERS_DB": {
    "id": 18,
    "separator": ":",
    "instance": "redis"
}
```

Contributor, on "DPU_APPL_DB":
@prsunny , do you mind to help double confirm the DB name change? I think Ze is proposing modifying all DASH_* dbs to DPU_* dbs. Will any extra change needed in swss?

Also, if this is expected, we will also need to update all dash docs to make us align too.

Contributor:
thanks @r12f for tagging. Yes this should be fine as it is new DB instances. Objects in the DB shall still be with DASH_* prefixes.

Contributor:
on a second thought, i'm thinking why to define a with prefix DPU_*? Since this is a separate container for a specific DPU, say DPU0, why not we just keep it APPL_DB?

Contributor Author:
I feel that using the prefix DPU_* is clearer and more extensible. Here's why:

  1. Similar to DPU_STATE_DB: We want to organize all similar objects into a single instance. This improves clarity and maintainability. Even if the memory footprint of DPU_STATE_DB is minimal.
  2. Future-proof: If we decide to move underlayer objects of DPU to NPU in the future, this naming convention will maintain isolation and avoid breaking functionality.

Do you have any concerns with defining these new entities?

Contributor, on the "separator" of DPU_COUNTERS_DB:
should counters also uses |?

Contributor:
in case ipv6 stuffs showing up and complicates the splitting logic.

Contributor Author:
Here is just a SONiC convention. https://github.com/sonic-net/sonic-buildimage/blob/ada7c6a72e27a9729628f383f48cd5f75d4da227/dockers/docker-database/database_config.json.j2#L32
I'm worried that there is some hard-code separator : in the existing code base.

But you are right, there is some conflict with ipv6. This may happen in APPL_DB also.
@qiluo-msft do you know why we use the colon as the separator?

Contributor:
This is as per design for some DBs. With IPv6, we ensure IPv6 is always at the end of key so as to not conflict with expected Seperator
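
The separator choice matters when a key ends in an IPv6 address, which itself contains colons. The Python sketch below shows one way to split such keys safely: limit the number of splits so that everything after the last expected field stays intact. The key layout used in the example (`TABLE:vnet:ip`) and the helper name `split_key` are assumptions for illustration only.

```python
# Separators copied from the database definitions above.
DPU_DATABASES = {
    "DPU_APPL_DB": {"id": 15, "separator": ":"},
    "DPU_APPL_STATE_DB": {"id": 16, "separator": "|"},
    "DPU_STATE_DB": {"id": 17, "separator": "|"},
    "DPU_COUNTERS_DB": {"id": 18, "separator": ":"},
}


def split_key(db_name, key, num_fields):
    """Split `key` into at most `num_fields` parts using the DB separator.

    Bounding the split count keeps a trailing IPv6 address in one piece
    even when the separator is ':'.
    """
    separator = DPU_DATABASES[db_name]["separator"]
    return key.split(separator, num_fields - 1)


if __name__ == "__main__":
    # Hypothetical key layout: table name, VNET name, IPv6 address.
    key = "DASH_VNET_MAPPING_TABLE:Vnet1:2001:db8::1"
    print(split_key("DPU_APPL_DB", key, 3))
```

This only works when the address is the last field of the key, which is why the position of the IPv6 component in the key is significant.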

#### Database flow

This section outlines critical workflows interacting with the DPU overlay database.

- Update Overlay Objects via GNMI:

Contributor:
Will be better to change these into headers or indent the content of each section, so it shows better inside of a single list.

Contributor Author:
Done


GNMI communicates with the SWSS of the DPU over ZMQ. At the same time, a backup copy of each object is inserted asynchronously into the DPU_APPL_DB. This backup is kept for debugging, migration, and possible future uses.

- Update Object Status:

The SWSS of the DPU takes a proactive role in updating the DPU_APPL_STATE_DB and DPU_STATE_DB when corresponding objects undergo updates. This update can be triggered either by GNMI message commands or internal service logic.

- Update Counters and Meters:

Flex counter management in Syncd of the DPU handles the update of counters and meters for overlay objects. Traditional counters are also managed through this mechanism.

These workflows define how the DPU overlay database interacts with the other components of the Smart Switch. The DPUs access their respective database instances via the IP address of the midplane bridge and the assigned TCP port, while GNMI accesses the same instances through the Unix domain socket.

### High-Level Design

### SAI API

N/A

### Configuration and management

No new CLI commands are required.

Contributor:
do we need to mention that protobuf support is added as part of CLI?

Contributor Author:
Done


#### CLI/YANG model Enhancements


``` yang
container sonic-feature {
    container FEATURE {
        leaf has_per_dpu_scope {
            description "This configuration indicates that only one service
                         will be spawned per DPU";
            type feature-scope-status;
            default "false";
        }
    }
}
```

Contributor, on "leaf has_per_dpu_scope":
is this feature used or expected? just to confirm, since I am not aware that we have another mode.

also, for objects inside of db, it will be better to add a link to maybe dash api proto repo.

Contributor Author:
It just follows the multi-asic logic.
IMO, This field does explicitly tell users that corresponding database services will be started for each DPU. Otherwise SONiC will only start a global database service.
Without this new field, we have to add some specific logic for DPU and database service. I don't think it's flexible enough.
sonic-net/sonic-host-services#84

#### Config DB Enhancements

Refer to the [Database services](#database-services) section.


### Warmboot and Fastboot Design Impact

N/A

### Memory Consumption

#### DPU_APPL_DB

The estimated memory consumption for the Smart Switch database is calculated based on entry sizes sourced from the [sonic-dash-api](https://github.com/sonic-net/sonic-dash-api) repository and entry numbers derived from the [DASH high-level design scaling requirements](https://github.com/sonic-net/DASH/blob/main/documentation/general/dash-sonic-hld.md#14-scaling-requirements).

The following tables fall into two groups: global tables and per-ENI tables. When calculating the total size per card, the memory consumption of each per-ENI table is multiplied by the number of ENIs.

- Global Tables

Contributor:
will be better to use headings instead of list.

Contributor Author:
Done


| Table name | Entry size (bytes) | No. of entries in the Table | Total size per card (KB) |
| ----------------------- | ------------------ | --------------------------- | ------------------------ |
| DASH_VNET_TABLE | 448 | 1,024 | 448 |
| DASH_ENI_TABLE | 208 | 64 | 13 |
| DASH_PREFIX_TAG(IPv6) | 229,492 | 32 | 7,172 |
| DASH_VNET_MAPPING_TABLE | 216 | 10,000,000 | 2,109,375 |

Contributor, on the DASH_VNET_TABLE row:
the size of tables might change overtime, so should we put a specific commit id to make the reference more explicit?

Contributor Author:
Done

- Per ENI Tables

| Table name | Entry size (bytes) | No. of entries in the Table per ENI | Total size per card (KB) |
| ------------------------------ | ------------------ | ----------------------------------- | ------------------------ |
| DASH_ACL_RULE_TABLE(IPv6) | 2,488 | 10,000 | 1,555,000 |
| DASH_ROUTE_RULE_TABLE(inbound) | 176 | 10,000 | 110,000 |
| DASH_ROUTE_TABLE(outbound) | 264 | 100,000 | 1,650,000 |

Based on the provided data and calculations, the estimated memory consumption for the DPU_APPL_DB is approximately **5.18GB** per card.
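
The total can be reproduced from the two tables above. The following Python sketch multiplies entry size by entry count for each table, scales the per-ENI tables by the number of ENIs, and converts the result to GB. It assumes 64 ENIs per card (the entry count of DASH_ENI_TABLE, which is consistent with the per-card totals in the per-ENI table) and binary units (1 KB = 1024 bytes).

```python
NUM_ENI = 64  # assumption: taken from the DASH_ENI_TABLE entry count

# (entry size in bytes, number of entries)
GLOBAL_TABLES = {
    "DASH_VNET_TABLE": (448, 1_024),
    "DASH_ENI_TABLE": (208, 64),
    "DASH_PREFIX_TAG(IPv6)": (229_492, 32),
    "DASH_VNET_MAPPING_TABLE": (216, 10_000_000),
}

# (entry size in bytes, number of entries per ENI)
PER_ENI_TABLES = {
    "DASH_ACL_RULE_TABLE(IPv6)": (2_488, 10_000),
    "DASH_ROUTE_RULE_TABLE(inbound)": (176, 10_000),
    "DASH_ROUTE_TABLE(outbound)": (264, 100_000),
}


def total_bytes(global_tables, per_eni_tables, num_eni):
    """Sum size * count over all tables, scaling per-ENI tables by num_eni."""
    global_sum = sum(size * count for size, count in global_tables.values())
    per_eni_sum = sum(size * count for size, count in per_eni_tables.values())
    return global_sum + per_eni_sum * num_eni


if __name__ == "__main__":
    total_gb = total_bytes(GLOBAL_TABLES, PER_ENI_TABLES, NUM_ENI) / 1024**3
    print(f"{total_gb:.2f} GB")  # prints "5.18 GB"
```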

#### DPU_APPL_STATE_DB/DPU_STATE_DB

For the DPU_APPL_STATE_DB and DPU_STATE_DB, only the key and status of each object are stored, not the object's metadata. This results in a smaller memory footprint than the DPU_APPL_DB. The estimated memory consumption for these databases is approximately 2.45GB.

- Global Tables

| Table name | Entry size (bytes) | No. of entries in the Table | Total size per card (KB) |
| ----------------------- | ------------------ | --------------------------- | ------------------------ |
| DASH_VNET_TABLE | 88 | 1,024 | 88 |
| DASH_ENI_TABLE | 144 | 64 | 7 |
| DASH_PREFIX_TAG(IPv6) | 104 | 32 | 3 |
| DASH_VNET_MAPPING_TABLE | 144 | 10,000,000 | 1,406,250 |

- Per ENI Tables

| Table name | Entry size (bytes) | No. of entries in the Table per ENI | Total size per card (KB) |
| ------------------------------ | ------------------ | ----------------------------------- | ------------------------ |
| DASH_ACL_RULE_TABLE(IPv6) | 104 | 10,000 | 65,000 |
| DASH_ROUTE_RULE_TABLE(inbound) | 160 | 10,000 | 100,000 |
| DASH_ROUTE_TABLE(outbound) | 160 | 100,000 | 1,000,000 |

### Restrictions/Limitations

### Testing Requirements/Design

#### Unit Test cases

No separate test is required. The feature will be tested implicitly by the other DASH tests.

#### System Test cases

No separate test is required. The feature will be tested implicitly by the other DASH tests.

### Open/Action items - if any

1. Platform API for fetching the number of DPUs