
[wip] add k8s integration design doc #346


Closed
wants to merge 3 commits

Conversation

ishidawataru

Signed-off-by: Wataru Ishida [email protected]

@qiluo-msft
Contributor

  1. support switching between standalone mode and cluster mode at runtime

    You mentioned there are 2 modes. Are they mutually exclusive? Can we have them at the same time?

  2. support a cluster joining mechanism for newly added switches

    Either a switch could identify its cluster master, or a cluster master could discover or identify all the switches. Do you have a design?

  3. How much disk space/CPU/RAM does the switch need?

  4. Can we have a terminology explanation for SONiC use case? Such as pod, cluster, service, deployment, etc.

  5. What is the scalability of a single-master cluster? How do we manage them if we must use multiple masters?

  6. Do you assume all the switches and the master in a cluster are in one layer 2 network?

  7. What is the process for a switch to upgrade its whole image?

  8. What is the process for the master to upgrade its k3s package?

  9. What is the process for a switch to upgrade its k3s package/docker?

  10. before joining the cluster, we need to stop the containers currently running on the node, since the master will start deploying the same containers on this node

    This is very bad for the swss, syncd, teamd, and bgp dockers. Can we relax it?

  11. In the long run, we may add new docker containers to the switch image. Could the master manage totally different switch images in a cluster?

Signed-off-by: Wataru Ishida <[email protected]>
@ishidawataru
Author

@qiluo-msft Thanks for the comment. I added a glossary to the doc.

  1. You mentioned there are 2 modes. Are they mutually exclusive? Can we have them at the same time?

They are mutually exclusive. In standalone mode, the switch itself controls the containers that run on it. In cluster mode, the k8s controller controls them.

  2. Either a switch could identify its cluster master, or a cluster master could discover or identify all the switches. Do you have a design?

The joining procedure needs to be invoked by the switch. So the switch should identify its cluster master, get the token, and ask the master to join the cluster.
This could be done in some ZTP (or Ansible?) procedure.
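As a rough sketch of that join flow, assuming stock k3s defaults (the ZTP/Ansible wrapper around it is not designed yet, and the master address and token placeholders are illustrative):

```shell
# On the master: read the join token that k3s generates at install time
sudo cat /var/lib/rancher/k3s/server/node-token

# On the switch: join the cluster as an agent; in practice ZTP/Ansible
# would deliver the master address and token to the switch
sudo k3s agent --server https://<master-ip>:6443 --token <node-token>
```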

  3. How much disk space/CPU/RAM does the switch need?

Added to the document. Not sure about CPU usage;
in my experience, CPU usage has never been a problem.

  4. Can we have a terminology explanation for SONiC use case? Such as pod, cluster, service, deployment, etc.

Added to the document.

  5. What is the scalability of a single-master cluster? How do we manage them if we must use multiple masters?

The official documentation says it can scale up to 5,000 nodes. However, I think this number really depends on the environment. Also, k3s uses sqlite3 as its default internal DB instead of etcd, which should also affect performance.
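For what it's worth, k3s lets you swap the embedded sqlite3 datastore for an external one via its `--datastore-endpoint` flag, which is probably what we'd want before pushing node counts up. A sketch (the endpoint string is illustrative):

```shell
# Default: single server with the embedded sqlite3 datastore
sudo k3s server

# Sketch: back the cluster with an external datastore instead
sudo k3s server \
  --datastore-endpoint="mysql://user:pass@tcp(db-host:3306)/k3s"
```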

  6. Do you assume all the switches and the master in a cluster are in one layer 2 network?

No. The k8s master only needs IP reachability to the nodes it controls.

  7. What is the process for a switch to upgrade its whole image?

The easiest way would be to unjoin the node from the cluster and join again after the upgrade.
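That unjoin/rejoin flow could look roughly like this with standard kubectl/k3s commands (the node name is illustrative):

```shell
# Evacuate the node and remove it from the cluster before the image upgrade
kubectl drain switch-01 --ignore-daemonsets
kubectl delete node switch-01

# ...upgrade the SONiC image and reboot the switch...

# Rejoin the cluster afterwards
sudo k3s agent --server https://<master-ip>:6443 --token <node-token>
```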

  8. What is the process for the master to upgrade its k3s package?

T.B.D. I'll investigate what k3s offers.
Does SONiC have a mechanism to upgrade docker?

  9. What is the process for a switch to upgrade its k3s package/docker?

T.B.D. I'll investigate what k3s offers.

  10. This is very bad for the swss, syncd, teamd, and bgp dockers. Can we relax it?

Can't we use warm reboot for the transition, as we did at the hackathon?

  11. In the long run, we may add new docker containers to the switch image. Could the master manage totally different switch images in a cluster?

Yes, as I described, this can be supported by using selectors and labels.
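A minimal sketch of that labeling scheme (the label key, node name, and image name are all hypothetical):

```shell
# Tag each switch with its platform/image family
kubectl label node switch-01 sonic/hwsku=generic-x86

# Deploy a container only to the switches carrying that label
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: bgp
spec:
  selector:
    matchLabels:
      app: bgp
  template:
    metadata:
      labels:
        app: bgp
    spec:
      nodeSelector:
        sonic/hwsku: generic-x86
      containers:
      - name: bgp
        image: docker-sonic-bgp:latest   # hypothetical image name
EOF
```

Switches with a different image family would carry a different label value, and the master would run a separate DaemonSet per family.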

Signed-off-by: Wataru Ishida <[email protected]>
@lguohan
Contributor

lguohan commented Mar 28, 2019

Before upgrading a container, the controller may need to perform some actions, such as taking a BGP snapshot or draining traffic from the switch. After upgrading the container, the controller may need to perform some post-upgrade actions, such as comparing against the snapshot or restoring traffic.

Any consideration for supporting such actions in k8s?


- Cluster
- A set of machines, called nodes, that run containerized applications managed by Kubernetes
- In SONiC use-case, each machine is SONiC switch
Contributor


This statement looks weird. Do you mean cluster?

Author


A cluster is a set of machines, and each machine is a SONiC switch (except the controller node).
How should I change the statement?

Contributor


I guess you mean 'In SONiC use-case, a node is a SONiC switch. And a cluster is all the SONiC switches managed by Kubernetes'.

@ishidawataru
Author

ishidawataru commented Apr 1, 2019

@lguohan

We can use postStart and preStop hooks to invoke such actions.

Also, we can use init containers to do some tasks before the kubelet starts the main containers.

In k8s, this kind of application-specific operation can be implemented as an operator.
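A sketch of the hook and init-container idea (the pod spec fields are standard k8s; the image names and hook scripts are hypothetical):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: bgp
spec:
  initContainers:
  - name: pre-checks
    image: docker-sonic-utils:latest          # hypothetical image
    command: ["/usr/bin/pre-upgrade-check.sh"] # hypothetical script
  containers:
  - name: bgp
    image: docker-sonic-bgp:latest            # hypothetical image
    lifecycle:
      postStart:
        exec:
          command: ["/usr/bin/restore-traffic.sh"]   # hypothetical
      preStop:
        exec:
          command: ["/usr/bin/take-bgp-snapshot.sh"] # hypothetical
EOF
```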

@yxieca yxieca force-pushed the master branch 2 times, most recently from 8498931 to 8837dc2 Compare April 15, 2022 16:51
@MikeZappa87

Is this dead?

@ishidawataru ishidawataru deleted the k8s-integration branch March 28, 2023 02:14