Description
This task focuses on improving the LitmusChaos documentation by organizing existing tutorials and creating new ones as Day 0, Day 1, and Day 2 workflows tailored to different users. Instead of documenting individual faults (which would require constant maintenance), the goal is to create user-flow-based guides that help users understand chaos engineering principles at different levels of expertise, from beginners experimenting with sample apps to advanced users implementing chaos in real-world systems.
Additionally, this task involves technical documentation improvements: fixing structural issues, removing duplicates, and ensuring a clear and intuitive documentation experience for the community.
Prerequisites:
- Strong technical writing and research skills.
- Ability to understand user personas (SREs, Principal Engineers, Developers, etc.).
- Familiarity with chaos engineering principles (experience with LitmusChaos is a plus).
- Basic knowledge of Kubernetes and observability tools (Grafana, Prometheus, etc.).
Schedule: 3rd March, 2025 - 30th May, 2025
Previous Works & References:
- Existing LitmusChaos Documentation: LitmusChaos Docs
- GitHub Issues for Documentation Improvements: LitmusChaos Docs Repo
- Example Tutorials in the Current Docs: LitmusChaos Tutorials
What You Will Do:
- Develop Day 0, Day 1, and Day 2 Tutorials for LitmusChaos
  - Day 0 (Beginner-Level Chaos Engineering) [Already implemented, we can improve it further]
    Goal: Introduce users to chaos engineering with a simple application.
    Application: Podtato Head, Online Boutique, or another microservices demo app.
    Experiment: Simulate pod deletion and observe recovery through Kubernetes deployment strategies.
    Outcome: Understand basic failure scenarios and how Kubernetes ensures resilience.
  - Day 1 (Intermediate-Level Chaos Engineering)
    Goal: Introduce chaos into real-world applications with stateful components.
    Application: Redis, Cassandra, or MongoDB.
    Experiment:
      - Simulate leader pod crashes to test leader-election mechanisms.
      - Perform network partitioning to evaluate how replicas handle failures.
    Outcome: Learn how distributed databases and services handle failures.
  - Day 2 (Advanced Chaos Workflows & Multi-Experiment Scenarios)
    Goal: Create a comprehensive chaos engineering workflow from start to finish.
    Scenario: A complex chaos workflow covering multiple failure scenarios.
    Experiments:
      - Pod delete → CPU spike → Network latency → Validate system recovery metrics in Grafana.
      - Extend this to multi-cluster failure scenarios for advanced users.
    Outcome: Understand system-wide resilience patterns and how to build automated chaos workflows.
- Research Chaos Experiment Needs for Different Personas
  - Identify use cases for different users (SREs, Platform Engineers, Principal Engineers).
  - Determine the right type of experiments and use case tutorials for the group.
- Improve Documentation Structure and Fix Issues
  - Work on fixing tech docs analysis open issues (structure changes, removing duplicates, improving clarity).
  - Enhance navigation and make tutorials easier to follow.
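
To make the Day 2 idea concrete, the sequence described above (pod delete, then CPU spike, then network latency, with recovery validated after each fault) can be sketched as a small simulation. This is only an illustration of the workflow shape, not LitmusChaos code: `ChaosStep`, `steady_state`, `run_workflow`, `build_steps`, the in-memory `system` dictionary, and all thresholds are hypothetical names and values chosen for the example. In an actual tutorial, the faults would be injected by LitmusChaos and the steady-state check would come from metrics such as those shown in Grafana.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChaosStep:
    """One fault in the workflow (hypothetical structure, not a LitmusChaos type)."""
    name: str
    inject: Callable[[dict], None]   # mutates the simulated system state
    recover: Callable[[dict], None]  # models the platform healing itself


def steady_state(system: dict) -> bool:
    """Hypothetical health check; thresholds are assumptions for illustration."""
    return (
        system["ready_replicas"] >= system["desired_replicas"]
        and system["cpu_percent"] < 80
        and system["latency_ms"] < 200
    )


def run_workflow(system: dict, steps: List[ChaosStep]) -> List[str]:
    """Run faults one after another, validating recovery after each one."""
    if not steady_state(system):
        raise RuntimeError("system is not healthy before chaos starts")
    log = []
    for step in steps:
        step.inject(system)
        log.append(f"{step.name}: injected, healthy={steady_state(system)}")
        step.recover(system)
        if not steady_state(system):
            # Stop early so later faults do not hide the first failure.
            log.append(f"{step.name}: FAILED to recover, aborting")
            break
        log.append(f"{step.name}: recovered")
    return log


def build_steps() -> List[ChaosStep]:
    """Simulated versions of the three faults named in the Day 2 scenario."""
    return [
        ChaosStep(
            "pod-delete",
            inject=lambda s: s.update(ready_replicas=s["ready_replicas"] - 1),
            recover=lambda s: s.update(ready_replicas=s["desired_replicas"]),
        ),
        ChaosStep(
            "cpu-spike",
            inject=lambda s: s.update(cpu_percent=95),
            recover=lambda s: s.update(cpu_percent=30),
        ),
        ChaosStep(
            "network-latency",
            inject=lambda s: s.update(latency_ms=2000),
            recover=lambda s: s.update(latency_ms=40),
        ),
    ]


if __name__ == "__main__":
    system = {"desired_replicas": 3, "ready_replicas": 3,
              "cpu_percent": 30, "latency_ms": 40}
    for line in run_workflow(system, build_steps()):
        print(line)
```

The design choice worth carrying into a tutorial is the check between steps: validating the steady state after every fault, rather than only at the end, shows which fault the system failed to recover from.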
Mentors
This task is ideal for those passionate about developer experience, documentation, and chaos engineering education. The tutorials created will serve as long-term learning resources for new and experienced LitmusChaos users!