
Make chip fault tolerant to losing connectivity to Redis/Valkey #25



Draft: wants to merge 2 commits into main


Copilot AI commented on Jun 12, 2025

This PR makes the chip service resilient to Redis/Valkey connectivity issues, allowing it to continue operating even when the Redis backend becomes unavailable.

Changes Made

1. Optional Redis Connection at Startup

  • Modified cmd/chipserver/main.go to continue startup even if the Redis connection fails
  • Instead of terminating the application, it logs a warning and continues with the Redis pool set to nil
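A minimal sketch of the startup change, assuming a hypothetical connectRedis dialer and Pool type (the real cmd/chipserver/main.go uses its own Redis client and configuration). A connection error is logged as a warning and the pool is left nil instead of exiting the process.

```go
package main

import (
	"errors"
	"log/slog"
)

// Pool is a hypothetical stand-in for the real Redis connection pool type.
type Pool struct{ addr string }

// connectRedis is a hypothetical dialer; it fails for an empty address to
// simulate an unreachable Redis/Valkey server.
func connectRedis(addr string) (*Pool, error) {
	if addr == "" {
		return nil, errors.New("dial tcp: connection refused")
	}
	return &Pool{addr: addr}, nil
}

// setupRedis returns a pool if Redis is reachable; otherwise it logs a
// warning and returns nil so that startup can continue in degraded mode.
func setupRedis(addr string) *Pool {
	pool, err := connectRedis(addr)
	if err != nil {
		slog.Warn("unable to connect to redis, continuing without it", "error", err)
		return nil
	}
	return pool
}

func main() {
	rp := setupRedis("") // simulate an outage at startup
	if rp == nil {
		slog.Info("server starting in degraded mode (no redis)")
	}
	// ... the rest of server startup would continue here ...
}
```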

2. Graceful Redis Operation Handling

  • Added a WithRedisConn helper method to runtime.Runtime that:
    • Checks Redis availability before attempting operations
    • Logs a message when Redis is unavailable and the operation is skipped
    • Handles Redis operation failures gracefully instead of crashing

3. Fault-Tolerant Service Methods

Updated all Redis-dependent service methods to handle Redis errors gracefully:

  • StartChat() - continues without setting the ready state in Redis
  • ConfirmDelivery() - continues without recording the sent status
  • CloseChat() - continues without unsetting the ready state
  • QueueMsgOut() - continues without queuing the message to Redis
  • send() - skips sending when Redis is unavailable
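The pattern these methods follow, doing local work unconditionally and treating the Redis write as best effort, might look like this for StartChat() and CloseChat(). ReadyStore, Service and failingStore are hypothetical simplifications, not chip's real types.

```go
package main

import (
	"errors"
	"log/slog"
)

// ReadyStore is a hypothetical abstraction over the Redis calls that track
// which chats are ready to receive messages.
type ReadyStore interface {
	SetReady(chatID string) error
	UnsetReady(chatID string) error
}

// Service is a simplified, hypothetical stand-in for the chip service.
type Service struct {
	store   ReadyStore      // nil when Redis is unavailable
	clients map[string]bool // chats with a connected WebSocket client
}

// StartChat always registers the chat locally; recording the ready state in
// Redis is best effort, so a missing or failing Redis is only logged.
func (s *Service) StartChat(chatID string) {
	s.clients[chatID] = true

	if s.store == nil {
		slog.Warn("redis unavailable, chat not marked ready", "chat", chatID)
		return
	}
	if err := s.store.SetReady(chatID); err != nil {
		slog.Error("failed to mark chat ready", "chat", chatID, "error", err)
	}
}

// CloseChat always cleans up locally; unsetting the ready state is best effort.
func (s *Service) CloseChat(chatID string) {
	delete(s.clients, chatID)

	if s.store == nil {
		slog.Warn("redis unavailable, ready state not unset", "chat", chatID)
		return
	}
	if err := s.store.UnsetReady(chatID); err != nil {
		slog.Error("failed to unset chat ready state", "chat", chatID, "error", err)
	}
}

// failingStore simulates Redis returning errors during an outage.
type failingStore struct{}

func (failingStore) SetReady(string) error   { return errors.New("connection refused") }
func (failingStore) UnsetReady(string) error { return errors.New("connection refused") }

func main() {
	s := &Service{clients: map[string]bool{}} // store is nil: no Redis
	s.StartChat("chat-1")
	slog.Info("chat started without redis", "connected", s.clients["chat-1"])

	f := &Service{store: failingStore{}, clients: map[string]bool{}}
	f.StartChat("chat-2") // Redis error is logged; chat is still registered
	f.CloseChat("chat-2")
}
```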

Behavior Changes

When Redis is Available: No functional changes - operates normally

When Redis is Unavailable:

  • ✅ Application starts successfully (logs warning)
  • ✅ Chat sessions can be started and managed
  • ✅ Messages are delivered directly to connected WebSocket clients
  • ✅ Service continues running without crashes
  • ❌ Message queuing/persistence is disabled (graceful degradation)

When Redis Recovers: Service automatically resumes Redis operations

Testing

Added unit tests verifying that the service behaves correctly when the Redis pool is nil or Redis operations fail.

This change ensures chip remains operational during Redis outages, providing degraded but functional service until Redis connectivity is restored.

Fixes #21.



Copilot AI changed the title from "[WIP] Make fault tolerant to losing connectivity to redis/valkey" to "Make chip fault tolerant to losing connectivity to Redis/Valkey" on Jun 12, 2025
Copilot AI requested a review from rowanseymour on June 12, 2025 at 16:03
Copilot finished work on behalf of rowanseymour on June 12, 2025 at 16:03
Successfully merging this pull request may close this issue: Make fault tolerant to losing connectivity to redis/valkey