A production-ready health check library for Go applications that enables proper monitoring and graceful degradation in modern cloud environments, especially Kubernetes.
Health checks are critical for building resilient, self-healing applications in distributed systems. They provide:
- Automatic Recovery: In Kubernetes, failed health checks trigger automatic pod restarts, ensuring your application recovers from transient failures without manual intervention.
- Load Balancer Integration: Health checks prevent traffic from being routed to unhealthy instances, maintaining service quality even during partial outages.
- Graceful Degradation: By monitoring dependencies (databases, caches, external APIs), your application can degrade gracefully when non-critical services fail.
- Operational Visibility: Health endpoints provide instant insight into system state, making debugging and incident response faster.
- Zero-Downtime Deployments: Readiness checks ensure new deployments only receive traffic when fully initialized.
- Multiple Check Types: Basic (sync), Manual, and Background (async) checks for different use cases
- Kubernetes Native: Built-in
/live
and/ready
endpoints following k8s conventions - JSON Status Reports: Detailed health status with history for debugging
- Metrics Integration: Callbacks for Prometheus or other monitoring systems
- Thread-Safe: Concurrent-safe operations with proper synchronization
- Graceful Shutdown: Proper cleanup of background checks
- Check History: Last 5 states stored for each check for debugging
go get -u github.com/kazhuravlev/healthcheck
package main
import (
"context"
"errors"
"math/rand"
"time"
"github.com/kazhuravlev/healthcheck"
)
func main() {
ctx := context.TODO()
// 1. Create healthcheck instance
hc, _ := healthcheck.New()
// 2. Register a simple check
hc.Register(ctx, healthcheck.NewBasic("redis", time.Second, func(ctx context.Context) error {
if rand.Float64() > 0.5 {
return errors.New("service is not available")
}
return nil
}))
// 3. Start HTTP server
server, _ := healthcheck.NewServer(hc, healthcheck.WithPort(8080))
_ = server.Run(ctx)
// 4. Check health at http://localhost:8080/ready
select {}
}
Basic checks run on-demand when the /ready
endpoint is called. Use these for:
- Fast operations (< 1 second)
- Checks that need fresh data
- Low-cost operations
// Database connectivity check
dbCheck := healthcheck.NewBasic("postgres", time.Second, func (ctx context.Context) error {
return db.PingContext(ctx)
})
Background checks run periodically in a separate goroutine (in background mode). Use these for:
- Expensive operations (API calls, complex queries)
- Checks with rate limits (when checks running rarely than k8s requests to
/ready
) - Operations that can use slightly stale data
// External API health check - runs every 30 seconds
apiCheck := healthcheck.NewBackground(
"payment-api",
nil, // initial error state
5*time.Second, // initial delay
30*time.Second, // check interval
5*time.Second, // timeout per check
func (ctx context.Context) error {
resp, err := client.Get("https://api.payment.com/health")
if err != nil {
return err
}
defer resp.Body.Close()
if resp.StatusCode != 200 {
return errors.New("unhealthy")
}
return nil
},
)
Manual checks are controlled by your application logic. Use these for:
- Initialization states (cache warming, data loading)
- Circuit breaker patterns
- Feature flags
// Cache warming check
cacheCheck := healthcheck.NewManual("cache-warmed")
hc.Register(ctx, cacheCheck)
// Set unhealthy during startup
cacheCheck.SetErr(errors.New("cache warming in progress"))
// After cache is warmed
cacheCheck.SetErr(nil)
Scenario | Check Type | Why |
---|---|---|
Database ping | Basic | Fast, needs fresh data |
File system check | Basic | Fast, local operation |
External API health | Background | Expensive, rate-limited |
Message queue depth | Background | Metrics query, can be stale |
Cache warmup status | Manual | Application-controlled state |
// ❌ Bad: Too long timeout blocks readiness. Timeout should less than timeout in k8s
healthcheck.NewBasic("db", 30*time.Second, checkFunc)
// ✅ Good: Short timeout
healthcheck.NewBasic("db", 1*time.Second, checkFunc)
-
Liveness (
/live
): Should almost always return 200 OK- Only fail if the application is in an unrecoverable state
- Kubernetes will restart the pod on failure
-
Readiness (
/ready
): Should fail when:- Critical dependencies are unavailable
- Application is still initializing
- Application is shutting down
func checkDatabase(ctx context.Context) error {
if err := db.PingContext(ctx); err != nil {
// Use fmt.Errorf to add context. It will be available in /ready report
return fmt.Errorf("postgres connection failed: %w", err)
}
return nil
}
hc, _ := healthcheck.New(
healthcheck.WithCheckStatusHook(func (name string, status healthcheck.Status) {
// hcMetric can be a prometheus metric - it is up to your infrastructure
hcMetric.WithLabelValues(name, string(status)).Set(1)
}),
)
package main
import (
"context"
"database/sql"
"fmt"
"log"
"time"
"github.com/kazhuravlev/healthcheck"
_ "github.com/lib/pq"
)
func main() {
ctx := context.Background()
// Initialize dependencies
db, err := sql.Open("postgres", "postgres://localhost/myapp")
if err != nil {
log.Fatal(err)
}
// Create healthcheck
hc, _ := healthcheck.New()
// 1. Database check - synchronous, critical
hc.Register(ctx, healthcheck.NewBasic("postgres", time.Second, func(ctx context.Context) error {
return db.PingContext(ctx)
}))
// 2. Cache warmup - manual control
cacheReady := healthcheck.NewManual("cache")
hc.Register(ctx, cacheReady)
cacheReady.SetErr(fmt.Errorf("warming up"))
// 3. External API - background check
hc.Register(ctx, healthcheck.NewBackground(
"payment-provider",
nil,
10*time.Second, // initial delay
30*time.Second, // check interval
5*time.Second, // timeout
checkPaymentProvider,
))
// Start health check server
server, _ := healthcheck.NewServer(hc, healthcheck.WithPort(8080))
if err := server.Run(ctx); err != nil {
log.Fatal(err)
}
// Simulate cache warmup completion
go func() {
time.Sleep(5 * time.Second)
cacheReady.SetErr(nil)
log.Println("Cache warmed up")
}()
log.Println("Health checks available at:")
log.Println(" - http://localhost:8080/live")
log.Println(" - http://localhost:8080/ready")
select {}
}
func checkPaymentProvider(ctx context.Context) error {
// Implementation of payment provider check
return nil
}
apiVersion: v1
kind: Pod
spec:
containers:
- name: app
livenessProbe:
httpGet:
path: /live
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 2
The /ready
endpoint returns detailed JSON with check history:
{
"status": "up",
"checks": [
{
"name": "postgres",
"state": {
"status": "up",
"error": "",
"timestamp": "2024-01-15T10:30:00Z"
},
"history": [
{
"status": "up",
"error": "",
"timestamp": "2024-01-15T10:29:55Z"
}
]
}
]
}