Skip to content

Add Open telemetry / Prometheus exporter documentation #15690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
197 changes: 197 additions & 0 deletions docs/operator-guides/collecting-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,18 @@
Airbyte supports two ways to collect metrics - using datadog or open telemetry.
Fill in `METRIC_CLIENT` field in `.env` file to get started!

**Prerequisite:**
In order to get metrics from airbyte we need to deploy a container / pod called metrics-reporter like below
```
airbyte-metrics:
image: airbyte/metrics-reporter:${VERSION}
container_name: airbyte-metrics
environment:
- METRIC_CLIENT=${METRIC_CLIENT}
- OTEL_COLLECTOR_ENDPOINT=${OTEL_COLLECTOR_ENDPOINT}
```


# Open Telemetry

1. In `.env` change `METRIC_CLIENT` to `otel`.
Expand Down Expand Up @@ -53,6 +65,191 @@ OTEL_COLLECTOR_ENDPOINT=<address>
If you started open telemetry collector in the link above, the address should be `http://otel-collector:4317`.
Note the format - unlike the base `.env`, there is no quote in `.env` file under kubernetes.


## Tutorial

Deploy the airbyte metric pod :
```
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ .Release.Name }}-metrics
namespace: {{ .Release.Namespace }}
labels: {{ set . "component" "metrics" | include "labels" | nindent 4 }}
spec:
selector:
matchLabels: {{ set . "component" "metrics" | include "labels" | nindent 6 }}
template:
metadata:
labels: {{ set . "component" "metrics" | include "labels" | nindent 8 }}
spec:
containers:
- name: airbyte-metrics
image: "airbyte/metrics-reporter:latest"
imagePullPolicy: IfNotPresent
env:
- name: AIRBYTE_VERSION
value: latest
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: {{ include "airbyte.database.secret.name" . }}
key: DATABASE_PASSWORD
- name: DATABASE_URL
value: {{ include "airbyte.database.url" . | quote }}
- name: DATABASE_USER
valueFrom:
secretKeyRef:
name: {{ .Release.Name }}-secrets
key: DATABASE_USER
- name: CONFIGS_DATABASE_MINIMUM_FLYWAY_MIGRATION_VERSION
value: 0.35.15.001
- name: METRIC_CLIENT
value: otel
- name: OTEL_COLLECTOR_ENDPOINT
value: http://otel-collector:4317
```


Deploy an Open telemetry pod like below :

```
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
namespace: {{ .Release.Namespace }}
labels: {{ set . "component" "otel-collector" | include "labels" | nindent 4 }}
spec:
selector:
matchLabels: {{ set . "component" "otel-collector" | include "labels" | nindent 6 }}
replicas: 1
template:
metadata:
labels: {{ set . "component" "otel-collector" | include "labels" | nindent 8 }}
spec:
containers:
- command:
- "/otelcol"
- "--config=/conf/otel-collector-config.yaml"
image: "otel/opentelemetry-collector:latest"
name: otel-collector
ports:
- containerPort: 4317 # Default endpoint for OpenTelemetry receiver.
- containerPort: 8889 # Port for Prometheus instance
volumeMounts:
- name: config
mountPath: /conf
volumes:
- configMap:
name: otel-collector-conf
items:
- key: otel-collector-config
path: otel-collector-config.yaml
name: config
```

WIth this Config Map :

```
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-conf
namespace: {{ .Release.Namespace }}
labels: {{ set . "component" "otel-collector" | include "labels" | nindent 4 }}
data:
otel-collector-config: |
receivers:
otlp:
protocols:
grpc:
endpoint: "0.0.0.0:4317"
processors:
batch:
memory_limiter:
limit_mib: 1500
spike_limit_mib: 512
check_interval: 5s
exporters:
prometheus:
endpoint: "0.0.0.0:8889"
namespace: "default"
service:
pipelines:
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [prometheus]
```

Then we need a service to be able to access both open telemetry GRPC and Prometheus
```
apiVersion: v1
kind: Service
metadata:
name: otel-collector
namespace: {{ .Release.Namespace }}
labels: {{ set . "component" "otel-collector" | include "labels" | nindent 4 }}
spec:
ports:
- name: otlp-grpc # Default endpoint for OpenTelemetry gRPC receiver.
port: 4317
protocol: TCP
targetPort: 4317
- name: prometheus
port: 8889
selector: {{ set . "component" "otel-collector" | include "labels" | nindent 6 }}
```

And finally We can add a service monitor to receive metrics in prometheus and optionally add some prometheus rules to generate alerts.
You can replace with your prometheus name.
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: {{ .Release.Name }}
namespace: {{ .Release.Namespace }}
labels:
{{ set . "component" "metrics" | include "labels" | nindent 4 }}
prometheus: <your_prometheus_name>
spec:
endpoints:
- interval: 30s
port: prometheus
path: /metrics
relabelings:
- action: labeldrop
regex: (service|endpoint|namespace|container)
selector:
matchLabels: {{ set . "component" "otel-collector" | include "labels" | nindent 6 }}
```

One rule example :
```
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: {{ .Release.Name }}-rules
namespace: {{ .Release.Namespace }}
labels:
{{ set . "component" "prometheus-rules" | include "labels" | nindent 4 }}
prometheus: <your_prometheus_name>
spec:
groups:
- name: airbyte
rules:
- alert: AirbyteJobFail
for: 0m
expr: min(airbyte_job_failed_by_release_stage) > 0
labels:
priority: P2
annotations:
summary: {{ `"An Airbyte Job has failed"` }}
```


# Datadog
TBD

Expand Down