
K8s leader elector extension starts before the receivers have the possibility to register themselves #40346

@odubajDT

Description


Component(s)

extension/k8sleaderelector

What happened?

Description

The k8s leader elector extension starts and acquires the lease before the receivers that depend on it have a chance to register themselves via SetCallBackFunc().

Steps to Reproduce

The following config:

extensions:
  health_check:
    endpoint: 0.0.0.0:13133
  k8s_leader_elector/metrics:
    auth_type: "serviceAccount"
    lease_name: opentelemetry-collector-agent-metrics
    lease_namespace: '${env:POD_NAMESPACE}'
  k8s_leader_elector/logs:
    auth_type: "serviceAccount"
    lease_name: opentelemetry-collector-agent-logs
    lease_namespace: '${env:POD_NAMESPACE}'

receivers:
  k8sobjects:
    error_mode: ignore
    k8s_leader_elector: k8s_leader_elector/logs
    objects:
      - name: events
        mode: watch
  k8s_cluster:
    k8s_leader_elector: k8s_leader_elector/metrics
    auth_type: "serviceAccount"
    collection_interval: 10s
    allocatable_types_to_report:
      - cpu
      - memory
      - pods
    node_conditions_to_report:
      - Ready
      - MemoryPressure
      - PIDPressure
      - DiskPressure
      - NetworkUnavailable
    metrics:
      k8s.node.condition:
        enabled: true
      k8s.pod.status_reason:
        enabled: true
  kubeletstats:
    auth_type: "serviceAccount"
    collection_interval: 10s
    node: '${env:K8S_NODE_NAME}'
    extra_metadata_labels:
      - k8s.volume.type
    k8s_api_config:
      auth_type: "serviceAccount"
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
    insecure_skip_verify: true
    metric_groups:
      - node
      - pod
      - container
      - volume

exporters:
  debug:
    verbosity: detailed
  otlphttp:
    endpoint: ${env:DT_ENDPOINT}
    headers:
      Authorization: "Api-Token ${env:API_TOKEN}"

service:
  extensions:
    - k8s_leader_elector/logs
    - k8s_leader_elector/metrics
    - health_check
  pipelines:
    metrics:
      receivers:
        - kubeletstats
        - k8s_cluster
      exporters:
        - otlphttp
    logs:
      receivers: 
        - k8sobjects
      exporters: 
        - otlphttp

results in non-deterministic behavior. Sometimes one of the k8s_cluster/k8sobjects receivers manages to register, but usually only data from the kubeletstats receiver is available and the other receivers do not even start.

The collector is deployed as a DaemonSet in Kubernetes.

Expected Result

All receivers start and work correctly.

Actual Result

Logs from the collector:

2025-05-28T06:14:46.424Z	info	builders/builders.go:26	Development component. May change in the future.	{"resource": {}, "otelcol.component.id": "debug", "otelcol.component.kind": "exporter", "otelcol.signal": "logs"}
2025-05-28T06:14:46.486Z	info	builders/extension.go:50	Development component. May change in the future.	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/logs", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.487Z	info	builders/extension.go:50	Development component. May change in the future.	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/metrics", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.503Z	info	[email protected]/service.go:266	Starting dynatrace-otel-collector...	{"resource": {}, "Version": "0.30.1", "NumCPU": 6}
2025-05-28T06:14:46.504Z	info	extensions/extensions.go:41	Starting extensions...	{"resource": {}}
2025-05-28T06:14:46.504Z	info	extensions/extensions.go:45	Extension is starting...	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/metrics", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.504Z	info	[email protected]/extension.go:63	Starting k8s leader elector with UUID	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/metrics", "otelcol.component.kind": "extension", "UUID": "4992dc4f-bc16-42f8-a7b9-4e3211c6f854"}
2025-05-28T06:14:46.505Z	info	extensions/extensions.go:62	Extension started.	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/metrics", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.505Z	info	extensions/extensions.go:45	Extension is starting...	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/logs", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.505Z	info	[email protected]/extension.go:63	Starting k8s leader elector with UUID	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/logs", "otelcol.component.kind": "extension", "UUID": "5ba4a8a7-55c8-4b03-83a9-4a8881afc151"}
2025-05-28T06:14:46.505Z	info	extensions/extensions.go:62	Extension started.	{"resource": {}, "otelcol.component.id": "k8s_leader_elector/logs", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.505Z	info	extensions/extensions.go:45	Extension is starting...	{"resource": {}, "otelcol.component.id": "health_check", "otelcol.component.kind": "extension"}
I0528 06:14:46.505855       1 leaderelection.go:257] attempting to acquire leader lease e2ek8scombined/opentelemetry-collector-agent-logs...
I0528 06:14:46.506237       1 leaderelection.go:257] attempting to acquire leader lease e2ek8scombined/opentelemetry-collector-agent-metrics...
2025-05-28T06:14:46.506Z	info	[email protected]/healthcheckextension.go:32	Starting health_check extension	{"resource": {}, "otelcol.component.id": "health_check", "otelcol.component.kind": "extension", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Middlewares":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2025-05-28T06:14:46.522Z	info	extensions/extensions.go:62	Extension started.	{"resource": {}, "otelcol.component.id": "health_check", "otelcol.component.kind": "extension"}
2025-05-28T06:14:46.545Z	info	kube/client.go:132	k8s filtering	{"resource": {}, "otelcol.component.id": "k8sattributes", "otelcol.component.kind": "processor", "otelcol.pipeline.id": "metrics", "otelcol.signal": "metrics", "labelSelector": "", "fieldSelector": "spec.nodeName=kind-worker2"}
I0528 06:14:46.655101       1 leaderelection.go:271] successfully acquired lease e2ek8scombined/opentelemetry-collector-agent-metrics
I0528 06:14:46.655458       1 leaderelection.go:271] successfully acquired lease e2ek8scombined/opentelemetry-collector-agent-logs
2025-05-28T06:14:46.662Z	info	[email protected]/receiver.go:108	Starting k8sClusterReceiver with leader election	{"resource": {}, "otelcol.component.id": "k8s_cluster", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-05-28T06:14:46.662Z	info	[email protected]/receiver.go:130	registering the receiver in leader election	{"resource": {}, "otelcol.component.id": "k8sobjects", "otelcol.component.kind": "receiver", "otelcol.signal": "logs"}
2025-05-28T06:14:46.666Z	info	healthcheck/handler.go:132	Health Check state change	{"resource": {}, "otelcol.component.id": "health_check", "otelcol.component.kind": "extension", "status": "ready"}
2025-05-28T06:14:46.667Z	info	[email protected]/service.go:289	Everything is ready. Begin running and processing data.	{"resource": {}}
2025-05-28T06:14:46.726Z	info	[email protected]/receiver.go:64	Starting shared informers and wait for initial cache sync.	{"resource": {}, "otelcol.component.id": "k8s_cluster", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}
2025-05-28T06:14:46.828Z	info	[email protected]/receiver.go:85	Completed syncing shared informer caches.	{"resource": {}, "otelcol.component.id": "k8s_cluster", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics"}

Note the "Starting k8sClusterReceiver with leader election" and "registering the receiver in leader election" log lines, which appear only after the lease has been acquired. When a receiver works correctly, these lines should appear before the lease is acquired.

When the "Object Receiver started as leader" log line is present (it is missing from the logs above), it means the function registered via SetCallBackFunc() was executed; only then does the k8sobjects receiver start properly and work as expected. (The k8s_cluster receiver does not emit a comparable log message, but the same applies.)
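The ordering problem can be illustrated with a minimal, hypothetical model. The names here (leaderElector, the callbacks slice, the commented-out replay) are assumptions for illustration only, not the extension's actual code; the point is simply that the collector starts extensions before receivers, so a lease won in the extension's Start() can fire before any receiver has registered its callback:

```go
package main

import (
	"fmt"
	"sync"
)

// leaderElector is a simplified, hypothetical stand-in for the extension.
type leaderElector struct {
	mu        sync.Mutex
	isLeader  bool
	callbacks []func()
}

// Start models the extension's startup: here the lease is won immediately,
// before any receiver has had a chance to register.
func (l *leaderElector) Start() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.isLeader = true
	// Fire the callbacks registered so far -- at this point there are none.
	for _, cb := range l.callbacks {
		cb()
	}
}

// SetCallBackFunc models receiver-side registration. If the lease was
// already acquired, the callback never fires unless the extension replays
// it for late registrants.
func (l *leaderElector) SetCallBackFunc(onStarted func()) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.callbacks = append(l.callbacks, onStarted)
	// Without a replay like the one below, a receiver that registers after
	// the lease is won never starts -- the behavior reported in this issue.
	// if l.isLeader { onStarted() }
}

func main() {
	le := &leaderElector{}
	le.Start() // extensions start first in the collector lifecycle

	started := false
	le.SetCallBackFunc(func() { started = true }) // receiver registers too late

	fmt.Println("receiver started:", started) // prints "receiver started: false"
}
```

Either deferring lease acquisition until all receivers have registered, or replaying the start callback for late registrants, would remove the race.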

Collector version

0.127.0
