Description
Changing the --check-interval from the default 1m to a larger value (e.g., 5m) causes incorrect values to be reported by the k8s_image_availability_exporter_available metric during the first few minutes after the exporter starts.
Reproduction Example:
• I have a deployment using a non-existent image:
example.com/pandora/example:example-20240819-113636-b7581fe
• When using the default --check-interval=1m, the exporter correctly reports k8s_image_availability_exporter_available = 0 shortly after startup.
• However, when I configure --check-interval=5m, the metric initially reports k8s_image_availability_exporter_available = 1 for this missing image.
• After several minutes (presumably after the first check is executed), the metric correctly switches to 0.
Problem:
This initial value of 1 is misleading, especially for alerting and monitoring systems relying on early signals. It seems that the metric is initialized as 1 (available) before the first image check is actually performed.
Expected Behavior:
The metric should either:
• Not emit a value until the first image check is completed, or
• Default to 0 (unavailable) until proven otherwise by the first image availability check.
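The first option could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the exporter's actual code: the availabilityStore type, its methods, and the single image label are hypothetical names chosen for the example. The idea is that an image is tracked with a "checked" flag, and nothing is rendered for it until a check result has been recorded.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// imageState is a hypothetical per-image record. checked stays false
// until the first registry check for the image has completed.
type imageState struct {
	checked   bool
	available bool
}

// availabilityStore is a hypothetical stand-in for the exporter's
// internal state; the real exporter is structured differently.
type availabilityStore struct {
	mu     sync.Mutex
	images map[string]imageState
}

func newAvailabilityStore() *availabilityStore {
	return &availabilityStore{images: make(map[string]imageState)}
}

// Track registers an image found in the cluster without assuming
// anything about its availability.
func (s *availabilityStore) Track(image string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.images[image]; !ok {
		s.images[image] = imageState{}
	}
}

// Record stores the result of a completed availability check.
func (s *availabilityStore) Record(image string, available bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.images[image] = imageState{checked: true, available: available}
}

// Render returns Prometheus text-format samples for checked images only.
// Images that are tracked but not yet checked produce no sample at all.
func (s *availabilityStore) Render() string {
	s.mu.Lock()
	defer s.mu.Unlock()

	names := make([]string, 0, len(s.images))
	for name, st := range s.images {
		if st.checked {
			names = append(names, name)
		}
	}
	sort.Strings(names) // stable output order

	var b strings.Builder
	for _, name := range names {
		value := 0
		if s.images[name].available {
			value = 1
		}
		fmt.Fprintf(&b, "k8s_image_availability_exporter_available{image=%q} %d\n", name, value)
	}
	return b.String()
}
```

With this shape, an alert on k8s_image_availability_exporter_available == 0 simply has no series to evaluate until the first check finishes, instead of seeing a misleading 1.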
Graph Example:
In the example graph, every time k8s-image-availability-exporter restarts, the k8s_image_availability_exporter_available metric shows a value of 1 for several minutes.
Additional Context:
This behavior is especially problematic because rechecking image availability every minute (--check-interval=1m) is too aggressive and can put unnecessary pressure on the container registry. A more realistic interval is in the range of 15–30 minutes. However, with longer intervals, this startup behavior becomes more impactful: missing images are falsely reported as available for several minutes after each deployment or restart of the exporter.
Suggested Fix:
Ensure that metrics are not reported until the initial check is performed, or explicitly initialize them as 0 if the check hasn’t yet been executed.
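A complementary way to shorten the misleading window, assuming the checks are driven by a ticker-style loop (I have not confirmed how the exporter actually schedules them), is to run the first check immediately at startup instead of waiting a full interval. A Go ticker delivers its first tick only after one interval has elapsed, so a loop that checks only on ticks stays idle for the whole of --check-interval. The runChecks function, the done channel, and exampleInterval below are hypothetical names for this sketch.

```go
package main

import "time"

// exampleInterval mirrors the --check-interval=5m value from this report.
const exampleInterval = 5 * time.Minute

// runChecks calls check once right away, then once per interval,
// until done is closed. It is a hypothetical sketch, not the
// exporter's real scheduling code.
func runChecks(done <-chan struct{}, interval time.Duration, check func()) {
	check() // initial check at startup, before any metric is trusted

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			check()
		}
	}
}
```

This does not replace the fix above, since the first check still takes some time to complete, but it would keep the gap independent of the configured interval.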