
[BUG] YACE does not correctly retrieve sparse metrics like KMS CreateKey CallCount #1677


Open
1 task done
harshpadhye opened this issue Apr 29, 2025 · 0 comments
Labels
bug Something isn't working


Is there an existing issue for this?

  • I have searched the existing issues

YACE version

v0.61.2

Config file

No response

Current Behavior

YACE fails to accurately retrieve the CallCount metric for the KMS CreateKey operation. The metric values are visible in CloudWatch, but YACE consistently returns 0 even when the CallCount is non-zero within a given period. KMS keys are created infrequently and sporadically, so the metric is sparse.

Expected Behavior

When querying the CallCount metric over a specified length, we expect YACE to return a positive, non-zero aggregate value (for Maximum, Average, and Sum) if a KMS key was indeed created during that length of time.

Steps To Reproduce

  1. Run YACE with a scrape interval of 300, length of 300, and period <= length
  2. Monitor live view of the Max, Average, and Sum values of KMS CreateKey CallCount metric
  3. Create KMS Key in AWS console
  4. Confirm metric exists and has value > 0 in CloudWatch
  5. Monitor metric and notice it remains 0 across several scrape intervals

Anything else?

Root Cause (Suspected):

Based on the code in pkg/clients/cloudwatch/v2/client.go here:

func toMetricDataResult(resp cloudwatch.GetMetricDataOutput) []cloudwatch_client.MetricDataResult {
	output := make([]cloudwatch_client.MetricDataResult, 0, len(resp.MetricDataResults))
	for _, metricDataResult := range resp.MetricDataResults {
		mappedResult := cloudwatch_client.MetricDataResult{ID: *metricDataResult.Id}
		if len(metricDataResult.Values) > 0 {
			mappedResult.Datapoint = &metricDataResult.Values[0]
			mappedResult.Timestamp = metricDataResult.Timestamps[0]
		}
		output = append(output, mappedResult)
	}
	return output
}

and the associated test example (https://github.com/prometheus-community/yet-another-cloudwatch-exporter/blob/master/pkg/clients/cloudwatch/v2/client_test.go#L28-L80), it appears that YACE keeps only the first data point that CloudWatch returns for each metric query (Values[0]) and discards the rest. For sparse metrics like CallCount, if that first data point is 0 (because no event occurred precisely at that timestamp), YACE reports 0 even when other data points within the same length are non-zero.
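
To make the suspected behavior concrete, here is a minimal, self-contained sketch. It does not use the AWS SDK or YACE types: `result`, `firstValue`, `sumValues`, and `sampleWindow` are hypothetical names, and the data points are made up for illustration (a 300 s window of five 60 s points, assumed to be ordered newest first, with the single CreateKey call in an older bucket). It only contrasts "take index 0" with "reduce over all values"; it is not a proposed patch.

```go
package main

import (
	"fmt"
	"time"
)

// result is a hypothetical stand-in for one entry of a GetMetricData
// response: parallel slices of timestamps and values.
type result struct {
	Timestamps []time.Time
	Values     []float64
}

// firstValue mirrors the selection in toMetricDataResult: it keeps only
// index 0 and ignores every other data point in the window.
func firstValue(r result) (float64, bool) {
	if len(r.Values) == 0 {
		return 0, false
	}
	return r.Values[0], true
}

// sumValues is one possible alternative for count-style metrics: it adds
// up every data point returned for the window.
func sumValues(r result) (float64, bool) {
	if len(r.Values) == 0 {
		return 0, false
	}
	total := 0.0
	for _, v := range r.Values {
		total += v
	}
	return total, true
}

// sampleWindow builds made-up data: five 60 s data points, newest first,
// where only the third-newest bucket saw a CreateKey call.
func sampleWindow() result {
	end := time.Date(2025, 4, 29, 12, 5, 0, 0, time.UTC)
	values := []float64{0, 0, 1, 0, 0}
	r := result{Values: values}
	for i := range values {
		r.Timestamps = append(r.Timestamps, end.Add(-time.Duration(i+1)*time.Minute))
	}
	return r
}

func main() {
	w := sampleWindow()
	first, _ := firstValue(w)
	sum, _ := sumValues(w)
	fmt.Printf("first=%v sum=%v\n", first, sum) // prints "first=0 sum=1"
}
```

With this sample data, the index-0 selection yields 0 while the sum over the window yields 1, which matches the symptom described above. A sum is only a sensible reduction for the Sum statistic; Maximum and Average would each need their own reduction over the returned values.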

This has been noticed by other users as well: #865 (comment)

harshpadhye added the bug label on Apr 29, 2025