[BUG] Yace 0.62.1: panic: runtime error: index out of range [0] with length 0 #1702

@m-barthelemy

Description

Is there an existing issue for this?

  • I have searched the existing issues

YACE version

0.62.1 image from https://hub.docker.com/r/prometheuscommunity/yet-another-cloudwatch-exporter/tags

Config file

Note: this config was last changed about a year ago and worked fine with Yace 0.61.2.

    apiVersion: v1alpha1
    discovery:
      exportedTagsOnMetrics:
        AWS/ApplicationELB:
          - Name
          - Environment
          - Family
        AWS/ElastiCache:
          - Name
          - Environment
          - Family
        AWS/NetworkELB:
          - Name
          - Environment
          - Family
        AWS/RDS:
          - Name
          - Environment
          - Family
        AWS/Redshift:
          - Name
          - Environment
          - Family
        AWS/Kafka:
          - Name
          - Environment
          - Family

      jobs:
        - type: AWS/RDS
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              value: ^(${ENVIRONMENT})$
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Average]
          period: 60
          length: 120
          delay: 300
          metrics:
            - name: BurstBalance
            - name: CPUCreditBalance
            - name: CPUUtilization
            - name: DatabaseConnections
            - name: DiskQueueDepth
            # Different length of query for EBS Byte Balance %
            # For some reason, the default length returns 0
            # sometimes.
            - name: EBSByteBalance%
              period: 60
              length: 300
            - name: FreeableMemory
            - name: FreeStorageSpace
            - name: MaximumUsedTransactionIDs
            - name: NetworkReceiveThroughput
            - name: NetworkTransmitThroughput
            - name: ReadIOPS
            - name: ReadLatency
            - name: ReadThroughput
            - name: SwapUsage
            - name: WriteIOPS
            - name: WriteLatency
            - name: WriteThroughput

        - type: AWS/Redshift
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              value: ^(${ENVIRONMENT})$
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Average]
          period: 60
          length: 120
          delay: 300
          metrics:
            - name: ReadIOPS
            - name: WriteIOPS
            # The average number of bytes read from disk per second.
            - name: ReadThroughput
            # The average number of bytes written to disk per second.
            - name: WriteThroughput
            - name: ReadLatency
            - name: WriteLatency
            - name: NetworkReceiveThroughput
            - name: NetworkTransmitThroughput
            - name: RedshiftManagedStorageTotalCapacity
            - name: TotalTableCount
            - name: DatabaseConnections
            - name: HealthStatus
            # The percent of disk space used.
            - name: PercentageDiskSpaceUsed
            # The disk or storage space used by a schema.
            - name: StorageUsed
            - name: AutoVacuumSpaceFreed
            - name: CPUUtilization
            - name: CommitQueueLength
            - name: MaintenanceMode
            # The average number of queries completed per second.
            - name: QueriesCompletedPerSecond
            # The average amount of time to complete a query. 
            - name: QueryDuration
            # The number of queries waiting to enter a workload management (WLM) queue.
            - name: WLMQueueLength
            # The total time queries spent waiting in the workload management (WLM) queue. 
            - name: WLMQueueWaitTime
            # The average number of queries completed per second for a workload management (WLM) queue.
            - name: WLMQueriesCompletedPerSecond
            # The average length of time to complete a query for a workload management (WLM) queue. 
            - name: WLMQueryDuration
            # The number of queries running from both the main cluster and concurrency scaling cluster per WLM queue.
            - name: WLMRunningQueries

        - type: AWS/ApplicationELB
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              # We have a UAT-specific ALB in the staging account
              value: (${ENVIRONMENT}%{if ENVIRONMENT == "staging"}|uat%{endif})
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Sum]
          period: 60
          length: 120
          delay: 300
          metrics:
            - name: TargetResponseTime
              nilToZero: true
              statistics: [Average]
            - name: RequestCount
              nilToZero: true
            - name: HTTPCode_Target_5XX_Count
              nilToZero: true
            - name: HTTPCode_Target_4XX_Count
              nilToZero: true
            - name: HTTPCode_Target_3XX_Count
              nilToZero: true
            - name: HTTPCode_Target_2XX_Count
              nilToZero: true
            - name: ActiveConnectionCount
              nilToZero: true
            - name: NewConnectionCount
              nilToZero: true
            - name: ProcessedBytes
              nilToZero: true

        - type: AWS/NetworkELB
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              # We have a UAT-specific NLB in the staging account
              value: (${ENVIRONMENT}%{if ENVIRONMENT == "staging"}|uat%{endif})
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Sum]
          period: 60
          length: 120
          delay: 300
          metrics:
            - name: ActiveFlowCount
              nilToZero: true
            - name: ActiveFlowCount_TCP
              nilToZero: true
            - name: ActiveFlowCount_TLS
              nilToZero: true
            - name: ClientTLSNegotiationErrorCount
              nilToZero: true
            - name: NewFlowCount
              nilToZero: true
            - name: NewFlowCount_TCP
              nilToZero: true
            - name: NewFlowCount_TLS
              nilToZero: true
            - name: ProcessedBytes
              nilToZero: true
            - name: ProcessedBytes_TCP
              nilToZero: true
            - name: ProcessedBytes_TLS
              nilToZero: true
            - name: ProcessedPackets
              nilToZero: true
            - name: TCP_Client_Reset_count
              nilToZero: true
            - name: TCP_ELB_Reset_Count
              nilToZero: true
            - name: TCP_Target_Reset_Count
              nilToZero: true
        - type: AWS/Kafka
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              # We have a UAT-specific Kafka (MSK) cluster in the staging account
              value: (${ENVIRONMENT}%{if ENVIRONMENT == "staging"}|uat%{endif})
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Sum]
          period: 60
          length: 120
          delay: 300
          # Doc: https://docs.aws.amazon.com/msk/latest/developerguide/metrics-details.html
          metrics:
            # The percentage of the root disk used by the broker.
            - name: RootDiskUsed
              nilToZero: true
            # The percentage of disk space used for data logs.
            - name: KafkaDataLogsDiskUsed
              nilToZero: true
            # The number of active authenticated, unauthenticated, and inter-broker connections.
            - name: ConnectionCount
              nilToZero: true
            # This metric can help you monitor CPU credit balance on the brokers.
            - name: CPUCreditBalance
              nilToZero: true
            # Total number of topics across all brokers in the cluster.
            - name: GlobalTopicCount
              nilToZero: true
            # The total number of topic partitions per broker, including replicas.
            - name: PartitionCount
              nilToZero: true
            # The number of under-replicated partitions for the broker.
            - name: UnderReplicatedPartitions
              nilToZero: true
            # Total number of partitions that are offline in the cluster.
            - name: OfflinePartitionsCount
              nilToZero: true
            # The number of incoming messages per second for the broker.
            - name: MessagesInPerSec
              nilToZero: true
            # The average time in milliseconds spent in broker network and I/O threads to process requests.
            - name: RequestTime
              nilToZero: true
            # For Producers
            # The mean produce time in milliseconds.
            - name: ProduceTotalTimeMsMean
              nilToZero: true
            # For consumers
            # The aggregated offset lag for all the partitions in a topic.
            - name: SumOffsetLag
              nilToZero: true
            # For brokers
            # https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html#bestpractices-monitor-memory
            - name: HeapMemoryAfterGC
              nilToZero: true
            # https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/#metric-to-watch-page-cache-read-ratio
            - name: MemoryCached
              nilToZero: true
            # The size in bytes of memory that is free and available for the broker. If MemoryCached is high and MemoryFree is low, the broker is using memory effectively.
            - name: MemoryFree
              nilToZero: true
            # The size in bytes of memory that is in use for the broker.
            - name: MemoryUsed
              nilToZero: true
            # The size in bytes of swap memory that is in use for the broker.
            - name: SwapUsed
              nilToZero: true
            # The In-Sync Replicas (ISR) set is the set of replicas that are up to date with the leader. The expected value for UnderMinIsrPartitionCount is zero.
            - name: UnderMinIsrPartitionCount
              nilToZero: true
            # The percentage of CPU in user space used by the broker
            - name: CpuUser
              nilToZero: true
            # The number of bytes per second received from clients. This metric is available per broker and also per topic.
            - name: BytesInPerSec
              nilToZero: true
            # The number of bytes per second sent to clients. This metric is available per broker and also per topic.
            - name: BytesOutPerSec
              nilToZero: true
            # Indicates the number of packets shaped (dropped or queued) due to exceeding network allocations.
            - name: TrafficShaping
              nilToZero: true
            ## Advanced (paid) metrics
            # The number of messages in the throttle queue.
            - name: ProduceThrottleQueueSize
              nilToZero: true
            - name: RequestThrottleQueueSize
              nilToZero: true
            # The number of read and write operations in a specified time period
            - name: VolumeReadOps
              nilToZero: true
            - name: VolumeWriteOps
              nilToZero: true

        - type: AWS/ElastiCache
          regions: [${AWS_REGION}]
          searchTags:
            - key: Environment
              # We have a UAT-specific ElastiCache cluster in the staging account
              value: (${ENVIRONMENT}%{if ENVIRONMENT == "staging"}|uat%{endif})
            - key: Family
              value: ^(${FAMILY_NAME})$
          statistics: [Average]
          period: 60
          length: 120
          delay: 300
          metrics:
            - name: CacheHitRate
              nilToZero: true
            - name: CacheHits
            - name: CacheMisses
            - name: CPUCreditBalance
            - name: CPUUtilization
            - name: CurrConnections
            # Total number of keys in all databases
            - name: CurrItems
            # Total number of keys in all databases that have a ttl set
            - name: CurrVolatileItems
            # The number of keys that have been evicted due to the maxmemory limit
            - name: Evictions
            # Indicates whether the node is the primary node of current shard/cluster. 1 = primary, 0 = not primary
            - name: IsMaster
              # Sometimes CloudWatch doesn't return any value. We must not let Yace turn this into a zero,
              # otherwise our alerting could mistakenly consider it a change of master node (failover).
              nilToZero: false
            - name: DatabaseMemoryUsagePercentage
            - name: EngineCPUUtilization
            - name: Evictions
            - name: NetworkBytesIn
            - name: NetworkBytesOut
            # Number of packets queued or dropped because the outbound bandwidth exceeded the maximum for the instance
            - name: NetworkBandwidthOutAllowanceExceeded
            # Number of packets dropped because connection tracking exceeded the maximum for the instance
            - name: NetworkConntrackAllowanceExceeded
            # Number of packets queued or dropped because the bidirectional packets/s exceeded the maximum for the instance
            - name: NetworkPacketsPerSecondAllowanceExceeded
            # The total number of connections that have been accepted by the server during this period
            - name: NewConnections
            # How far behind the replica is in applying changes from the primary node
            - name: ReplicationLag
            # Redis commands metrics
            - name: GetTypeCmds
              nilToZero: true
            - name: SetTypeCmds
              nilToZero: true
            - name: HashBasedCmds
              nilToZero: true
            - name: KeyBasedCmds
              nilToZero: true
            - name: NonKeyTypeCmds
              nilToZero: true
            - name: SetBasedCmds
              nilToZero: true
            - name: SortedSetBasedCmds
              nilToZero: true
            - name: StringBasedCmds
              nilToZero: true
            - name: JsonBasedGetCmds
              nilToZero: true
            - name: JsonBasedSetCmds
              nilToZero: true
            - name: ListBasedCmds
              nilToZero: true
            - name: PubSubBasedCmds
              nilToZero: true
            - name: EvalBasedCmds
              nilToZero: true
            # Latency metrics per command type
            - name: GetTypeCmdsLatency
              nilToZero: true
            - name: SetTypeCmdsLatency
              nilToZero: true
            - name: HashBasedCmdsLatency
              nilToZero: true
            - name: KeyBasedCmdsLatency
              nilToZero: true
            - name: NonKeyTypeCmdsLatency
              nilToZero: true
            - name: SetBasedCmdsLatency
              nilToZero: true
            - name: SortedSetBasedCmdsLatency
              nilToZero: true
            - name: StringBasedCmdsLatency
              nilToZero: true
            - name: JsonBasedCmdsLatency
              nilToZero: true
            - name: JsonBasedGetCmdsLatency
              nilToZero: true
            - name: JsonBasedSetCmdsLatency
              nilToZero: true
            - name: ListBasedCmdsLatency
              nilToZero: true
            - name: PubSubBasedCmdsLatency
              nilToZero: true
            - name: EvalBasedCmdsLatency
              nilToZero: true

    static:
    %{~ if length(EB_INFO) > 0 ~}
    %{~ for eb in EB_INFO ~}
      - namespace: AWS/Events
        name: aws_eventbridge
        regions:
          - ${AWS_REGION}
        dimensions:
          - name: EventBusName
            value: ${eb.event_bus_name}
          - name: RuleName
            value: ${eb.rule_name}
        customTags:
          - key: Environment
            value: ${ENVIRONMENT}
          - key: Family
            value: ${FAMILY_NAME}
        metrics:
          - name: DeadLetterInvocations
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: Events
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: FailedInvocations
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: IngestionToInvocationStartLatency
            statistics: [p50, p90, p99]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: Invocations
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: InvocationsFailedToBeSentToDlq
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: InvocationsSentToDlq
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: MatchedEvents
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: ThrottledRules
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
          - name: TriggeredRules
            statistics: [Sum]
            period: 60
            length: 120
            delay: 300
            nilToZero: true
    %{~ endfor ~}
    %{~ endif ~}
    %{~ if LOG_GROUP_NAME != "" ~}
      - namespace: AWS/Logs
        name: aws_cloudwatch_no_logs
        regions:
          - ${AWS_REGION}
        dimensions:
          - name: LogGroupName
            value: ${LOG_GROUP_NAME}
        customTags:
          - key: Environment
            value: ${ENVIRONMENT}
          - key: Family
            value: ${FAMILY_NAME}
        metrics:
          - name: IncomingLogEvents
            statistics:
            - Sum
            period: 60
            length: 120           
    %{~ endif ~}

Current Behavior

After updating from 0.61.2 to 0.62.1, Yace crashes shortly after startup with the following error:

{"time":"2025-06-19T02:27:48.971833727Z","level":"INFO","source":"main.go:344","msg":"Yace startup completed","version":"custom-build","version":"custom-build","feature_flags":""}
{"time":"2025-06-19T02:27:51.47698747Z","level":"ERROR","source":"discovery.go:52","msg":"No tagged resources made it through filtering","version":"custom-build","job_type":"AWS/Redshift","region":"ap-southeast-2","arn":"","account":"180570210447","err":"expected to discover resources but none were found"}
panic: runtime error: index out of range [0] with length 0

goroutine 436 [running]:
github.com/prometheus-community/yet-another-cloudwatch-exporter/pkg/clients/cloudwatch/v1.createGetMetricStatisticsInput({0xc000b6da40, 0x2, 0x2}, 0xc000781f70, 0xc00063e960, 0xc0008063d0)
        /app/pkg/clients/cloudwatch/v1/input.go:74 +0x997
github.com/prometheus-community/yet-another-cloudwatch-exporter/pkg/clients/cloudwatch/v1.client.GetMetricStatistics({0xc0004c7800?, {0x42e37f0?, 0xc00035e068?}}, {0x42c1648, 0xc0007066c0}, 0xc0008063d0, {0xc000b6da40, 0x2, 0x2}, {0xc0004fee80, ...}, ...)
        /app/pkg/clients/cloudwatch/v1/client.go:157 +0xf1
github.com/prometheus-community/yet-another-cloudwatch-exporter/pkg/clients/cloudwatch.limitedConcurrencyClient.GetMetricStatistics({{0x42c09b0?, 0xc000010a38?}, {0x42880e0?, 0xc00007a600?}}, {0x42c1648, 0xc0007066c0}, 0xc0008063d0, {0xc000b6da40, 0x2, 0x2}, ...)
        /app/pkg/clients/cloudwatch/client.go:78 +0xea
github.com/prometheus-community/yet-another-cloudwatch-exporter/pkg/job.runStaticJob.func1()
        /app/pkg/job/static.go:56 +0x39e
created by github.com/prometheus-community/yet-another-cloudwatch-exporter/pkg/job.runStaticJob in goroutine 55
        /app/pkg/job/static.go:37 +0x125
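For context on the error itself: in Go, reading element `[0]` of an empty slice panics with exactly this message, and a panic that is not recovered inside the goroutine where it occurs terminates the whole process. The trace shows the panic inside a goroutine started by `runStaticJob`, so one failing job is enough to take the exporter down. The sketch below is a hypothetical, simplified illustration, not Yace's actual code: the `job` type, `firstValue` and `runJobsConcurrently` are invented names. Each worker goroutine recovers its own panic and reports it as an error, so the other jobs still complete.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical stand-in for one scrape job: a name plus the values
// it expects to contain at least one of (resources, statistics, ...).
type job struct {
	name   string
	values []string
}

// firstValue indexes values[0] directly; on an empty slice this panics with
// "runtime error: index out of range [0] with length 0".
func firstValue(j job) string {
	return j.values[0]
}

// runJobsConcurrently runs each job in its own goroutine. recover() only
// catches panics raised in the same goroutine, so the deferred recover must
// live inside each worker rather than in the caller.
func runJobsConcurrently(jobs []job) (map[string]string, map[string]error) {
	results := make(map[string]string)
	errs := make(map[string]error)
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, j := range jobs {
		wg.Add(1)
		go func(j job) {
			defer wg.Done() // deferred first, so it runs last
			defer func() {
				// Convert a panic in this worker into an error for this job only.
				if r := recover(); r != nil {
					mu.Lock()
					errs[j.name] = fmt.Errorf("job %s panicked: %v", j.name, r)
					mu.Unlock()
				}
			}()
			v := firstValue(j)
			mu.Lock()
			results[j.name] = v
			mu.Unlock()
		}(j)
	}
	wg.Wait()
	return results, errs
}

func main() {
	results, errs := runJobsConcurrently([]job{
		{name: "static-job-1", values: nil}, // empty: would crash the process without recover
		{name: "static-job-2", values: []string{"Sum"}},
	})
	fmt.Println(results)
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Recovering is only a safety net; validating the slice length before indexing is the cleaner fix.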

Expected Behavior

The error logged just before the crash is about Redshift discovery, and indeed there is no Redshift cluster in the AWS account where we tested the new version of Yace.
However, if that is what causes the panic, Yace should skip that job and move on, as it apparently did in previous releases.

Steps To Reproduce

No response

Anything else?

YACE is a great tool and we love it!

Metadata

Labels: bug (Something isn't working)
