Commit 5a34ea4

Merge pull request #2557 from suhanime/metrics_faq
Document metrics FAQ
2 parents 2da23f1 + 19e6e31 commit 5a34ea4

1 file changed: +27 -14 lines changed
docs/hive_metrics.md

Lines changed: 27 additions & 14 deletions
@@ -2,19 +2,22 @@
 <!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
 **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*
 
-- [Optional Metrics](#optional-metrics)
-- [Duration-based Metrics](#duration-based-metrics)
-- [Metrics with Optional Cluster Deployment labels](#metrics-with-optional-cluster-deployment-labels)
-- [List of all Hive metrics](#list-of-all-hive-metrics)
-- [Hive Operator metrics](#hive-operator-metrics)
-- [Metrics reported by all controllers](#metrics-reported-by-all-controllers)
-- [ClusterDeployment controller metrics](#clusterdeployment-controller-metrics)
-- [ClusterProvision controller metrics](#clusterprovision-controller-metrics)
-- [ClusterDeprovision controller metrics](#clusterdeprovision-controller-metrics)
-- [ClusterPool controller metrics](#clusterpool-controller-metrics)
-- [Metrics controller metrics](#metrics-controller-metrics)
-- [Managed DNS Metrics](#managed-dns-metrics)
-- [Example: Configure metricsConfig](#example-configure-metricsconfig)
+- [Hive Metrics](#hive-metrics)
+- [Optional Metrics](#optional-metrics)
+- [Duration-based Metrics](#duration-based-metrics)
+- [Metrics with Optional Cluster Deployment labels](#metrics-with-optional-cluster-deployment-labels)
+- [List of all Hive metrics](#list-of-all-hive-metrics)
+- [Hive Operator metrics](#hive-operator-metrics)
+- [Metrics reported by all controllers](#metrics-reported-by-all-controllers)
+- [ClusterDeployment controller metrics](#clusterdeployment-controller-metrics)
+- [ClusterProvision controller metrics](#clusterprovision-controller-metrics)
+- [ClusterDeprovision controller metrics](#clusterdeprovision-controller-metrics)
+- [ClusterPool controller metrics](#clusterpool-controller-metrics)
+- [Metrics controller metrics](#metrics-controller-metrics)
+- [Managed DNS Metrics](#managed-dns-metrics)
+- [Example: Configure metricsConfig](#example-configure-metricsconfig)
+- [Frequently Asked Questions](#frequently-asked-questions)
+- [How can I leverage hive metrics to point to the offending cluster?](#how-can-i-leverage-hive-metrics-to-point-to-the-offending-cluster)
 
 <!-- END doctoc generated TOC please keep comment here to allow auto update -->
 
@@ -180,4 +183,14 @@ spec:
 | hive_cluster_deployments_waiting_for_cluster_operators_seconds | currentWaitingForCO |
 | hive_clustersync_failing_seconds | currentClusterSyncFailing |
 | hive_cluster_deployments_hibernation_transition_seconds | cumulativeHibernated |
-| hive_cluster_deployments_running_transition_seconds | cumulativeResumed |
+| hive_cluster_deployments_running_transition_seconds | cumulativeResumed |
+
+### Frequently Asked Questions
+
+#### How can I leverage hive metrics to point to the offending cluster?
+Metrics are meant for monitoring and observing. While you can use Alertmanager to raise alerts, relying on Prometheus metrics to provide the cluster ID is not recommended.
+
+For this to work, the Hive metric would need to publish a label carrying cluster-identifying information. Since every metric is stored as a map with one dimension per label in its definition,
+the storage taken up by the metric grows exponentially with the number of labels. Hive can manage hundreds, if not thousands, of clusters at a time, so high-cardinality labels like cluster ID/name/namespace
+are capable of bringing down the Hive instance in certain situations.
+While we publish a cluster-identifying label with certain metrics that are reported infrequently enough not to cause an issue, Hive does not recommend relying on Hive metrics for tracing clusters.
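
The cardinality argument in the FAQ above can be made concrete with a rough back-of-the-envelope sketch. This is not Hive or Prometheus code: the label names and value counts are hypothetical, and the product is only a worst-case upper bound on the number of distinct label combinations a metric could be stored under.

```python
from math import prod
from typing import Mapping


def series_count(label_cardinalities: Mapping[str, int]) -> int:
    """Worst-case number of distinct label-value combinations for one metric.

    Each key is a label name and each value is how many distinct values
    that label can take. The product is an upper bound, since not every
    combination necessarily occurs in practice.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets, chosen only for illustration.
base_labels = {"cluster_type": 5, "platform": 4, "region": 10}
with_cluster_id = {**base_labels, "cluster_id": 1000}

base = series_count(base_labels)          # 5 * 4 * 10
labelled = series_count(with_cluster_id)  # base * 1000

print(base, labelled)
```

Under these assumed numbers, a single per-cluster label raises the bound by a factor equal to the number of clusters, which is why such labels are treated as high-cardinality.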
