From 38d0dce0f93304b533ac34c1180dde3797370c6c Mon Sep 17 00:00:00 2001 From: Joe Peeples Date: Tue, 25 Mar 2025 14:29:07 -0400 Subject: [PATCH 1/3] Add new CI Health page, first draft --- config/_default/menus/main.en.yaml | 11 ++- .../continuous_integration/health/_index.md | 79 +++++++++++++++++++ 2 files changed, 87 insertions(+), 3 deletions(-) create mode 100644 content/en/continuous_integration/health/_index.md diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml index 8aa21bced5564..526f5f63bb55b 100644 --- a/config/_default/menus/main.en.yaml +++ b/config/_default/menus/main.en.yaml @@ -4448,21 +4448,26 @@ menu: parent: ci_explorer identifier: ci_explorer_saved_views weight: 304 + - name: CI Health + url: continuous_integration/health/ + parent: ci + identifier: ci_health + weight: 4 - name: Monitors url: monitors/types/ci/?tab=pipelines parent: ci identifier: ci_monitors - weight: 4 + weight: 5 - name: Guides url: continuous_integration/guides/ parent: ci identifier: ci_guides - weight: 5 + weight: 6 - name: Troubleshooting url: continuous_integration/troubleshooting/ parent: ci identifier: ci_troubleshooting - weight: 6 + weight: 7 - name: CD Visibility url: continuous_delivery/ pre: ci diff --git a/content/en/continuous_integration/health/_index.md b/content/en/continuous_integration/health/_index.md new file mode 100644 index 0000000000000..a6fcb067c6ab4 --- /dev/null +++ b/content/en/continuous_integration/health/_index.md @@ -0,0 +1,79 @@ +--- +title: CI Health +description: "Monitor and analyze the health of your CI pipelines" +further_reading: +- link: "/continuous_integration/pipelines/" + tag: "Documentation" + text: "CI Pipeline Visibility" +- link: "/continuous_integration/explorer/" + tag: "Documentation" + text: "Search and filter pipeline executions" +--- + +[CI Health][1] provides centralized visibility into your CI pipelines so you can improve CI processes based on precise objectives. 
The page organizes pipeline metrics, test results, and execution data to help you identify the most impactful pipelines and jobs within the following main objectives: + + - [Save Developer Time](#save-developer-time) + - [Reduce CI Cost](#reduce-ci-cost) + - [Speed Up Pipelines](#speed-up-pipelines) + +No additional setup is needed. The CI Health page is available out of the box when you enable [CI Visibility][2]. + +
image placeholder

+ +## Save Developer Time + +This objective helps you minimize the time developers spend running pipelines multiple times to make them pass. + +By default, pipelines are sorted by their retry percentage, so you can focus on pipelines with the most retries. Aggregated data shows the portion of commits ending in failure and how many are flaky (both failing and passing across multiple runs). + +To reduce flakiness, click on a pipeline and sort the results in the side panel to find the flakiest jobs. You can also use the breakdown column to identify the [types of failures][3], which can give you insight into possible root causes. Click **View Pipeline Executions** to further investigate CI runs in [Pipeline Executions][4]. + +
image placeholder

+ +## Reduce CI Cost + +This objective helps you reduce the cost of your CI environment by identifying wasted compute time from pipeline retries. + +This view highlights the **wasted active jobs time**, which is the sum of durations of jobs that needed retries. In SaaS CI environments, this indicates potential savings in billable time you wouldn't have to pay for if there were no flaky pipelines. + +This measure is also significant for on-premises CI runners, though it may overestimate the impact of pipelines with highly parallelized jobs. For self-hosted CI environments, **wasted runners time** represents the infrastructure cost that you could save without flaky pipelines. + +By default, pipelines are sorted by **wasted active jobs time**, so you can focus on pipelines contributing to your CI cost with the most retries. + +
Change the Aggregation method from Avg to Sum to find the biggest contributors to cost due to wasted CI time.
+ +To reduce flakiness, click on a pipeline and sort the results in the side panel to find the flakiest jobs. You can also use the breakdown column to identify the [types of failures][3], which can give you insight into possible root causes. Click **View Pipeline Executions** to further investigate CI runs in [Pipeline Executions][4]. + +
image placeholder

+ +## Speed Up Pipelines + +This objective helps you reduce the time needed to get a passing pipeline per commit. + +This view highlights the **time to pass**, which is the time from the first execution to the first passing pipeline. It represents how much time is needed for a commit to get a successful pipeline. This can impact the [DORA metrics][5] _lead time for changes_ and _time to restore service_. + +
image placeholder

+ +The breakdown column identifies several types of time measurements and how they contribute to the total time to pass: + +- **Execution time**: The running time of the pipeline. To reduce this, focus on optimizing jobs in the [critical path][6]. Click on a pipeline and sort by the **Time on critical path** column in the side panel. + +
image placeholder

+ +- **Idle time**: The time between pipeline executions. To reduce this: + - Minimize flakiness in pipelines and jobs. + - Apply [Auto Test Retries][7]. + - Optimize notifications so developers know their pipeline failed and may need a retry. + +- **Pipeline creation time**: The time spent creating the pipeline. +- **Queue time**: The time jobs spend waiting for a runner. To reduce this, spin up more runners. +- **Wait/Approval time**: The time for manual approvals. +- **Other time**: Uncategorized time contributing to total time to pass. + +[1]: https://app.datadoghq.com/ci/pipelines/health +[2]: /continuous_integration/ +[3]: /continuous_integration/search/#ai-generated-log-summaries +[4]: /continuous_integration/explorer +[5]: /dora_metrics +[6]: /continuous_integration/guides/identify_highest_impact_jobs_with_critical_path/ +[7]: /tests/flaky_test_management/auto_test_retries/ From b5d1ebdfaf5fb2fc7306b10247ceb31796f7a006 Mon Sep 17 00:00:00 2001 From: Joe Peeples Date: Tue, 25 Mar 2025 14:49:05 -0400 Subject: [PATCH 2/3] Style edits --- content/en/continuous_integration/health/_index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/content/en/continuous_integration/health/_index.md b/content/en/continuous_integration/health/_index.md index a6fcb067c6ab4..6d127db2e0ee5 100644 --- a/content/en/continuous_integration/health/_index.md +++ b/content/en/continuous_integration/health/_index.md @@ -36,7 +36,7 @@ This objective helps you reduce the cost of your CI environment by identifying w This view highlights the **wasted active jobs time**, which is the sum of durations of jobs that needed retries. In SaaS CI environments, this indicates potential savings in billable time you wouldn't have to pay for if there were no flaky pipelines. -This measure is also significant for on-premises CI runners, though it may overestimate the impact of pipelines with highly parallelized jobs. 
For self-hosted CI environments, **wasted runners time** represents the infrastructure cost that you could save without flaky pipelines. +This measure is also significant for on-premises CI runners, though it may overestimate the impact of pipelines with parallelized jobs. For self-hosted CI environments, **wasted runners time** represents the infrastructure cost that you could save without flaky pipelines. By default, pipelines are sorted by **wasted active jobs time**, so you can focus on pipelines contributing to your CI cost with the most retries. @@ -65,7 +65,7 @@ The breakdown column identifies several types of time measurements and how they - Apply [Auto Test Retries][7]. - Optimize notifications so developers know their pipeline failed and may need a retry. -- **Pipeline creation time**: The time spent creating the pipeline. +- **Pipeline creation time**: The time spent creating the pipeline. - **Queue time**: The time jobs spend waiting for a runner. To reduce this, spin up more runners. - **Wait/Approval time**: The time for manual approvals. - **Other time**: Uncategorized time contributing to total time to pass. 
From be6f2c2f9367345333f32c9df493892233d26788 Mon Sep 17 00:00:00 2001 From: Joe Peeples Date: Tue, 25 Mar 2025 14:54:17 -0400 Subject: [PATCH 3/3] Rename setup page, update name --- config/_default/menus/main.en.yaml | 2 +- content/en/continuous_integration/pipelines/_index.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml index e05f807636212..9023bfeca86be 100644 --- a/config/_default/menus/main.en.yaml +++ b/config/_default/menus/main.en.yaml @@ -4367,7 +4367,7 @@ menu: identifier: ci parent: software_delivery_heading weight: 10000 - - name: Pipeline Visibility + - name: Setup url: continuous_integration/pipelines/ parent: ci identifier: pipeline_visibility diff --git a/content/en/continuous_integration/pipelines/_index.md b/content/en/continuous_integration/pipelines/_index.md index 10799bf7d38b3..8e4e8a16dee73 100644 --- a/content/en/continuous_integration/pipelines/_index.md +++ b/content/en/continuous_integration/pipelines/_index.md @@ -1,5 +1,5 @@ --- -title: CI Pipeline Visibility in Datadog +title: Set up CI Visibility aliases: - /continuous_integration/pipelines_setup/ - /continuous_integration/explore_pipelines/ @@ -26,11 +26,11 @@ cascade: ## Overview -[Pipeline Visibility][1] provides a pipeline-first view into your CI health by displaying important metrics and results from your pipelines. It helps you troubleshoot pipeline failures, address performance bottlenecks, and track CI performance and reliability over time. +[CI Visibility][1] provides a pipeline-first view into your CI health by displaying important metrics and results from your pipelines. It helps you troubleshoot pipeline failures, address performance bottlenecks, and track CI performance and reliability over time. 
## Setup -{{< whatsnext desc="Select your CI provider to set up Pipeline Visibility in Datadog:" >}} +{{< whatsnext desc="Select your CI provider to set up CI Visibility in Datadog:" >}} {{< nextlink href="continuous_integration/pipelines/awscodepipeline" >}}AWS CodePipeline{{< /nextlink >}} {{< nextlink href="continuous_integration/pipelines/azure" >}}Azure{{< /nextlink >}} {{< nextlink href="continuous_integration/pipelines/buildkite" >}}Buildkite{{< /nextlink >}} @@ -47,7 +47,7 @@ cascade: ### Terminology -While the concept of a CI pipeline may vary depending on your provider, see how those concepts correspond to the definition of a CI pipeline in Datadog Pipeline Visibility: +While the concept of a CI pipeline may vary depending on your provider, see how those concepts correspond to the definition of a CI pipeline in Datadog CI Visibility: {{< tabs >}} {{% tab "GitHub Actions" %}} @@ -136,7 +136,7 @@ While the concept of a CI pipeline may vary depending on your provider, see how {{% /tab %}} {{< /tabs >}} -If your CI provider is not supported, you can try setting up Pipeline Visibility through the [public API endpoint][2]. +If your CI provider is not supported, you can try setting up CI Visibility through the [public API endpoint][2]. ### Supported features