
Alerting plugin - experimental cross cluster monitor support documentation #6350


17 changes: 10 additions & 7 deletions _observing-your-data/alerting/per-cluster-metrics-monitors.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ has_children: false

# Per cluster metrics monitors

Per cluster metrics monitors are a type of alert monitor that collects and analyzes metrics from a single cluster, providing insights into the cluster's performance and health. You can set alerts to monitor certain conditions, such as when:
_Per cluster metrics monitors_ are a type of alert monitor that collects and analyzes metrics from a single cluster, providing insights into the cluster's performance and health. You can set alerts to monitor certain conditions, such as when:

- Cluster health reaches yellow or red status.
- Cluster-level metrics---for example, CPU usage and JVM memory usage---reach specified thresholds.
Expand Down Expand Up @@ -51,7 +51,7 @@ Trigger conditions use responses from the following API endpoints. Most APIs tha

If you want to hide fields from the API response and not expose them for alerting, reconfigure the [supported_json_payloads.json](https://github.com/opensearch-project/alerting/blob/main/alerting/src/main/resources/org/opensearch/alerting/settings/supported_json_payloads.json) file inside the Alerting plugin. The file functions as an allow list for the API fields you want to use in an alert. By default, all APIs and their parameters can be used for monitors and trigger conditions.

However, you can modify the file so that cluster metric monitors can only be created for APIs referenced. Furthermore, only fields referenced in the supported files can create trigger conditions. This `supported_json_payloads.json` allows for a cluster metrics monitor to be created for the `_cluster/stats` API, and triggers conditions for the `indices.shards.total` and `indices.shards.index.shards.min` fields.
However, you can modify the file so that cluster metrics monitors can only be created for APIs referenced. Furthermore, only fields referenced in the supported files can create trigger conditions. This `supported_json_payloads.json` allows for a cluster metrics monitor to be created for the `_cluster/stats` API, and triggers conditions for the `indices.shards.total` and `indices.shards.index.shards.min` fields.

```json
"/_cluster/stats": {
Expand All @@ -68,7 +68,9 @@ Painless scripts define triggers for cluster metrics monitors, similar to per qu

The cluster metrics monitor supports up to **ten** triggers.

In the following example, a JSON object creates a trigger that sends an alert when the cluster health is yellow. `script` points the `source` to the Painless script `ctx.results[0].status == \"yellow\`.
In the following example, the monitor is configured to call the Cluster Health API for two clusters, `cluster-1` and `cluster-2`. The trigger condition creates an alert when the `status` of either cluster is not `green`.

The `script` parameter points the `source` to the Painless script `for (cluster in ctx.results[0].keySet()) if (ctx.results[0][cluster].status != \"green\") return true`. See [Trigger variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/#trigger-variables) for more `painless ctx` variable options.

```json
{
Expand All @@ -88,7 +90,8 @@ In the following example, a JSON object creates a trigger that sends an alert wh
"api_type": "CLUSTER_HEALTH",
"path": "_cluster/health/",
"path_params": "",
"url": "http://localhost:9200/_cluster/health/"
"url": "http://localhost:9200/_cluster/health/",
"cluster": ["cluster-1", "cluster-2"]
}
}
],
Expand All @@ -100,7 +103,7 @@ In the following example, a JSON object creates a trigger that sends an alert wh
"severity": "1",
"condition": {
"script": {
"source": "ctx.results[0].status == \"yellow\"",
"source": "for (cluster in ctx.results[0].keySet()) if (ctx.results[0][cluster].status != \"green\") return true",
"lang": "painless"
}
},
Expand All @@ -110,14 +113,14 @@ In the following example, a JSON object creates a trigger that sends an alert wh
]
}
```
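
The trigger condition in the preceding example can be read as a loop over per-cluster responses. The following Python sketch mirrors that logic for illustration only. It assumes that `ctx.results[0]` is a map keyed by cluster name, as the Painless script implies, and the sample data and the function name `any_cluster_not_green` are hypothetical.

```python
def any_cluster_not_green(results):
    """Return True if any cluster in results[0] reports a status other than green.

    Assumed shape (inferred from the Painless script, not confirmed elsewhere):
    results[0] == {cluster_name: cluster_health_response, ...}
    """
    cluster_responses = results[0]
    for cluster in cluster_responses.keys():
        if cluster_responses[cluster]["status"] != "green":
            return True
    # Explicit fallback for the case in which every cluster is green.
    return False


# Hypothetical sample responses for the two clusters named in the example.
sample_results = [
    {
        "cluster-1": {"status": "green"},
        "cluster-2": {"status": "yellow"},
    }
]

print(any_cluster_not_green(sample_results))
```

The sketch returns `False` explicitly when every cluster is green. The Painless script shown in the example has no such final statement, so its behavior in that case depends on how the Alerting plugin evaluates the script.
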
In OpenSearch Dashboards, you can select the clusters to be monitored and the desired API. A view of the interface is shown in the following image.

See [Trigger variables]({{site.url}}{{site.baseurl}}/observing-your-data/alerting/triggers/#trigger-variables) for more `painless ctx` variable options.
<img src="{{site.url}}{{site.baseurl}}/images/alerting/cross-cluster-cluster-metrics-monitors.png" alt="Cluster metrics monitor" width="700"/>

### Limitations

Per cluster metrics monitors have the following limitations:

- You cannot create monitors for remote clusters.
- The OpenSearch cluster must be in a state where an index's conditions can be monitored and actions can be executed against the index.
- Removing resource permissions from a user will not prevent that user’s preexisting monitors for that resource from executing.
- Users with permissions to create monitors are not blocked from creating monitors for resources for which they do not have permissions; however, those monitors will not run.
4 changes: 4 additions & 0 deletions _observing-your-data/alerting/per-query-bucket-monitors.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,10 @@ Per query monitors are a type of alert monitor that can be used to identify and

Per bucket monitors are a type of alert monitor that can be used to identify and alert on specific buckets of data that are created by a query against an OpenSearch index.

Both monitor types support querying remote indexes, either by using the same `cluster-name:index-name` pattern used by [cross-cluster search](https://opensearch.org/docs/latest/security/access-control/cross-cluster-search/) or by using OpenSearch Dashboards 2.12 or later.
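
As a rough illustration of the `cluster-name:index-name` pattern, the following Python sketch builds a list of index names that mixes a local index with a remote one and places it in a search input. The helper names, the cluster and index names, and the `search`/`indices`/`query` input layout are assumptions made for this sketch and should be checked against the monitor API reference.

```python
def remote_index(cluster_name, index_name):
    """Format a remote index reference as cluster-name:index-name."""
    return f"{cluster_name}:{index_name}"


def build_search_input(indices, query):
    """Assemble a monitor search input (assumed layout, for illustration only)."""
    return {"search": {"indices": list(indices), "query": query}}


# Hypothetical names: one local index and one index on a remote cluster.
indices = ["local-logs", remote_index("cluster-2", "remote-logs")]
monitor_input = build_search_input(indices, {"query": {"match_all": {}}})

print(monitor_input["search"]["indices"])
```
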

<img src="{{site.url}}{{site.baseurl}}/images/alerting/cross-cluster-per-query-per-bucket-monitors.png" alt="Cluster metrics monitor" width="700"/>
Collaborator


The image should be introduced in the text that precedes it.


## Creating a per query or per bucket monitor

To create a per query monitor, follow these steps:
Expand Down
1 change: 1 addition & 0 deletions _observing-your-data/alerting/settings.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ Setting | Default | Description
`plugins.alerting.alert_history_retention_period` | 60d | The amount of time to keep history indexes before automatically deleting them.
`plugins.alerting.destination.allow_list` | ["chime", "slack", "custom_webhook", "email", "test_action"] | The list of allowed destinations. If you don't want users to have access to a certain type of destination, you can remove it from this list, but we recommend leaving this setting as-is.
`plugins.alerting.filter_by_backend_roles` | "false" | Restricts access to monitors by backend role. See [Alerting security]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/security/).
`plugins.alerting.remote_monitoring_enabled` | "false" | Toggles whether cluster metrics monitors support executing against remote clusters.
`plugins.scheduled_jobs.sweeper.period` | 5m | The alerting feature uses its "job sweeper" component to periodically check for new or updated jobs. This setting is the rate at which the sweeper checks to see if any jobs (monitors) have changed and need to be rescheduled.
`plugins.scheduled_jobs.sweeper.page_size` | 100 | The page size for the sweeper. You shouldn't need to change this value.
`plugins.scheduled_jobs.sweeper.backoff_millis` | 50ms | The amount of time the sweeper waits between retries---increases exponentially after each failed retry.
Expand Down