Skip to content

Add include_individual_nodes, include_all_nodes, include_info params to stats API #1360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

q-andy
Copy link
Contributor

@q-andy q-andy commented May 29, 2025

Description

Adds 3 new parameter to stats API,

  • include_individual_nodes
  • include_all_nodes
  • include_info

All are default true. When set to false, the response excludes the corestpond category stats from the stats response

Some cases to note:

  • if include_individual_nodes=false and include_all_nodes=true, stats will still be fetched aggregated across nodes and the aggregate will be included in response. But individual nodes will not be excluded from the response.
  • If both include_individual_nodes=false and include_all_nodes=false, no stats will be fetched from any node
  • If include_info=false, then info stats will not be calculated.

This flag is backwards compatible since it is default true, which is the default previous expected behavior.

With include_individual_nodes flag false

GET /_plugins/_neural/stats?include_individual_nodes=false

{
	"_nodes": {
		"total": 1,
		"successful": 1,
		"failed": 0
	},
	"cluster_name": "integTest",
	"info": {
		"cluster_version": "3.1.0",
		"processors": {
			"ingest": {
				"text_chunking_delimiter_processors": 0,
				"text_chunking_fixed_length_processors": 0,
				"text_embedding_processors_in_pipelines": 0,
				"text_chunking_processors": 0
			}
		}
	},
	"all_nodes": {
		"semantic_highlighting": {
			"semantic_highlighting_request_count": 0
		},
		"processors": {
			"ingest": {
				"text_chunking_executions": 0,
				"text_embedding_executions": 0,
				"text_chunking_fixed_length_executions": 0,
				"text_chunking_delimiter_executions": 0
			}
		}
	}
}

Related Issues

Related to #1104

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

API spec issue: opensearch-project/opensearch-api-specification#890
Documentation issue: opensearch-project/documentation-website#9943

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@will-hwang
Copy link
Contributor

should we put a cap to how many node-level stats we want to support? what would be the performance impact if we had too many nodes in a cluster? Also, i think the default should be set to false

@q-andy
Copy link
Contributor Author

q-andy commented Jun 7, 2025

should we put a cap to how many node-level stats we want to support?

It's something we should be aware of, but it's difficult to say what the hard cap would be. Memory usage of the API call should increase linearly with the number of event stats we track so there isn't an easy hard threshold we can calculate.

what would be the performance impact if we had too many nodes in a cluster?

It's a bit ambiguous, but we're primarily concerned with a situation similar to _nodes/stats of _nodes/info. I chatted with infra team and apparently on large clusters it can use up a lot of memory creating the response payload. And their suggestion was to add filtering and options to reduce the size of the response. The nodes category is typically the largest part since it scales with the size of the cluster. I also opened #1363 for more granular filtering for specific stat paths but I can't get to it in time for 3.1.

Also, i think the default should be set to false

I think default has to stay as true to maintain backwards compatibility with previous versions

@q-andy
Copy link
Contributor Author

q-andy commented Jun 7, 2025

I'm planning on adding additional flags to filter the all_nodes and info_categories as well, will update PR once I finish.

@will-hwang
Copy link
Contributor

should we put a cap to how many node-level stats we want to support?

It's something we should be aware of, but it's difficult to say what the hard cap would be. Memory usage of the API call should increase linearly with the number of event stats we track so there isn't an easy hard threshold we can calculate.

what would be the performance impact if we had too many nodes in a cluster?

It's a bit ambiguous, but we're primarily concerned with a situation similar to _nodes/stats of _nodes/info. I chatted with infra team and apparently on large clusters it can use up a lot of memory creating the response payload. And their suggestion was to add filtering and options to reduce the size of the response. The nodes category is typically the largest part since it scales with the size of the cluster. I also opened #1363 for more granular filtering for specific stat paths but I can't get to it in time for 3.1.

Also, i think the default should be set to false

I think default has to stay as true to maintain backwards compatibility with previous versions

shouldn't it be that if include_individual_nodes, is set to true, then we include individual node-level stats? and if include_individual_nodes is set to false, we exclude it?

@q-andy
Copy link
Contributor Author

q-andy commented Jun 9, 2025

shouldn't it be that if include_individual_nodes, is set to true, then we include individual node-level stats? and if include_individual_nodes is set to false, we exclude it?

Yes, if it's true we include it, if it's false we exclude it. Current behavior before this flag is that it's always included, so the default is that it's true

@will-hwang
Copy link
Contributor

will-hwang commented Jun 9, 2025

shouldn't it be that if include_individual_nodes, is set to true, then we include individual node-level stats? and if include_individual_nodes is set to false, we exclude it?

Yes, if it's true we include it, if it's false we exclude it. Current behavior before this flag is that it's always included, so the default is that it's true

@q-andy
you might want to change the PR description to be:
When set to false, excludes individual node-level stats from the stats response

@q-andy q-andy force-pushed the stats-exclude-individual-nodes branch from 0426b26 to d1fce9c Compare June 10, 2025 06:02
@q-andy q-andy changed the title Add include_individual_nodes param to stats API Add include_individual_nodes, include_all_nodes, include_info param to stats API Jun 10, 2025
@q-andy q-andy changed the title Add include_individual_nodes, include_all_nodes, include_info param to stats API Add include_individual_nodes, include_all_nodes, include_info params to stats API Jun 10, 2025
Copy link
Member

@junqiu-lei junqiu-lei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could we also reolve the WARN logs in EventStatName.java? I see many these of logs in tests outputs, example: https://github.com/opensearch-project/neural-search/actions/runs/15551799557/job/43783675657?pr=1360

/home/runner/work/neural-search/neural-search/build/generated/sources/delombok/java/main/org/opensearch/neuralsearch/stats/events/EventStatName.java:21: warning: no comment
    TEXT_EMBEDDING_PROCESSOR_EXECUTIONS("text_embedding_executions", "processors.ingest", EventStatType.TIMESTAMPED_EVENT_COUNTER, Version.V_3_0_0), TEXT_EMBEDDING_PROCESSOR_SKIP_EXISTING_EXECUTIONS("text_embedding_skip_existing_executions", "processors.ingest", EventStatType.TIMESTAMPED_EVENT_COUNTER, Version.V_3_1_0), TEXT_CHUNKING_PROCESSOR_EXECUTIONS("text_chunking_executions", "processors.ingest", EventStatType.TIMESTAMPED_EVENT_COUNTER, Version.V_3_1_0), TEXT_CHUNKING_FIXED_TOKEN_LENGTH_EXECUTIONS("text_chunking_fixed_token_length_executions", "processors.ingest", EventStatType.TIMESTAMPED_EVENT_COUNTER, Version.V_3_1_0), TEXT_CHUNKING_DELIMITER_EXECUTIONS("text_chunking_delimiter_executions", "processors.ingest", EventStatType.TIMESTAMPED_EVENT_COUNTER, Version.V_3_1_0), TEXT_CHUNKING_FIXED_CHAR_LENGTH_EXECUTIONS("text_chunking_fixed_char_length_executions", "processors.ingest", 

@q-andy q-andy force-pushed the stats-exclude-individual-nodes branch from d1fce9c to 5a28bdc Compare June 10, 2025 18:58
@q-andy
Copy link
Contributor Author

q-andy commented Jun 10, 2025

LGTM, could we also reolve the WARN logs in EventStatName.java? I see many these of logs in tests outputs, example: https://github.com/opensearch-project/neural-search/actions/runs/15551799557/job/43783675657?pr=1360

Sure, added in latest commit

@heemin32 heemin32 merged commit 29a86b5 into opensearch-project:main Jun 10, 2025
48 of 52 checks passed
Copy link

codecov bot commented Jun 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (eb87d03) to head (2ae814a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #1360       +/-   ##
============================================
- Coverage     80.10%       0   -80.11%     
============================================
  Files           159       0      -159     
  Lines          8223       0     -8223     
  Branches       1321       0     -1321     
============================================
- Hits           6587       0     -6587     
+ Misses         1133       0     -1133     
+ Partials        503       0      -503     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants