updating standard analyzer docs #9747

Open · wants to merge 5 commits into main

Conversation

AntonEliatra (Contributor)

Description

updating standard analyzer docs

Version

all

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

@kolchfa-aws (Collaborator):

@udabhas Could you please review this PR? Thanks!

@sandeshkr419 self-assigned this on Jun 25, 2025

| Parameter | Type | Default | Description |
sandeshkr419 (Member):

Most of the documentation pages use "Data type" instead of "Type". For example: https://github.com/opensearch-project/documentation-website/pull/9479/files

Let's stick to a single nomenclature across the documentation: either "type" or "data type" is fine, as long as it is consistent.

AntonEliatra (Contributor, Author):

@sandeshkr419 That's updated now across the repo; all instances of "Data Type" have been changed to "Data type".

@AntonEliatra (Contributor, Author):

@sandeshkr419 I think this addressed all the points. Could you double-check, please?

@kolchfa-aws added the "4 - Doc review (PR: Doc review in progress)" label and removed the "3 - Tech review (PR: Tech review in progress)" label on Jun 30, 2025
kolchfa-aws (Collaborator) left a comment:

Thank you, @AntonEliatra! Please see my comments and let me know if you have any questions.

- `standard` tokenizer: Removes most punctuation and splits text on spaces and other common delimiters.
- `lowercase` token filter: Converts all tokens to lowercase, ensuring case-insensitive matching.
- `stop` token filter: Removes common stopwords, such as "the", "is", and "and", from the tokenized output.
- **Tokenization**: It uses the [`standard`]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/standard/) tokenizer, which splits text into words based on Unicode text segmentation rules, handling spaces, punctuation, and common delimiters.

Suggested change
- **Tokenization**: It uses the [`standard`]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/standard/) tokenizer, which splits text into words based on Unicode text segmentation rules, handling spaces, punctuation, and common delimiters.
- **Tokenization**: Uses the [`standard`]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/standard/) tokenizer, which splits text into words based on Unicode text segmentation rules, handling spaces, punctuation, and common delimiters.

- `lowercase` token filter: Converts all tokens to lowercase, ensuring case-insensitive matching.
- `stop` token filter: Removes common stopwords, such as "the", "is", and "and", from the tokenized output.
- **Tokenization**: It uses the [`standard`]({{site.url}}{{site.baseurl}}/analyzers/tokenizers/standard/) tokenizer, which splits text into words based on Unicode text segmentation rules, handling spaces, punctuation, and common delimiters.
- **Lowercasing**: It applies the [`lowercase`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/lowercase/) token filter to convert all tokens to lowercase, ensuring consistent matching regardless of input case.

Suggested change
- **Lowercasing**: It applies the [`lowercase`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/lowercase/) token filter to convert all tokens to lowercase, ensuring consistent matching regardless of input case.
- **Lowercasing**: Applies the [`lowercase`]({{site.url}}{{site.baseurl}}/analyzers/token-filters/lowercase/) token filter to convert all tokens to lowercase, ensuring consistent matching regardless of input case.
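For reference, the behavior described in the bullets quoted in this thread can be checked with the `_analyze` API and the built-in `standard` analyzer. The sample text below is illustrative, not from the diff:

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "The QUICK Brown Foxes jumped over the lazy dog!"
}
```

With default settings, the returned tokens are split on spaces and punctuation and lowercased; whether stopwords such as "the" are removed depends on the `stopwords` setting.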


Use the following command to create an index named `my_standard_index` with a `standard` analyzer:
---

Suggested change: remove the `---` separator.

}
}
}
}
```
{% include copy-curl.html %}

---

Suggested change: remove the `---` separator.
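For context, the closing braces quoted above are the tail of the index-creation request for `my_standard_index`. A minimal sketch of such a request, assuming a single text field named `my_field` (the field name is not from this diff), is:

```json
PUT /my_standard_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
```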

## Parameters

You can configure a `standard` analyzer with the following parameters.
The `standard` analyzer supports the following optional parameters:

Suggested change
The `standard` analyzer supports the following optional parameters:
The `standard` analyzer supports the following optional parameters.


Use the following command to configure an index with a custom analyzer that is equivalent to the `standard` analyzer:
The following example creates index `products` and configures `max_token_length` and `stopwords`:

Suggested change
The following example creates index `products` and configures `max_token_length` and `stopwords`:
The following example creates a `products` index and configures the `max_token_length` and `stopwords` parameters:
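A sketch of what that request might look like: `max_token_length` and `stopwords` are the parameters named above, while the analyzer name `my_analyzer` and the specific values are assumptions.

```json
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "max_token_length": 10,
          "stopwords": "_english_"
        }
      }
    }
  }
}
```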

## Generated tokens

Use the following request to examine the tokens generated using the analyzer:
Use the following `_analyze` API to see how the `my_manual_stopwords_analyzer` processes text:

Suggested change
Use the following `_analyze` API to see how the `my_manual_stopwords_analyzer` processes text:
Use the following `_analyze` API request to see how the `my_manual_stopwords_analyzer` processes text:
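Such a request might look like the following sketch. The analyzer name comes from the quoted text; the index name `my_index` and the sample sentence are illustrative:

```json
POST /my_index/_analyze
{
  "analyzer": "my_manual_stopwords_analyzer",
  "text": "The quick dog jumps over the lazy dog"
}
```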

The response contains the generated tokens:
The returned tokens are:

- separated based on spacing

Suggested change
- separated based on spacing
- Split on spaces

The returned tokens are:

- separated based on spacing
- lowercased

Suggested change
- lowercased
- Lowercased


- separated based on spacing
- lowercased
- stopwords removed

Suggested change
- stopwords removed
- Stopwords removed
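For reference, an `_analyze` response has the following general shape. The token values, offsets, and positions below are illustrative, assuming the sample sentence from the earlier sketch with "the" configured as a stopword:

```json
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 4,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
```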
