
[exporter/elasticsearch] Dynamically route documents by default #38500


Merged
merged 42 commits into from
Mar 14, 2025


carsonip
Contributor

@carsonip carsonip commented Mar 10, 2025

Description

Breaking change.

Overhaul of document routing. The new document routing logic:

Documents are statically or dynamically routed to the target index / data stream in the following order. The first routing mode that applies will be used.
1. "Static mode": Route to `logs_index` for log records, `metrics_index` for data points and `traces_index` for spans, if the respective config is non-empty. [^3]
2. "Dynamic - Index attribute mode": Route to the index name specified in the `elasticsearch.index` attribute (precedence: log record / data point / span attribute > scope attribute > resource attribute), if the attribute exists. [^3]
3. "Dynamic - Data stream routing mode": Route to the data stream named `${data_stream.type}-${data_stream.dataset}-${data_stream.namespace}`,
where `data_stream.type` is statically `logs` for log records, `metrics` for data points and `traces` for spans. [^3]
As a special case, with `mapping::mode: bodymap` the `data_stream.type` field (valid values: `logs`, `metrics`) can be set dynamically from attributes.
The resulting documents contain the corresponding `data_stream.*` fields, subject to the restrictions on [Data Stream Fields](https://www.elastic.co/guide/en/ecs/current/ecs-data_stream.html). `data_stream.dataset` and `data_stream.namespace` are resolved as follows:
   1. Use `data_stream.dataset` or `data_stream.namespace` from attributes if present (precedence: log record / data point / span attribute > scope attribute > resource attribute).
   2. Otherwise, if scope name matches regex `/receiver/(\w*receiver)`, `data_stream.dataset` will be capture group #1
   3. Otherwise, `data_stream.dataset` falls back to `generic` and `data_stream.namespace` falls back to `default`. 
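
The routing order above can be sketched in Go, the exporter's language. This is a minimal illustration under stated assumptions, not the exporter's actual code: `attrs`, `lookup` and `routeIndex` are hypothetical names, each attribute map is flattened to `map[string]string`, and the `bodymap` special case, field sanitization and the OTel-mode `.otel` suffix are deliberately left out.

```go
package main

import (
	"fmt"
	"regexp"
)

// attrs is a hypothetical flattened view of one attribute map.
type attrs map[string]string

// lookup returns the first value found for key, checking record-level
// (log record / data point / span) attributes, then scope, then resource.
func lookup(key string, record, scope, resource attrs) (string, bool) {
	for _, m := range []attrs{record, scope, resource} {
		if v, ok := m[key]; ok {
			return v, true
		}
	}
	return "", false
}

// receiverRegex matches scope names such as ".../receiver/hostmetricsreceiver/...".
var receiverRegex = regexp.MustCompile(`/receiver/(\w*receiver)`)

// routeIndex returns the target index or data stream for one document.
// staticIndex stands for logs_index / metrics_index / traces_index and
// dsType for the static data_stream.type ("logs", "metrics" or "traces").
func routeIndex(staticIndex, dsType, scopeName string, record, scope, resource attrs) string {
	// 1. Static mode: a non-empty configured index wins.
	if staticIndex != "" {
		return staticIndex
	}
	// 2. Index attribute mode: use elasticsearch.index verbatim.
	if idx, ok := lookup("elasticsearch.index", record, scope, resource); ok {
		return idx
	}
	// 3. Data stream routing mode.
	dataset, ok := lookup("data_stream.dataset", record, scope, resource)
	if !ok {
		// Fall back to the receiver name captured from the scope name, else "generic".
		if m := receiverRegex.FindStringSubmatch(scopeName); m != nil {
			dataset = m[1]
		} else {
			dataset = "generic"
		}
	}
	namespace, ok := lookup("data_stream.namespace", record, scope, resource)
	if !ok {
		namespace = "default"
	}
	return fmt.Sprintf("%s-%s-%s", dsType, dataset, namespace)
}
```

For example, with no static index and no routing attributes, `routeIndex("", "logs", "", nil, nil, nil)` yields `logs-generic-default`, the documented default. Because a dataset found in attributes is checked before the scope-name fallback, receiver-based routing cannot overwrite an explicit `data_stream.dataset`.
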

In OTel mapping mode (`mapping::mode: otel`), the following special handling applies on top of the rules in [Elasticsearch document routing](#elasticsearch-document-routing).
The routing mode is determined in the same order:

1. "Static mode": Span events are separate documents routed to `logs_index` if non-empty.
2. "Dynamic - Index attribute mode": Span events are separate documents routed using attribute `elasticsearch.index` (precedence: span event attribute > scope attribute > resource attribute) if the attribute exists.
3. "Dynamic - Data stream routing mode":
   - For all documents, `.otel` is always appended to `data_stream.dataset`.
   - As a special case of rule (3)(1) in [Elasticsearch document routing](#elasticsearch-document-routing), span events are separate documents with `data_stream.type: logs`, routed using data stream attributes (precedence: span event attribute > scope attribute > resource attribute).
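
The span event rules above can be sketched as follows. This is a hedged sketch, not the exporter's implementation: `attrs`, `firstOf` and `routeSpanEvent` are hypothetical, `logsIndex` stands for the configured `logs_index`, and the scope-name fallback of rule (3)(2) is omitted for brevity. It follows the documented precedence, which starts at the span event attributes rather than the span attributes.

```go
package main

// attrs is a hypothetical flattened attribute map.
type attrs map[string]string

// firstOf returns the value of key from the first map that contains it.
func firstOf(key string, maps ...attrs) (string, bool) {
	for _, m := range maps {
		if v, ok := m[key]; ok {
			return v, true
		}
	}
	return "", false
}

// routeSpanEvent picks the target for a span event document in OTel mapping
// mode. Span events are treated as log records, so logs_index applies and
// data_stream.type is always "logs".
func routeSpanEvent(logsIndex string, event, scope, resource attrs) string {
	// 1. Static mode: span events follow logs_index.
	if logsIndex != "" {
		return logsIndex
	}
	// 2. Index attribute mode: span event > scope > resource.
	if idx, ok := firstOf("elasticsearch.index", event, scope, resource); ok {
		return idx
	}
	// 3. Data stream routing mode with data_stream.type fixed to "logs".
	dataset, ok := firstOf("data_stream.dataset", event, scope, resource)
	if !ok {
		dataset = "generic"
	}
	namespace, ok := firstOf("data_stream.namespace", event, scope, resource)
	if !ok {
		namespace = "default"
	}
	// OTel mapping mode always appends ".otel" to the dataset.
	return "logs-" + dataset + ".otel-" + namespace
}
```

With nothing configured, `routeSpanEvent("", nil, nil, nil)` returns `logs-generic.otel-default`.
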

Effective changes:

  • Deprecate the {logs,metrics,traces}_dynamic_index config and make it a no-op.
  • Return a config validation error when {logs,metrics,traces}_dynamic_index::enabled and {logs,metrics,traces}_index are set at the same time, since users who rely on dynamic indexing should not set {logs,metrics,traces}_index.
  • Remove elasticsearch.index.{prefix,suffix} handling and replace it with elasticsearch.index handling, which uses the attribute value as the index name directly. Users relying on the previously supported elasticsearch.index.prefix and elasticsearch.index.suffix should migrate to a transform processor that sets elasticsearch.index.
  • Fix a bug where receiver-based routing overwrites data_stream.dataset.
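
The validation and migration points above might look roughly like this. It is a sketch only: the `config` struct, its field names, the error text and `composedIndexName` are hypothetical. `composedIndexName` assumes the old dynamic index name was composed as `${elasticsearch.index.prefix}${logs_index}${elasticsearch.index.suffix}`, so after migrating, a transform processor would need to write that full name into `elasticsearch.index`.

```go
package main

import (
	"errors"
	"fmt"
)

// dynamicIndex mirrors the deprecated {logs,metrics,traces}_dynamic_index setting.
type dynamicIndex struct {
	Enabled bool
}

// config holds only the fields relevant to the new validation rule.
type config struct {
	LogsIndex, MetricsIndex, TracesIndex                      string
	LogsDynamicIndex, MetricsDynamicIndex, TracesDynamicIndex dynamicIndex
}

// validate rejects configs that enable a dynamic index while also setting
// the static index for the same signal.
func (c config) validate() error {
	var errs []error
	check := func(signal, index string, dyn dynamicIndex) {
		if dyn.Enabled && index != "" {
			errs = append(errs, fmt.Errorf(
				"%s_dynamic_index::enabled and %s_index must not both be set", signal, signal))
		}
	}
	check("logs", c.LogsIndex, c.LogsDynamicIndex)
	check("metrics", c.MetricsIndex, c.MetricsDynamicIndex)
	check("traces", c.TracesIndex, c.TracesDynamicIndex)
	return errors.Join(errs...) // nil when no check failed
}

// composedIndexName reproduces the assumed legacy prefix + index + suffix
// naming, i.e. the value to write into elasticsearch.index after migrating.
func composedIndexName(prefix, index, suffix string) string {
	return prefix + index + suffix
}
```

For instance, `config{LogsIndex: "my-logs", LogsDynamicIndex: dynamicIndex{Enabled: true}}.validate()` returns a non-nil error, while setting only `LogsIndex` passes. `errors.Join` requires Go 1.20 or later.
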

Should be released together with #38458

Link to tracking issue

Fixes #38361

Testing

Documentation

Telemetry data will be written to signal-specific data streams by default:
logs to `logs-generic-default`, metrics to `metrics-generic-default`, and traces to `traces-generic-default`.
Documents are routed to the target index / data stream dynamically in the following order. The first routing mode that applies will be used.
1. "Static mode": To `logs_index` (for log records), `metrics_index` (for data points) and `traces_index` (for spans and span events) if they are configured.
Contributor

Should we use logs_index for span events? Or maybe add another option for span events?

Contributor Author

It isn't intuitive to use logs_index as the destination for span events. What about a new config, traces_span_events_index, that only applies to otel mapping mode (since that mode sends span events as separate documents)? It should default to empty, so that dynamic routing kicks in.

I have also considered

  • using logs_index for span events. See reasoning below.
  • defaulting traces_span_events_index to logs_index, but that seems to be a big trap. Imagine a user who wants logs to be statically routed to a logs_index and traces to be dynamically routed. If the user sets logs_index, span events suddenly stop being dynamically routed. This is not great.

Contributor

As we model span events as logs in OTel mode, I think it's only logical to use logs_index. Ack that this may not be intuitive but documenting this seems like the solution. I think we should start simple by just using logs_index and only adding something like traces_span_events_index if necessary.

Contributor Author

sgtm. It is explicitly stated that it is not recommended to set logs_index, metrics_index and traces_index anyway.

Contributor Author

Done in 61a3bfa

@carsonip carsonip changed the title [WIP][exporter/elasticsearch] Always dynamically route documents [WIP][exporter/elasticsearch] Dynamically route documents by default Mar 11, 2025
Contributor

@axw axw left a comment

Thank you, this is a nice simplification.

Telemetry data will be written to signal-specific data streams by default:
logs to `logs-generic-default`, metrics to `metrics-generic-default`, and traces to `traces-generic-default`.
Documents are statically or dynamically routed to the target index / data stream in the following order. The first routing mode that applies will be used.
1. "Static mode": To `logs_index` for log records, `metrics_index` for data points and `traces_index` for spans, if these configs are not empty respectively. In OTel mapping mode (`mapping::mode: otel`), span events are separate documents routed to `logs_index` if non-empty.
Contributor

Perhaps instead of repeating the statement about otel mode span events, add a paragraph after the numbered list mentioning that in otel mode, span events are considered log records and routed as such?

Contributor Author

Done in 769edb2

songy23 pushed a commit that referenced this pull request Mar 12, 2025
…8458)

#### Description
Breaking change: change default `mapping::mode` config to `otel` for the
best user experience and the most intuitive document structure in
Elasticsearch. See README to learn more about otel mapping mode. To
retain the old behavior, explicitly set `mapping::mode` to `none`.

Should be released together with
#38500

#### Link to tracking issue
Fixes #37241

#### Documentation
Updated README
@andrzej-stencel
Member

@JaredTan95 any thoughts on it?

Copy link
Member

@JaredTan95 JaredTan95 left a comment


This is great, dynamic routing is more flexible and logically consistent

@edmocosta edmocosta added the ready to merge Code review completed; ready to merge by maintainers label Mar 14, 2025
@songy23 songy23 merged commit 662feae into open-telemetry:main Mar 14, 2025
180 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 14, 2025
Labels
exporter/elasticsearch ready to merge Code review completed; ready to merge by maintainers
Development

Successfully merging this pull request may close these issues.

[exporter/elasticsearch] Enable dynamic routing (previously {logs,metrics,traces}_dynamic_index) by default
8 participants