Valid processor config yields unexpected results #38926

Closed
JelleSmet-TomTom opened this issue Mar 24, 2025 · 5 comments · Fixed by #39290
Labels
bug (Something isn't working), priority:p1 (High), processor/transform (Transform processor)

Comments


JelleSmet-TomTom commented Mar 24, 2025

Component(s)

processor/transform

What happened?

Description

The transform processor documentation states in the Basic Config section that log_statements can be a list consisting of the individual statements as strings.

This format, at least in my particular test, yields unexpected (and therefore wrong) results. A log record is generated for every line of the input file, but each record ends up with the content of the last line of the input.log file.

When the same configuration is converted to the advanced configuration style, the results are correct.

It is unclear where the difference comes from.

Steps to Reproduce

Create a file input.log with the following content:

{"product": {"id": 1}, "proxy": {"id": 2}, "log": "first log"}
{"product": {"id": 10}, "proxy": {"id": 20}, "log": "second log"}
{"product": {"id": 10}, "proxy": {"id": 20}, "log": "third log"}

Apply the configuration shown in the "OpenTelemetry Collector configuration" section below.

Working processor config

Replacing the processors section of the broken config with the following yields the expected result:

processors:
  transform/log:
    error_mode: propagate
    log_statements:
      - statements:
        - merge_maps(log.cache, ParseJSON(log.body), "upsert") where IsMatch(log.body, "^\\{")
        - set(log.body, log.cache["log"])

Expected Result

output.log contains:

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1742847852280836180",
              "body": {
                "stringValue": "first log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742847852280843519",
              "body": {
                "stringValue": "second log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742847852280844952",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
    }
  ]
}

Actual Result

output.log contains:

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1742846994264031175",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742846994264052451",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742846994264054274",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
    }
  ]
}

Collector version

v0.120.0 and latest

Environment information

Environment

official docker images

$ docker images
REPOSITORY                             TAG       IMAGE ID       CREATED       SIZE
otel/opentelemetry-collector-contrib   latest    49ab54809761   5 days ago    329MB
otel/opentelemetry-collector-contrib   0.120.0   c67f607546f4   4 weeks ago   315MB

OpenTelemetry Collector configuration

(this configuration causes the faulty results)

receivers:
  filelog:
    type: file_input
    include:
      - /input.log
    start_at: beginning

processors:
  transform/log:
    error_mode: propagate
    log_statements:
      - merge_maps(log.cache, ParseJSON(log.body), "upsert") where IsMatch(log.body, "^\\{")
      - set(log.body, log.cache["log"])

exporters:
  file/local:
    path: ./output.log

service:
  pipelines:
    logs:
      receivers:
        - filelog
      processors:
        - transform/log
      exporters:
        - file/local

Log output

n/a

Additional context

No response

@JelleSmet-TomTom added the bug (Something isn't working) and needs triage (New item requiring triage) labels Mar 24, 2025
@github-actions bot added the processor/transform (Transform processor) label Mar 24, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@edmocosta

Hi @JelleSmet-TomTom, thank you for reporting. That's indeed an issue with the new basic configuration style and the cache feature. Unlike the advanced mode, the basic configuration style runs each statement against every telemetry item before moving on to the next statement, which results in the cache being overwritten when the data has multiple entries. As a workaround until this is fixed, please use the advanced mode instead.
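The cache-overwrite behavior described above can be sketched in a few lines of Python (a hypothetical illustration of the two execution orders, not the collector's actual code; all names here are made up):

```python
import copy
import json

# Records mimicking the three lines of input.log from the reproduction.
RECORDS = [
    {"body": '{"product": {"id": 1}, "proxy": {"id": 2}, "log": "first log"}'},
    {"body": '{"product": {"id": 10}, "proxy": {"id": 20}, "log": "second log"}'},
    {"body": '{"product": {"id": 10}, "proxy": {"id": 20}, "log": "third log"}'},
]

def merge_maps(record, cache):
    # mimics: merge_maps(log.cache, ParseJSON(log.body), "upsert")
    cache.update(json.loads(record["body"]))

def set_body(record, cache):
    # mimics: set(log.body, log.cache["log"])
    record["body"] = cache["log"]

def run_basic_pre_fix(records):
    # Basic style before the fix: statement-major order, ONE cache shared
    # across the whole batch, so each record's parse overwrites the last.
    records = copy.deepcopy(records)
    cache = {}
    for statement in (merge_maps, set_body):
        for record in records:
            statement(record, cache)
    return [r["body"] for r in records]

def run_advanced(records):
    # Advanced style: record-major order, fresh cache per record.
    records = copy.deepcopy(records)
    for record in records:
        cache = {}
        for statement in (merge_maps, set_body):
            statement(record, cache)
    return [r["body"] for r in records]
```

Running both reproduces the issue: `run_basic_pre_fix` leaves every body set to "third log" (the last record parsed wins), while `run_advanced` yields "first log", "second log", "third log" as expected.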

@edmocosta edmocosta removed the needs triage New item requiring triage label Mar 25, 2025
@odubajDT

Hey, I would like to look at this issue.

@edmocosta

Hi @odubajDT, I'm sorry, I forgot to assign this issue to myself. I've already been working on a solution, and the draft still needs to be discussed and validated with the maintainers. I hope you don't mind if I take this one. Thanks!

@edmocosta edmocosta self-assigned this Mar 26, 2025
@odubajDT

Sure, no problem, thanks for the info!

akshays-19 pushed a commit to akshays-19/opentelemetry-collector-contrib that referenced this issue Apr 23, 2025
…9290)

#### Description

This PR removes the shared-cache logic introduced to support Basic Config
statements, and instead groups all basic-config statements together into a
single cache-sharing `common.ContextStatements`, accepting the limitation
that the context cannot be inferred per statement (similar to the advanced
config).
To help with that limitation, the context-inferrer validation was improved:
it now returns specific errors that better describe why a valid context
could not be inferred.

This PR also introduces a **breaking change**: users must use either the
basic or the advanced configuration style. Mixed configurations like the
following are no longer valid:

```yaml
log_statements:
 - set(resource.attributes["foo"], "foo")
 - statements:
   - set(resource.attributes["bar"], "bar")
```
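A mixed configuration like the one above would need to be rewritten in a single style, for instance fully in the advanced style (a sketch based on the PR's example, not text from the PR itself):

```yaml
log_statements:
  - statements:
    - set(resource.attributes["foo"], "foo")
    - set(resource.attributes["bar"], "bar")
```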

---
Another PR will be opened to remove the `WithCache` option from all
OTTL contexts.


#### Link to tracking issue
Fixes
open-telemetry#38926

#### Testing
Unit tests

#### Documentation
Updated README 

Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025