Valid processor config yields unexpected results #38926

Closed
JelleSmet-TomTom opened this issue Mar 24, 2025 · 5 comments · Fixed by #39290
Labels
bug (Something isn't working), priority:p1 (High), processor/transform (Transform processor)

Comments


JelleSmet-TomTom commented Mar 24, 2025

Component(s)

processor/transform

What happened?

Description

The transform processor documentation states in the Basic Config section that log_statements can be a list consisting of the individual statements as strings.

This format, at least in my particular test, yields unexpected (and therefore wrong) results. A log record is generated for every line of the input file, but each record ends up with the content of the last line of the input.log file.

When the same configuration is converted to the advanced configuration style, the results are correct.

It is unclear where the difference comes from.

Steps to Reproduce

Create a file input.log with the following content:

{"product": {"id": 1}, "proxy": {"id": 2}, "log": "first log"}
{"product": {"id": 10}, "proxy": {"id": 20}, "log": "second log"}
{"product": {"id": 10}, "proxy": {"id": 20}, "log": "third log"}

Apply the configuration shown in the "OpenTelemetry Collector configuration" section below.

Working processor config

Replacing the processors section of the broken config with the following yields the expected result:

processors:
  transform/log:
    error_mode: propagate
    log_statements:
      - statements:
        - merge_maps(log.cache, ParseJSON(log.body), "upsert") where IsMatch(log.body, "^\\{")
        - set(log.body, log.cache["log"])

Expected Result

output.log contains:

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1742847852280836180",
              "body": {
                "stringValue": "first log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742847852280843519",
              "body": {
                "stringValue": "second log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742847852280844952",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
    }
  ]
}

Actual Result

output.log contains:

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1742846994264031175",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742846994264052451",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            },
            {
              "observedTimeUnixNano": "1742846994264054274",
              "body": {
                "stringValue": "third log"
              },
              "attributes": [
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "input.log"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
    }
  ]
}

Collector version

v0.120.0 and latest

Environment information

Environment

official docker images

$ docker images
REPOSITORY                             TAG       IMAGE ID       CREATED       SIZE
otel/opentelemetry-collector-contrib   latest    49ab54809761   5 days ago    329MB
otel/opentelemetry-collector-contrib   0.120.0   c67f607546f4   4 weeks ago   315MB

OpenTelemetry Collector configuration

(this configuration causes the faulty results)

receivers:
  filelog:
    type: file_input
    include:
      - /input.log
    start_at: beginning

processors:
  transform/log:
    error_mode: propagate
    log_statements:
      - merge_maps(log.cache, ParseJSON(log.body), "upsert") where IsMatch(log.body, "^\\{")
      - set(log.body, log.cache["log"])

exporters:
  file/local:
    path: ./output.log

service:
  pipelines:
    logs:
      receivers:
        - filelog
      processors:
        - transform/log
      exporters:
        - file/local

Log output

n/a

Additional context

No response

@JelleSmet-TomTom added the bug (Something isn't working) and needs triage (New item requiring triage) labels Mar 24, 2025
@github-actions bot added the processor/transform (Transform processor) label Mar 24, 2025

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@edmocosta

Hi @JelleSmet-TomTom, thank you for reporting. That's indeed an issue with the new basic configuration style and the cache feature. Unlike the advanced mode, the basic configuration style runs each statement against every telemetry item before moving on to the next statement, which results in the cache being overwritten when the data has multiple entries. As a workaround until this is fixed, please use the advanced mode instead.
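The cache-overwrite behavior described above can be sketched in a few lines of Python (a hypothetical illustration of the two execution orders, not the collector's actual code; all names here are made up):

```python
import copy
import json

# Records mimicking the three lines of input.log from the reproduction.
RECORDS = [
    {"body": '{"product": {"id": 1}, "proxy": {"id": 2}, "log": "first log"}'},
    {"body": '{"product": {"id": 10}, "proxy": {"id": 20}, "log": "second log"}'},
    {"body": '{"product": {"id": 10}, "proxy": {"id": 20}, "log": "third log"}'},
]

def merge_maps(record, cache):
    # mimics: merge_maps(log.cache, ParseJSON(log.body), "upsert")
    cache.update(json.loads(record["body"]))

def set_body(record, cache):
    # mimics: set(log.body, log.cache["log"])
    record["body"] = cache["log"]

def run_basic_pre_fix(records):
    # Basic style before the fix: statement-major order, ONE cache shared
    # across the whole batch, so each record's parse overwrites the last.
    records = copy.deepcopy(records)
    cache = {}
    for statement in (merge_maps, set_body):
        for record in records:
            statement(record, cache)
    return [r["body"] for r in records]

def run_advanced(records):
    # Advanced style: record-major order, fresh cache per record.
    records = copy.deepcopy(records)
    for record in records:
        cache = {}
        for statement in (merge_maps, set_body):
            statement(record, cache)
    return [r["body"] for r in records]
```

Running both reproduces the issue: `run_basic_pre_fix` leaves every body set to "third log" (the last record parsed wins), while `run_advanced` yields "first log", "second log", "third log" as expected.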

@edmocosta edmocosta removed the needs triage New item requiring triage label Mar 25, 2025
@odubajDT

Hey, I would like to look at this issue.

@edmocosta

Hi @odubajDT, I'm sorry, I forgot to assign this issue to myself. I've already been working on a solution, and the draft still needs to be discussed and validated with the maintainers. I hope you don't mind if I take this one. Thanks!

@edmocosta edmocosta self-assigned this Mar 26, 2025
@odubajDT

Sure, no problem, thanks for the info!

akshays-19 pushed a commit to akshays-19/opentelemetry-collector-contrib that referenced this issue Apr 23, 2025
…9290)

#### Description

This PR removes the shared-cache logic introduced to support Basic Config
statements, and instead groups all basic-config statements together into a
single cache-sharing `common.ContextStatements`, accepting the limitation
that the context cannot be inferred per statement (similar to the advanced
config).
To help with that limitation, the context-inferrer validation was improved:
it now returns specific errors that better describe why a valid context
could not be inferred.

This PR also introduces a **breaking change**: users must use either the
basic or the advanced configuration style. Mixed configurations like the
following are no longer valid:

```yaml
log_statements:
 - set(resource.attributes["foo"], "foo")
 - statements:
   - set(resource.attributes["bar"], "bar")
```
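A mixed configuration like the one above would need to be rewritten in a single style, for instance fully in the advanced style (a sketch based on the PR's example, not text from the PR itself):

```yaml
log_statements:
  - statements:
    - set(resource.attributes["foo"], "foo")
    - set(resource.attributes["bar"], "bar")
```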

---
Another PR will be opened to remove the `WithCache` option from all
OTTL contexts.


#### Link to tracking issue
Fixes
open-telemetry#38926

#### Testing
Unit tests

#### Documentation
Updated README 

Fiery-Fenix pushed a commit to Fiery-Fenix/opentelemetry-collector-contrib that referenced this issue Apr 24, 2025