Commit 90935ce

[pkg/stanza] Add container operator parser (open-telemetry#32594)
**Description:**
This PR implements the new container logs parser as it was proposed at open-telemetry#31959.

**Link to tracking Issue:** open-telemetry#31959

**Testing:**
Added unit tests. Providing manual testing steps as well:

### How to test this manually

1. Using the following config file:

```yaml
receivers:
  filelog:
    start_at: end
    include_file_name: false
    include_file_path: true
    include:
      - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
        output: m1
      - type: move
        id: m1
        from: attributes.k8s.pod.name
        to: attributes.val
      - id: some
        type: add
        field: attributes.key2.key_in
        value: val2

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
```

2. Start the collector: `./bin/otelcontribcol_linux_amd64 --config ~/otelcol/container_parser/config.yaml`
3. Use the following bash script to create some logs:

```bash
#! /bin/bash

echo '2024-04-13T07:59:37.505201169-05:00 stdout P This is a very very long crio line th' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '{"log":"INFO: log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}' >> /var/log/pods/kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log
echo '2024-04-13T07:59:37.505201169-05:00 stdout F at is awesome! crio is awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '2021-06-22T10:27:25.813799277Z stdout P some containerd log th' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log
echo '{"log":"INFO: another log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}' >> /var/log/pods/kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log
echo '2021-06-22T10:27:25.813799277Z stdout F at is super awesome! Containerd is awesome' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log
echo '2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '2021-06-22T10:27:25.813799277Z stdout F standalone containerd line that is super awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log
```

4. Run the above as a bash script to verify any parallel processing. Verify that the output is correct.

### Test manually on k8s

1. `make docker-otelcontribcol && docker tag otelcontribcol otelcontribcol-dev:0.0.1 && kind load docker-image otelcontribcol-dev:0.0.1`
2. Install using the following helm values file:

```yaml
mode: daemonset
presets:
  logsCollection:
    enabled: true

image:
  repository: otelcontribcol-dev
  tag: "0.0.1"
  pullPolicy: IfNotPresent

command:
  name: otelcontribcol

config:
  exporters:
    debug:
      verbosity: detailed
  receivers:
    filelog:
      start_at: end
      include_file_name: false
      include_file_path: true
      exclude:
        - /var/log/pods/default_daemonset-opentelemetry-collector*_*/opentelemetry-collector/*.log
      include:
        - /var/log/pods/*/*/*.log
      operators:
        - id: container-parser
          type: container
          output: some
        - id: some
          type: add
          field: attributes.key2.key_in
          value: val2
  service:
    pipelines:
      logs:
        receivers: [filelog]
        processors: [batch]
        exporters: [debug]
```

3. Check collector's output to verify the logs are parsed properly:

```console
2024-05-10T07:52:02.307Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-05-10T07:52:02.307Z info ResourceLog #0
Resource SchemaURL:
ScopeLogs #0
ScopeLogs SchemaURL:
InstrumentationScope
LogRecord #0
ObservedTimestamp: 2024-05-10 07:52:02.046236071 +0000 UTC
Timestamp: 2024-05-10 07:52:01.92533954 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(otel logs at 07:52:01)
Attributes:
     -> log: Map({"iostream":"stdout"})
     -> time: Str(2024-05-10T07:52:01.92533954Z)
     -> k8s: Map({"container":{"name":"busybox","restart_count":"0"},"namespace":{"name":"default"},"pod":{"name":"daemonset-logs-6f6mn","uid":"1069e46b-03b2-4532-a71f-aaec06c0197b"}})
     -> logtag: Str(F)
     -> key2: Map({"key_in":"val2"})
     -> log.file.path: Str(/var/log/pods/default_daemonset-logs-6f6mn_1069e46b-03b2-4532-a71f-aaec06c0197b/busybox/0.log)
Trace ID:
Span ID:
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-05-10 07:52:02.046411602 +0000 UTC
Timestamp: 2024-05-10 07:52:02.027386192 +0000 UTC
SeverityText:
SeverityNumber: Unspecified(0)
Body: Str(otel logs at 07:52:02)
Attributes:
     -> log.file.path: Str(/var/log/pods/default_daemonset-logs-6f6mn_1069e46b-03b2-4532-a71f-aaec06c0197b/busybox/0.log)
     -> time: Str(2024-05-10T07:52:02.027386192Z)
     -> log: Map({"iostream":"stdout"})
     -> logtag: Str(F)
     -> k8s: Map({"container":{"name":"busybox","restart_count":"0"},"namespace":{"name":"default"},"pod":{"name":"daemonset-logs-6f6mn","uid":"1069e46b-03b2-4532-a71f-aaec06c0197b"}})
     -> key2: Map({"key_in":"val2"})
Trace ID:
Span ID:
Flags: 0
...
```

**Documentation:** Added

Signed-off-by: ChrsMark <[email protected]>
1 parent c6a6bd4 commit 90935ce

11 files changed: +1306 -14 lines changed

.chloggen/add_container_parser.yaml

+27
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: filelogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add container operator parser

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [31959]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]

pkg/stanza/adapter/register.go

+1
@@ -6,6 +6,7 @@ package adapter // import "github.com/open-telemetry/opentelemetry-collector-con
 import (
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/output/file" // Register parsers and transformers for stanza-based log receivers
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/output/stdout"
+	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/container"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/csv"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/json"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/jsonarray"
+238
@@ -0,0 +1,238 @@
## `container` operator

The `container` operator parses logs in `docker`, `cri-o` and `containerd` formats.

### Configuration Fields

| Field                        | Default          | Description |
|------------------------------|------------------|-------------|
| `id`                         | `container`      | A unique identifier for the operator. |
| `format`                     | ``               | The container log format to use, if it is known. One of `docker`, `crio` or `containerd`. If not set, the format is detected automatically. |
| `add_metadata_from_filepath` | `true`           | Whether to add k8s metadata extracted from the file path. Requires the `log.file.path` field to be present. |
| `output`                     | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from`                 | `body`           | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to`                   | `attributes`     | The [field](../types/field.md) to which the value will be parsed. |
| `on_error`                   | `send`           | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if`                         |                  | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `severity`                   | `nil`            | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |
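
The automatic detection described for `format` can be pictured with a short Go sketch. The heuristics are assumptions made for illustration and are not taken from the operator's source: a body starting with `{` is treated as docker JSON, a timestamp ending in `Z` as containerd, and any other `<time> <stream> <tag> <log>` line as cri-o. `detectFormat` and the three patterns are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed heuristics for illustration; the operator's own patterns may differ.
var (
	dockerPattern     = regexp.MustCompile(`^\{`)
	containerdPattern = regexp.MustCompile(`^\S+Z (stdout|stderr) [PF] `)
	crioPattern       = regexp.MustCompile(`^\S+ (stdout|stderr) [PF] `)
)

// detectFormat guesses the container log format of one raw log line.
func detectFormat(line string) (string, error) {
	switch {
	case dockerPattern.MatchString(line):
		return "docker", nil
	case containerdPattern.MatchString(line):
		// Checked before crio because every line matching this pattern
		// also matches the looser crio pattern.
		return "containerd", nil
	case crioPattern.MatchString(line):
		return "crio", nil
	default:
		return "", fmt.Errorf("unrecognized container log format: %q", line)
	}
}

func main() {
	lines := []string{
		`{"log":"INFO: log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}`,
		`2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome`,
		`2023-06-22T10:27:25.813799277Z stdout F standalone containerd line that is super awesome`,
	}
	for _, line := range lines {
		format, err := detectFormat(line)
		fmt.Println(format, err)
	}
}
```

Under these assumed heuristics a cri-o line whose timestamp happens to end in `Z` would be classified as containerd, which is one reason to set `format` explicitly when the runtime is known.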


### Embedded Operations

The `container` parser can be configured to embed certain operations such as the severity parsing. For more information, see [complex parsers](../types/parsers.md#complex-parsers).

### Add metadata from file path

This feature requires `include_file_path: true` on the receiver so that the `log.file.path` field is available to the operator.
If that's not possible, users can disable the metadata addition with `add_metadata_from_filepath: false`.
A file path like `"/var/log/pods/some-ns_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"`
will produce the following k8s metadata:

```json
{
  "attributes": {
    "k8s": {
      "container": {
        "name": "kube-controller",
        "restart_count": "1"
      },
      "pod": {
        "uid": "49cc7c1fd3702c40b2686ea7486091d6",
        "name": "kube-controller-kind-control-plane"
      },
      "namespace": {
        "name": "some-ns"
      }
    }
  }
}
```
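
As a rough illustration of how such a path maps to metadata, here is a minimal Go sketch. The regular expression and the `metadataFromPath` helper are hypothetical; they assume the layout `/var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<restart count>.log` and are not claimed to be the pattern the operator uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathPattern is an assumed pattern for
// /var/log/pods/<namespace>_<pod name>_<pod uid>/<container name>/<restart count>.log
var pathPattern = regexp.MustCompile(
	`^.*/(?P<namespace>[^_/]+)_(?P<pod_name>[^_/]+)_(?P<uid>[0-9a-f-]+)/(?P<container_name>[^/]+)/(?P<restart_count>\d+)\.log$`)

// metadataFromPath returns the named path segments keyed by capture-group name.
func metadataFromPath(path string) (map[string]string, error) {
	matches := pathPattern.FindStringSubmatch(path)
	if matches == nil {
		return nil, fmt.Errorf("%q does not look like a pod log path", path)
	}
	meta := make(map[string]string)
	for i, name := range pathPattern.SubexpNames() {
		if name != "" { // index 0 (the whole match) has an empty name
			meta[name] = matches[i]
		}
	}
	return meta, nil
}

func main() {
	meta, err := metadataFromPath("/var/log/pods/some-ns_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log")
	if err != nil {
		panic(err)
	}
	fmt.Println(meta["namespace"], meta["pod_name"], meta["uid"], meta["container_name"], meta["restart_count"])
}
```

The restart count stays a string here, matching the `"restart_count": "1"` value shown in the metadata above.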

### Example Configurations

#### Parse the body as docker container log

Configuration:
```yaml
- type: container
  format: docker
  add_metadata_from_filepath: true
```

Note: in this example `format: docker` is optional, since the format can be detected automatically.
`add_metadata_from_filepath` is also `true` by default.

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
  "timestamp": "",
  "body": "{\"log\":\"INFO: log line here\",\"stream\":\"stdout\",\"time\":\"2029-03-30T08:31:20.545192187Z\"}",
  "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
  "timestamp": "2029-03-30 08:31:20.545192187 +0000 UTC",
  "body": "INFO: log line here",
  "attributes": {
    "time": "2029-03-30T08:31:20.545192187Z",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "some",
    "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

</td>
</tr>
</table>
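
For readers unfamiliar with the docker format, each line is a JSON object with `log`, `stream` and `time` keys. The following Go sketch shows one way to decode such a line with the standard library. The `dockerLine` type and `parseDocker` function are hypothetical names used only for illustration, not the operator's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// dockerLine mirrors the three keys of a docker JSON log line.
type dockerLine struct {
	Log    string    `json:"log"`
	Stream string    `json:"stream"`
	Time   time.Time `json:"time"` // decoded from an RFC 3339 string
}

// parseDocker decodes one docker JSON log line.
func parseDocker(body string) (dockerLine, error) {
	var line dockerLine
	if err := json.Unmarshal([]byte(body), &line); err != nil {
		return dockerLine{}, fmt.Errorf("decode docker log line: %w", err)
	}
	return line, nil
}

func main() {
	line, err := parseDocker(`{"log":"INFO: log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(line.Time.UTC()) // the entry timestamp
	fmt.Println(line.Stream)     // would become the iostream attribute
	fmt.Println(line.Log)        // would become the entry body
}
```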

#### Parse the body as cri-o container log

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
  "timestamp": "",
  "body": "2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome",
  "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
  "timestamp": "2024-04-13 12:59:37.505201169 +0000 UTC",
  "body": "standalone crio line which is awesome",
  "attributes": {
    "time": "2024-04-13T07:59:37.505201169-05:00",
    "logtag": "F",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "some",
    "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

</td>
</tr>
</table>
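
The conversion from the `-05:00` offset in the input to the UTC timestamp in the output can be reproduced with a small Go sketch. The pattern, the `criEntry` type and `parseCRI` are assumptions for illustration: the sketch treats a line as `<time> <stream> <logtag> <log>` and parses the time as RFC 3339 with nanoseconds.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// criPattern is an assumed pattern for "<time> <stream> <logtag> <log>" lines.
var criPattern = regexp.MustCompile(`^(\S+) (stdout|stderr) ([PF]) (.*)$`)

// criEntry holds the four parts of one parsed line.
type criEntry struct {
	Timestamp time.Time
	Stream    string
	LogTag    string
	Log       string
}

// parseCRI splits one line into its parts and parses the timestamp.
func parseCRI(line string) (criEntry, error) {
	m := criPattern.FindStringSubmatch(line)
	if m == nil {
		return criEntry{}, fmt.Errorf("line does not match the expected layout: %q", line)
	}
	ts, err := time.Parse(time.RFC3339Nano, m[1])
	if err != nil {
		return criEntry{}, fmt.Errorf("parse timestamp: %w", err)
	}
	return criEntry{Timestamp: ts, Stream: m[2], LogTag: m[3], Log: m[4]}, nil
}

func main() {
	entry, err := parseCRI("2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome")
	if err != nil {
		panic(err)
	}
	fmt.Println(entry.Timestamp.UTC())
	fmt.Println(entry.Stream, entry.LogTag, entry.Log)
}
```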

#### Parse the body as containerd container log

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
  "timestamp": "",
  "body": "2023-06-22T10:27:25.813799277Z stdout F standalone containerd line that is super awesome",
  "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
  "timestamp": "2023-06-22 10:27:25.813799277 +0000 UTC",
  "body": "standalone containerd line that is super awesome",
  "attributes": {
    "time": "2023-06-22T10:27:25.813799277Z",
    "logtag": "F",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "some",
    "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

</td>
</tr>
</table>

#### Parse a multiline containerd container log and recombine it into a single entry

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
  "timestamp": "",
  "body": "2023-06-22T10:27:25.813799277Z stdout P multiline containerd line that i",
  "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
},
{
  "timestamp": "",
  "body": "2023-06-22T10:27:25.813799277Z stdout F s super awesome",
  "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
  "timestamp": "2023-06-22 10:27:25.813799277 +0000 UTC",
  "body": "multiline containerd line that is super awesome",
  "attributes": {
    "time": "2023-06-22T10:27:25.813799277Z",
    "logtag": "F",
    "log.iostream": "stdout",
    "k8s.pod.name": "kube-controller-kind-control-plane",
    "k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
    "k8s.container.name": "kube-controller",
    "k8s.container.restart_count": "1",
    "k8s.namespace.name": "some",
    "log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
  }
}
```

</td>
</tr>
</table>
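
The recombination shown in the last example relies on the log tag: `P` marks a partial chunk and `F` marks the final chunk of a line. A minimal Go sketch of that idea follows. The `chunk` type and `recombine` function are hypothetical, and the sketch assumes all chunks come from one file and one stream; a real implementation would also have to keep separate buffers per source.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is one already-parsed log line: its tag ("P" or "F") and its text.
type chunk struct {
	Tag string
	Log string
}

// recombine joins each run of partial ("P") chunks with the final ("F")
// chunk that follows it. Chunks still waiting for their final part when the
// input ends are not emitted.
func recombine(chunks []chunk) []string {
	var lines []string
	var buf strings.Builder
	for _, c := range chunks {
		buf.WriteString(c.Log)
		if c.Tag == "F" {
			lines = append(lines, buf.String())
			buf.Reset()
		}
	}
	return lines
}

func main() {
	lines := recombine([]chunk{
		{Tag: "P", Log: "multiline containerd line that i"},
		{Tag: "F", Log: "s super awesome"},
		{Tag: "F", Log: "standalone line"},
	})
	for _, line := range lines {
		fmt.Println(line)
	}
}
```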

pkg/stanza/operator/helper/regexp.go

+28
@@ -0,0 +1,28 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package helper // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper"

import (
	"fmt"
	"regexp"
)

// MatchValues matches value against re and returns the named capture groups
// keyed by group name. It returns an error if the pattern does not match.
func MatchValues(value string, re *regexp.Regexp) (map[string]any, error) {
	matches := re.FindStringSubmatch(value)
	if matches == nil {
		return nil, fmt.Errorf("regex pattern does not match")
	}

	parsedValues := map[string]any{}
	for i, subexp := range re.SubexpNames() {
		if i == 0 {
			// Skip whole match
			continue
		}
		if subexp != "" {
			parsedValues[subexp] = matches[i]
		}
	}
	return parsedValues, nil
}
