[pkg/stanza] Introduce batching logs in File consumer (#36663)
#### Description
Modifies the File consumer to emit logs in batches as opposed to sending
each log individually through the Stanza pipeline and on to the Log
Emitter.
Here are the changes introduced:
- 6b4c9fe: Changed the `Reader::ReadToEnd` method in the File consumer to collect the
tokens scanned from the file into batches. At this point, the Reader
still emits each token individually, because the `emit.Callback` function
only accepts a single token.
- c206995: Changed the `emit.Callback` function signature to accept a slice of tokens
instead of a single token, and changed the Reader to emit a batch of
tokens in one call. At this point, the batches are still split into
individual tokens inside the `emit` function, because the Stanza
operators can only process one entry at a time.
- aedda3a: Added a `ProcessBatch` method to Stanza operators and used it in the
`emit` function. At this point, the batch of tokens is translated into a
batch of entries and passed to the Log Emitter as a whole. The batch
is still split in the Log Emitter, which calls `consumeFunc` for each
entry in a loop.
- 13d6054: Changed the LogEmitter to add the whole batch to its buffer, as opposed
to adding entries one by one.
**Slice of entries `[]entry.Entry` vs. slice of pointers `[]*entry.Entry`**
I considered whether the `ProcessBatch` method in the `Operator`
interface should accept a slice of structs `[]entry.Entry` or a slice of
pointers `[]*entry.Entry`. I ran some tests (similar to
#35454)
and they showed a 7-10% performance loss when using a slice of structs
vs. a slice of pointers. That's why I decided to use the slice of
pointers `[]*entry.Entry`.
#### Link to tracking issue
- Fixes #35455
#### Testing
No test changes. The goal is for functionality to stay the same and
for performance not to regress.
I have added a new benchmark in a separate PR
#38054
that should be helpful in assessing the performance impact of this
change.
#### Documentation
These are internal changes, no user documentation needs changing.