[pkg/stanza][fileconsumer] - finalize archive's implementation #39901
@djaglowski @andrzej-stencel Looking forward to hearing your thoughts on this approach 😄
This PR was marked stale due to lack of activity. It will be closed in 14 days.
cc: @andrzej-stencel
Co-authored-by: Andrzej Stencel <[email protected]>
Some last comments on the docs and we should be good to go.
receiver/filelogreceiver/README.md
Outdated
@@ -65,6 +65,7 @@ Tails and parses logs from files.
| `ordering_criteria.sort_by.format` | | Relevant if `sort_type` is set to `timestamp`. Defines the strptime format of the timestamp being sorted. |
| `ordering_criteria.sort_by.ascending` | | Sort direction |
| `compression` | | Indicate the compression format of input files. If set accordingly, files will be read using a reader that uncompresses the file before scanning its content. Options are ``, `gzip`, or `auto`. `auto` auto-detects file compression type. Currently, gzip files are the only compressed files auto-detected, based on ".gz" filename extension. `auto` option is useful when ingesting a mix of compressed and uncompressed files with the same filelogreceiver. |
| `polls_to_archive` | | This settings control the number of poll cycles to store on disk, rather than being discarded. By default, the receiver will purge the record of readers for existed for 3 generations. Refer [archiving](#archiving) and [polling](../../pkg/stanza/fileconsumer/design.md#polling) for more details. |
Should mention what the default value is.
@@ -221,6 +222,10 @@ Here is some of the information the file log receiver stores:

Exactly how this information is serialized depends on the type of storage being used.

### Archiving

If `polls_to_archive` setting is used in conjunction with `storage` setting, file offsets older than three poll cycles are stored on disk rather than being discarded. This feature enables the receiver to remember file for a longer period and also aims to use limited amount of memory.
What happens when `polls_to_archive` is used but `storage` is not set? Should describe this in the docs too.
In my view, this documentation does not explain to the user why they should use it. Can you describe a specific scenario when using this option changes the behavior of the receiver to the benefit of the user? This section should describe the scenario and what the behavior is without and with this option set.
@andrzej-stencel I've added a scenario. Can you take a look?
receiver/filelogreceiver/README.md
Outdated
@@ -65,6 +65,7 @@ Tails and parses logs from files.
| `ordering_criteria.sort_by.format` | | Relevant if `sort_type` is set to `timestamp`. Defines the strptime format of the timestamp being sorted. |
| `ordering_criteria.sort_by.ascending` | | Sort direction |
| `compression` | | Indicate the compression format of input files. If set accordingly, files will be read using a reader that uncompresses the file before scanning its content. Options are ``, `gzip`, or `auto`. `auto` auto-detects file compression type. Currently, gzip files are the only compressed files auto-detected, based on ".gz" filename extension. `auto` option is useful when ingesting a mix of compressed and uncompressed files with the same filelogreceiver. |
| `polls_to_archive` | | This settings control the number of poll cycles to store on disk, rather than being discarded. By default, the receiver will purge the record of readers for existed for 3 generations. Refer [archiving](#archiving) and [polling](../../pkg/stanza/fileconsumer/design.md#polling) for more details. |
Can we mark this option as experimental, so that we can possibly change it without breaking the beta stability guarantees of the component?
I feel this implementation may not be the ultimate solution and would like to reserve the right to change it in the future.
Yeah, we can mark this experimental.
Just to confirm, you are suggesting we put this feature behind a feature gate, right?
Hm no, I wasn't thinking about a feature gate. This feature can be available for use as is, let's just mention in the docs that it's experimental.
Okay got it. Thanks!!
This PR finalizes the archive implementation and adds a test case.
It updates the following things:
- [...] `exclude_older_than` is enabled [...] #32727 (comment) for detailed discussion on this topic.
- [...] `handleUnmatchedFiles` helper to create readers once archive matching is done.

Relates: #38056
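
For anyone skimming this thread, the archiving idea from the README diff (offsets that age out of memory are kept on disk for `polls_to_archive` poll cycles instead of being discarded) can be pictured roughly like the sketch below. This is only an illustration under stated assumptions, not the code in this PR: the `archive` type and its `write` and `find` methods are hypothetical names, string fingerprints are a simplification, and an in-memory slice stands in for the storage extension.

```go
package main

import "fmt"

// archive is a hypothetical stand-in for the on-disk archive.
// It keeps the offsets aged out of memory during the last
// len(slots) poll cycles in a ring buffer.
type archive struct {
	slots []map[string]int64 // one slot per archived poll cycle: fingerprint -> offset
	next  int                // index of the slot the next write overwrites
}

// newArchive creates an archive that remembers pollsToArchive cycles.
// A value of 0 disables archiving in this sketch.
func newArchive(pollsToArchive int) *archive {
	if pollsToArchive < 0 {
		pollsToArchive = 0
	}
	return &archive{slots: make([]map[string]int64, pollsToArchive)}
}

// write stores the offsets aged out in the current poll cycle,
// overwriting the oldest archived cycle.
func (a *archive) write(aged map[string]int64) {
	if len(a.slots) == 0 {
		return // archiving disabled: offsets are simply discarded
	}
	a.slots[a.next] = aged
	a.next = (a.next + 1) % len(a.slots)
}

// find searches from the newest archived cycle to the oldest.
// A matched entry is removed so it is not restored twice.
func (a *archive) find(fingerprint string) (int64, bool) {
	n := len(a.slots)
	for i := 1; i <= n; i++ {
		slot := a.slots[(a.next-i+n)%n]
		if off, ok := slot[fingerprint]; ok {
			delete(slot, fingerprint)
			return off, true
		}
	}
	return 0, false
}

func main() {
	a := newArchive(2)
	a.write(map[string]int64{"fp-a": 100}) // offsets aged out in cycle 1
	a.write(map[string]int64{"fp-b": 200}) // offsets aged out in cycle 2

	// A file that reappears is matched against the archive and
	// resumes from its stored offset instead of being re-read.
	off, ok := a.find("fp-a")
	fmt.Println(off, ok)
}
```

With archiving disabled (`newArchive(0)` in this sketch), `find` never matches, which corresponds to the behavior without the option: a file that reappears after its offset was purged is treated as new.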