-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[pkg/stanza] make log emitter and entry conversion in adapter synchronous #35669
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] make log emitter and entry conversion in adapter synchronous #35669
Conversation
Signed-off-by: Florian Bacher <[email protected]>
… log emitter Signed-off-by: Florian Bacher <[email protected]>
… log emitter Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
I am not very certain that the testbed benchmarks are a fair comparison for these changes, since they rely on so many external factors besides the adapter code itself. I left some suggestions about how we can update the adapter benchmarks to give us a better comparison.
Thanks for the thorough review @djaglowski and sorry for the late reply - I will go through all comments today and address them |
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR introduces a benchmark test for the receiver adapter which tests the performance of handling the log entries sent to the `LogEmitter`, which are then received and converted by the receiver adapter. The changes have been made based on the comment linked here: #35669 (comment) <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Part of #35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Added additional benchmark tests --------- Signed-off-by: Florian Bacher <[email protected]>
@andrzej-stencel One question regarding the removal of the converter: I've found that this is currently used in two places other than
In both cases it seems like the converter is used to generate test data, and can likely be replaced easily with the |
Signed-off-by: Florian Bacher <[email protected]>
I agree it should be possible to refactor those tests to remove the converter. Seems like a good followup PR. Regarding the benchmarks, do you mind running them multiple times and using
Based on the one run it looks promising but given the impact here I think we should have a more statistically significant comparison. If these are running quickly, it'd be interesting to see another order of magnitude too (not required though). |
Sure, will post the updated results soon |
Alright, here is the result of the
|
@andrzej-stencel @djaglowski I think this is ready for review now, since the removal of the converter type will be done as a follow up |
@bacherfl, this appears to show a ~60% increase in CPU consumption, but the earlier (single run) result showed just about the opposite. Just double checking, is it possible before/after were switched? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I re-ran the manual test "Run the collector binary reading a 1 GiB file" described in detail in #35454 (comment), comparing v0.112.0 to this PR. Surprisingly, the results say pretty consistently that this change decreases the time for File Log receiver to read a 1 GB file from ~17 seconds to ~14-15 seconds. This is about 15% improvement in performance!
@djaglowski this does seem a bit odd, I will double check the results again I just re-ran the tests and got the same results - However I noticed that I posted the |
Co-authored-by: Andrzej Stencel <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
Signed-off-by: Florian Bacher <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this on @bacherfl.
The performance hit is unfortunate but it appears necessary to increase reliability (and maintainability).
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR removes the `Converter` type that was previously used mainly by the stanza receiver adapter (see #35669 (comment) for more details). Two other receivers were still using the converter to generate test data within the unit tests, so those have been adapted as well with this PR <!-- Issue number (e.g. #1234) or full URL to issue, if applicable. --> #### Link to tracking issue Follow up to #35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Adapted unit tests that were still using the converter --------- Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR removes the `Converter` type that was previously used mainly by the stanza receiver adapter (see open-telemetry#35669 (comment) for more details). Two other receivers were still using the converter to generate test data within the unit tests, so those have been adapted as well with this PR <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Follow up to open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Adapted unit tests that were still using the converter --------- Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR introduces a benchmark test for the receiver adapter which tests the performance of handling the log entries sent to the `LogEmitter`, which are then received and converted by the receiver adapter. The changes have been made based on the comment linked here: open-telemetry#35669 (comment) <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Part of open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Added additional benchmark tests --------- Signed-off-by: Florian Bacher <[email protected]>
…nous (open-telemetry#35669) <!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR changes the `LogEmitter` to accept a synchronous consumer callback function for processing a batch of log entries as an alternative to sending log entry batches to a channel. The components that use the `LogEmitter` (adapter and parser) have been adapted accordingly. Also, in the case of the adapter, the log entries are converted directly, rather than sending them over a channel to the converter and receiving the converted results over a different channel. <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Fixes open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing I did some initial performance tests using the `TestLogLargeFiles` load test to see how this change affects the performance. Below are the results: **Before the change (i.e. with async log entry batch processing)** ``` === RUN TestLogLargeFiles/filelog-largefiles-2Gb-lifetime 2024/10/08 09:02:53 | Sent:17,769,795 logs (179,507/sec) | Received:17,755,188 items (179,346/sec) === RUN TestLogLargeFiles/filelog-largefiles-6GB-lifetime 2024/10/08 09:06:29 | Sent:42,857,755 logs (216,465/sec) | Received:42,851,987 items (216,424/sec) Test |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items| ---------------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:| LogLargeFiles/filelog-largefiles-2Gb-lifetime|PASS | 100s| 73.1| 78.4| 106| 118| 18249451| 18249451| LogLargeFiles/filelog-largefiles-6GB-lifetime|PASS | 200s| 87.5| 98.1| 110| 116| 44358460| 44358460| ``` **After the change (i.e. with sync log entry batch processing)** ``` === RUN TestLogLargeFiles/filelog-largefiles-2Gb-lifetime 2024/10/08 10:09:51 Agent RAM (RES): 139 MiB, CPU:71.7% | Sent:17,802,561 logs (179,836/sec) | Received:17,788,273 items (179,680/sec) === RUN TestLogLargeFiles/filelog-largefiles-6GB-lifetime 2024/10/08 10:05:15 Agent RAM (RES): 140 MiB, CPU:95.6% | Sent:42,912,030 logs (216,744/sec) | Received:42,904,306 items (216,689/sec) Test |Result|Duration|CPU Avg%|CPU Max%|RAM Avg MiB|RAM Max MiB|Sent Items|Received Items| ---------------------------------------------|------|-------:|-------:|-------:|----------:|----------:|---------:|-------------:| LogLargeFiles/filelog-largefiles-2Gb-lifetime|PASS | 100s| 74.8| 78.9| 127| 139| 17984687| 17984687| LogLargeFiles/filelog-largefiles-6GB-lifetime|PASS | 200s| 89.3| 100.9| 134| 140| 43376210| 43376210| ``` Those results seem to indicate comparable throughput, but with an increased resource consumption, especially in terms of memory. I also did a test comparing the performance between the synchronous and asynchronous log emitter using the same methodology as in open-telemetry#35454. The results were the following, and indicate an increase in the time it takes for reading the generated log file (see open-telemetry#35454 for details on how the file is generated and the test execution): - Async Log Emitter: ~8s - Sync Log Emitter: ~12s <details> <summary>output-async.log</summary> === Step 3: Thu Oct 10 08:54:23 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 2.209674e+06 === Step 4: Thu Oct 10 08:54:25 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 5.428103e+06 === Step 5: Thu Oct 10 08:54:26 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.337017e+06 === Step 6: Thu Oct 10 08:54:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 9.258843e+06 === Step 7: Thu Oct 10 08:54:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3082428e+07 === Step 8: Thu Oct 10 08:54:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519068e+07 </details> <details> <summary>output-sync.log</summary> === Step 2: Thu Oct 10 08:51:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.580891e+06 === Step 3: Thu Oct 10 08:51:28 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 3.01034e+06 === Step 4: Thu Oct 10 08:51:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 4.434627e+06 === Step 5: Thu Oct 10 08:51:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.416612e+06 === Step 6: Thu Oct 10 08:51:34 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.0496072e+07 === Step 7: Thu Oct 10 08:51:36 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3523882e+07 === Step 8: Thu Oct 10 08:51:37 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.4929707e+07 === Step 9: Thu Oct 10 08:51:39 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519105e+07 </details> --------- Signed-off-by: Florian Bacher <[email protected]> Co-authored-by: Andrzej Stencel <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR removes the `Converter` type that was previously used mainly by the stanza receiver adapter (see open-telemetry#35669 (comment) for more details). Two other receivers were still using the converter to generate test data within the unit tests, so those have been adapted as well with this PR <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Follow up to open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Adapted unit tests that were still using the converter --------- Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR removes the `Converter` type that was previously used mainly by the stanza receiver adapter (see open-telemetry#35669 (comment) for more details). Two other receivers were still using the converter to generate test data within the unit tests, so those have been adapted as well with this PR <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Follow up to open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Adapted unit tests that were still using the converter --------- Signed-off-by: Florian Bacher <[email protected]>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue. Ex. Adding a feature - Explain what this achieves.--> #### Description This PR removes the `Converter` type that was previously used mainly by the stanza receiver adapter (see open-telemetry#35669 (comment) for more details). Two other receivers were still using the converter to generate test data within the unit tests, so those have been adapted as well with this PR <!-- Issue number (e.g. open-telemetry#1234) or full URL to issue, if applicable. --> #### Link to tracking issue Follow up to open-telemetry#35453 <!--Describe what testing was performed and which tests were added.--> #### Testing Adapted unit tests that were still using the converter --------- Signed-off-by: Florian Bacher <[email protected]>
Description
This PR changes the
LogEmitter
to accept a synchronous consumer callback function for processing a batch of log entries as an alternative to sending log entry batches to a channel.The components that use the
LogEmitter
(adapter and parser) have been adapted accordingly. Also, in the case of the adapter, the log entries are converted directly, rather than sending them over a channel to the converter and receiving the converted results over a different channel.Link to tracking issue
Fixes #35453
Testing
I did some initial performance tests using the
TestLogLargeFiles
load test to see how this change affects the performance. Below are the results:Before the change (i.e. with async log entry batch processing)
After the change (i.e. with sync log entry batch processing)
Those results seem to indicate comparable throughput, but with an increased resource consumption, especially in terms of memory.
I also did a test comparing the performance between the synchronous and asynchronous log emitter using the same methodology as in #35454.
The results were the following, and indicate an increase in the time it takes for reading the generated log file (see #35454 for details on how the file is generated and the test execution):
output-async.log
=== Step 3: Thu Oct 10 08:54:23 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 2.209674e+06 === Step 4: Thu Oct 10 08:54:25 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 5.428103e+06 === Step 5: Thu Oct 10 08:54:26 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.337017e+06 === Step 6: Thu Oct 10 08:54:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 9.258843e+06 === Step 7: Thu Oct 10 08:54:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3082428e+07 === Step 8: Thu Oct 10 08:54:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="3744d4cb-5080-427c-8c16-a96cb40a57d4",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519068e+07output-sync.log
=== Step 2: Thu Oct 10 08:51:27 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.580891e+06 === Step 3: Thu Oct 10 08:51:28 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 3.01034e+06 === Step 4: Thu Oct 10 08:51:29 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 4.434627e+06 === Step 5: Thu Oct 10 08:51:31 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 7.416612e+06 === Step 6: Thu Oct 10 08:51:34 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.0496072e+07 === Step 7: Thu Oct 10 08:51:36 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.3523882e+07 === Step 8: Thu Oct 10 08:51:37 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.4929707e+07 === Step 9: Thu Oct 10 08:51:39 CEST 2024 otelcol_receiver_accepted_log_records{receiver="filelog",service_instance_id="dcf5371b-19eb-47b3-a820-756c1832b448",service_name="otelcontribcol",service_version="0.111.0-dev",transport=""} 1.6519105e+07