
Fix BatchSpanProcessor.Shutdown to wait until all spans are processed #766


Merged — 6 commits, Jun 9, 2020

Conversation

vmihailenco
Contributor

Currently it exits too soon - before drainQueue is finished
@Aneurysm9
Member

Is there a test case that can be added to demonstrate the faulty behavior and ensure it isn't reintroduced?

@jmacd
Contributor

jmacd commented May 26, 2020

This might never exit, I think.

I believe it's WAI (working as intended). If you shut down the processor while spans are still being generated, some of them will not be recorded. What are the semantics you want here?

@vmihailenco
Contributor Author

I would like to be able to understand what is "working as intended" here and how it might never exit. This is not the first time I have heard that from you, so I should point out that it does not help in any way; it just politely says "no" without any explanation and leaves me guessing.

If you shutdown the processor while there are still spans being generated, some of them will not be recorded

Given that after shutdown spans are simply not added to the internal queue (channel), I don't see what there is to discuss. You are right: they are discarded.

What are the semantics you want, here?

I am not particularly interested in any behavior here. But the code does not do what the comment says and what common sense suggests. Perhaps it makes sense to update the comment. Closing since it is WAI and I don't use this.

@jmacd
Contributor

jmacd commented May 27, 2020

I could be wrong, but it looked like drainQueue() would continue to process spans as long as sufficiently many of them are generated to keep the queue from ever fully draining. I think we should expose a Flush() operation to flush the processor -- there is also a spec issue written on this topic -- which would let a user flush and then shut down without losing spans.

@jmacd
Contributor

jmacd commented May 27, 2020

You're correct. It's clear that I'm not providing an effective review here, and I'm sorry. I've explained why I would not use this code (because it does not provide a way to limit encoded batch size), and that I only became involved because flaky tests were interfering with the metrics SDK development.

It worries me that we have disabled the tests for this class and are still making changes. What you point out is true, and we could either remove stopCh and drainQueue or we could make this change. I am concerned that even with this change, the existing test wouldn't pass.

The underlying problem seems to be that there is no "correct" way to shut down a processor when there are concurrent writers. I now believe that adding Flush() is the best path forward. We can write a correctness test for Flush() that ensures all spans are written, then all we need to test is that Shutdown closes channels, doesn't block indefinitely, and doesn't panic.

@jmacd jmacd reopened this May 27, 2020
@vmihailenco
Contributor Author

vmihailenco commented May 28, 2020

I see at least 2 reasons why the tests were flaky:

  • Shutdown exits before drainQueue is finished.
  • Some tests don't use WithBlocking, so spans can be dropped depending on scheduling luck.

This PR fixes both, so hopefully the tests pass reliably again. And it is not as if the tests previously worked properly: they passed, but they did not verify that the code works correctly.

The underlying problem seems to be that there is no "correct" way to shut down a processor when there are concurrent writers.

Agreed. Shutdown only guarantees that spans added before the shutdown are processed. It does not help with concurrent writers - you should stop/synchronize them separately.

Tests should work fine because all spans are added before Shutdown is called, and the tests now use WithBlocking, so spans can't be dropped when the queue is full.

@jmacd
Contributor

jmacd commented May 28, 2020

The last time I wrangled with this topic, I did add WithBlocking() in the tests, but I wasn't able to make the test pass. We discussed this in the SIG call today, and I think the best we could do in terms of giving the user proper semantics would be to add a Flush() call (with tests) and then simplify Shutdown.

@linux-foundation-easycla

linux-foundation-easycla bot commented Jun 8, 2020

CLA Check

@jmacd
Contributor

jmacd commented Jun 8, 2020

Thank you @vmihailenco, sorry for the delay, and sorry for several mistakes of mine in reviewing this code.

@jmacd
Contributor

jmacd commented Jun 8, 2020

@lizthegrey as our compliance officer, would you look at the error above (#766 (comment))? I can't see why the build is green given that message.

@jmacd
Contributor

jmacd commented Jun 8, 2020

@lizthegrey nevermind. I see you posted on another PR that we have to re-authorize the CLA.
However, this makes me think the test is broken because the build is green. Thoughts?

@jmacd
Contributor

jmacd commented Jun 9, 2020

This restores tests! Merging.

@jmacd jmacd merged commit 7ebd7b5 into open-telemetry:master Jun 9, 2020
@MrAlias MrAlias linked an issue Jun 10, 2020 that may be closed by this pull request
@pellared pellared added this to the untracked milestone Nov 8, 2024
Successfully merging this pull request may close these issues.

Batch Span Processor unit test disabled