Fix BatchSpanProcessor.Shutdown to wait until all spans are processed #766
Currently it exits too soon - before drainQueue is finished
Is there a test case that can be added to demonstrate the faulty behavior and ensure it isn't reintroduced?
This might never exit, I think. I believe it's WAI. If you shutdown the processor while there are still spans being generated, some of them will not be recorded. What are the semantics you want, here?
I would like to understand what is "working as intended" here and how it might never exit. This is not the first time I have heard that from you, so I should point out that it does not help in any way; it just politely says "no" without any explanation and leaves me guessing.
Given that after shutdown spans are simply not added to the internal queue (channel), I don't see what there is to discuss here. You are right - they are discarded.
I am not particularly attached to any behavior here. But the code does not do what the comment says, nor what common sense suggests. Perhaps it makes sense to update the comment. Closing, since it is WAI and I don't use this.
I could be wrong, but it looked like drainQueue() would continue to process spans as long as sufficiently many of them are generated to never fully drain the queue. I think we should expose a
You're correct. It's clear that I'm not providing an effective review here, and I'm sorry. I've explained why I would not use this code (because it does not provide a way to limit encoded batch size), and that I only became involved because flaky tests were interfering with the metrics SDK development. It worries me that we have disabled the tests for this class and are still making changes. What you point out is true, and we could either remove

The underlying problem seems to be that there is no "correct" way to shut down a processor when there are concurrent writers. I now believe that adding Flush() is the best path forward. We can write a correctness test for Flush() that ensures all spans are written, then all we need to test is that Shutdown closes channels, doesn't block indefinitely, and doesn't panic.
I see at least 2 reasons why tests were flaky:
This PR fixes both, so the tests should hopefully pass reliably again. And it is not as if the tests were working properly before - they were passing, but not verifying that the code works correctly.
Agreed. Tests should work fine because all spans are added before calling
The last time I wrangled with this topic, I did add
Thank you @vmihailenco, sorry for the delay, and sorry for several mistakes of mine in reviewing this code.
@lizthegrey as our compliance officer, would you look at the error above (#766 (comment))? I can't see why the build is green given that message.
@lizthegrey never mind. I see you posted on another PR that we have to re-authorize the CLA.
This restores tests! Merging. |