
[C++ repro] Logging lots of images loses data #9818

Closed
abey79 opened this issue Apr 28, 2025 · 8 comments · Fixed by #9846
Assignees: grtlr
Labels: 🪳 bug (Something isn't working) · 🌊 C++ API (C/C++ API specific) · 🦟 regression (A thing that used to work in an earlier release) · user-request (This is a pressing issue for one of our users)
Milestone: 0.23.2

Comments

@abey79 (Member) commented Apr 28, 2025

A user-provided C++ repro highlights loss of logged data.

Expected: all data is logged, 50 frames on the timeline
Actual: partial data, only ~33 frames

Repro repo: https://github.com/ExpertOfNil/rerun_cpp_mve
Discord thread: https://discord.com/channels/1062300748202921994/1364955449602084874
Slack thread: https://rerunio.slack.com/archives/C041NHU952S/p1745826038111429
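
For context, the overall repro pattern boils down to something like the following. This is a minimal sketch, not the actual repro code: the entity path, image size, and pixel contents are placeholders, and it assumes the 0.23-era C++ API (`connect_grpc`, `Image::from_rgb24`, `set_time_sequence`).

```cpp
#include <rerun.hpp>

#include <cstdint>
#include <vector>

int main() {
    rerun::RecordingStream rec("rerun_cpp_mve");
    // Connect to a running viewer over gRPC (default URL).
    rec.connect_grpc().exit_on_failure();

    const uint32_t width = 640, height = 480;  // placeholder resolution
    std::vector<uint8_t> pixels(static_cast<size_t>(width) * height * 3, 128);

    for (int frame = 0; frame < 50; ++frame) {
        rec.set_time_sequence("frame", frame);
        rec.log("mve/image", rerun::Image::from_rgb24(pixels, {width, height}));
    }

    // The process exits here. Expected: 50 frames on the timeline.
    // Observed: only ~33 frames make it to the viewer.
}
```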

@abey79 abey79 added 👀 needs triage This issue needs to be triaged by the Rerun team 🪳 bug Something isn't working 🌊 C++ API C/C++ API specific and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Apr 28, 2025
@abey79 abey79 added this to the 0.23.2 milestone Apr 28, 2025
@abey79 (Member, Author) commented Apr 28, 2025

Adding a `rec.flush_blocking()` at the end does NOT fix the issue.

Adding `while (true) {}` does "fix" the issue, which suggests a possible short-term workaround.
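
Concretely, the two experiments at the end of the repro's `main` (sketch; `rec` is the repro's `RecordingStream`):

```cpp
// Experiment 1: explicit flush before exit -- data is still lost.
rec.flush_blocking();

// Experiment 2: keep the process alive indefinitely -- all frames arrive.
while (true) {}
```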

@abey79 abey79 added the user-request This is a pressing issue for one of our users label Apr 28, 2025
@grtlr grtlr self-assigned this Apr 28, 2025
@ExpertOfNil (Contributor) commented:

Just a point of clarification, in case it is helpful: each image should be logged 4 times (representing the actual application, which logs various permutations): 3 times prior to `rr_log_pose_estimation` and once more within `rr_log_pose_estimation`. The end result should be 200 logged images: 50 under `mve/original`, 50 under `mve/Image1`, 50 under `mve/Image2`, and 50 under `mve/image`, along with a pinhole and transform. Let me know if this is not accurately represented in my repro. Otherwise, I'll update the notes to add these details.
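
For reference, that per-frame pattern would look roughly like this (a sketch only; the pixel buffer, resolution, focal length, and translation are made-up placeholders):

```cpp
// Three permutations logged prior to rr_log_pose_estimation:
rec.log("mve/original", rerun::Image::from_rgb24(pixels, {width, height}));
rec.log("mve/Image1", rerun::Image::from_rgb24(pixels, {width, height}));
rec.log("mve/Image2", rerun::Image::from_rgb24(pixels, {width, height}));

// ...and once more within rr_log_pose_estimation, along with a pinhole
// camera and a transform on the parent entity:
rec.log("mve/image", rerun::Image::from_rgb24(pixels, {width, height}));
rec.log(
    "mve/image",
    rerun::Pinhole::from_focal_length_and_resolution(500.0f, {640.0f, 480.0f})
);
rec.log("mve", rerun::Transform3D::from_translation({0.0f, 0.0f, 1.0f}));
```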

@abey79 (Member, Author) commented Apr 28, 2025

This is what I'm getting with the `while (true) {}` trick. I'm assuming that this is the "expected" state:

[Image: viewer screenshot]

@ExpertOfNil (Contributor) commented:

That looks correct! I'm assuming you used `while (true) {}` in `main.cpp` to keep the process alive?

@abey79 (Member, Author) commented Apr 28, 2025

> That looks correct! I'm assuming you used `while (true) {}` in `main.cpp` to keep the process alive?

Yes, exactly.

@Wumpf Wumpf added the 🦟 regression A thing that used to work in an earlier release label Apr 28, 2025
@ExpertOfNil (Contributor) commented Apr 28, 2025

I was able to reproduce completion of all logs with `while (true) {}`. I also killed the process after pose 40, which seemed to drop the remainder of the logs.

Would it be reasonable to provide a mechanism for flushing the `RecordingStream`, or for checking whether it has been fully flushed, prior to shutting down the main process?

@abey79 (Member, Author) commented Apr 28, 2025

> Would it be reasonable to provide a mechanism for flushing the `RecordingStream`, or for checking whether it has been fully flushed, prior to shutting down the main process?

Well, it was very much my expectation that `rec.flush_blocking()` would do exactly that, but that's apparently not the case, so this may well be a bug. @grtlr is currently looking into all of this and will report back as soon as we figure it out.

@grtlr (Contributor) commented May 2, 2025

Thank you so much @ExpertOfNil 🙏! Your repro pointed out a very subtle difference in our C++ SDK and made us realize that our logging level was wrong.

For the 0.23.2 release we will create a patch that warns on data loss. We will properly address the underlying cause of this issue in the 0.24 release (which will probably lead to subtle changes to the API).

You can follow along with that line of development here:

Until we have the full fix, I would suggest bumping the `flush_timeout_sec` parameter in your `connect_grpc` call to a value that prevents data loss.
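
For example (a sketch; the URL shown is the SDK's default, and 10 seconds is an arbitrary placeholder — pick a value large enough for your data volume):

```cpp
// Allow the sink up to 10 seconds to drain on shutdown instead of the default.
rec.connect_grpc("rerun+http://127.0.0.1:9876/proxy", 10.0f).exit_on_failure();
```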

grtlr added a commit that referenced this issue May 2, 2025
### Related

* Closes #9818.

### What

> [!IMPORTANT]
> This PR also changes the way `RecordingStream` is freed in the C/C++ API. Before, we called `stream.disconnect`, which unnecessarily replaced the current sink with a _buffered_ sink that would be immediately dropped afterwards. Not only did this cause spam in the log outputs, it also led to race conditions on (logging) application shutdown.

This PR makes it more explicit why we drop data during flushing, by bumping the log messages to `warn!`. It also improves the message by pointing users to `flush_timeout`.

We also bump the default timeout from two seconds to three seconds.

It's worth noting that explicitly calling `flush_blocking` from our SDKs should be able to opt out of this timeout, to ensure all data is sent. This will be tracked here:

* #9845.
abey79 pushed a commit that referenced this issue May 2, 2025