Skip to content

SHM Buffer recovery mechanishm <1.10.x> [8220] #1159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 21, 2020

Conversation

adolfomarver
Copy link
Contributor

@adolfomarver adolfomarver commented Apr 18, 2020

To merge after #1147

This PR solves SHM transport issues when there are mutiple subscribers and one of them freezes or is super slow processing messages. This could provoke the "slow" subscriber holds all the publisher buffers leading to a cotinuous publisher's segment overflow, so all subscribers stop receiving messages.

  • Changes buffer-node / buffer-data allocation structure. Now buffer-nodes are allocated (at the segment creation stage) as fixed buffer-node pool independent from the buffer-data. This way, a node never change its address.

  • The node contains an atomic validity counter so all subscribers can check whether its reference to the buffer has been invalidated. The node also contains atomic status counters to know how many subscribers are processing the buffer and in how many ports is it enqueued.

  • Implement mecanishms to invalidate buffers already enqueued or beign processed by remote subscribers, although only buffers not beign processed are invalidated in this version.

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch 2 times, most recently from 4b78b8a to e7235b9 Compare April 18, 2020 12:41
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from e7235b9 to dc85d77 Compare April 18, 2020 15:39
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from dc85d77 to d941012 Compare April 18, 2020 22:47
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from d941012 to 9dc2b9b Compare April 19, 2020 05:45
@adolfomarver adolfomarver changed the title SHM Buffer recover <1.10.x> [8220] SHM Buffer recovery mecanishm <1.10.x> [8220] Apr 19, 2020
@adolfomarver adolfomarver changed the title SHM Buffer recovery mecanishm <1.10.x> [8220] SHM Buffer recovery mechanishm <1.10.x> [8220] Apr 19, 2020
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver marked this pull request as ready for review April 19, 2020 08:29
@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from 9dc2b9b to a1b0c83 Compare April 19, 2020 14:22
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from a1b0c83 to 14c77a8 Compare April 20, 2020 14:46
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

@adolfomarver adolfomarver force-pushed the feature/shared-memory/transport-buffer-recover branch from 0a5e090 to b1bd8e8 Compare April 21, 2020 07:06
@richiware
Copy link
Member

Build status:

  • Linux Build Status
  • Mac Build Status
  • Windows Build Status

max_allocations * ((sizeof(BufferNode) + per_allocation_extra_size_) + per_allocation_extra_size_);
uint32_t allocation_extra_size =
sizeof(SegmentNode) + per_allocation_extra_size_ +
(max_allocations * sizeof(BufferNode)) + per_allocation_extra_size_ +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this should be the following (please explain):

Suggested change
(max_allocations * sizeof(BufferNode)) + per_allocation_extra_size_ +
(max_allocations * (sizeof(BufferNode) + per_allocation_extra_size_) +

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation algorithm used by boost consumes some extra memory by each alocation you do. That's because the internal tree used to store every buffer the allocator gives to you, is also in the shared memory segment, so not all the reserved size is free for your use. In the segment construction a test is perform to estimate how much extra memory is used per allocation (per_allocation_extra_size). So a segment is over-sized to be able to hold:

  1. The segment node allocation.
  2. The BufferNode pool allocation.
  3. at least max_allocations number of user buffers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update the comments, then

@MiguelCompany MiguelCompany merged commit cae2dd0 into 1.10.x Apr 21, 2020
@MiguelCompany MiguelCompany deleted the feature/shared-memory/transport-buffer-recover branch April 21, 2020 11:29
MiguelCompany pushed a commit that referenced this pull request Apr 21, 2020
* Refs #8219. Change BufferNode to pre-allocated pool with fixed addresses for the nodes during entire life-cycle.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. SHM Buffer invalidation implementation.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. logWarning -> logInfo when segment overflow.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. Optimization.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. Style changes.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. buffer_recover test.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. 'error:' string removed from log msg.

Signed-off-by: AdolfoMartinez <[email protected]>
MiguelCompany added a commit that referenced this pull request Apr 22, 2020
* RobustInterprocessCondition implementation (#1147)

* Refs #8212. RobustInterprocessCondition implementation.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. Condition tests.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8183. Bad SHM structures alignment in some platforms.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. nullptr check.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. Re-enable Liveliness tests.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. SHM ABI v3.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. Fix cmake error & set SHM_DEFAULT_TRANSPORT=OFF.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8212. FIFO strategy in condition notify.

Signed-off-by: AdolfoMartinez <[email protected]>

* SHM Buffer recovery mechanishm (#1159)

* Refs #8219. Change BufferNode to pre-allocated pool with fixed addresses for the nodes during entire life-cycle.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. SHM Buffer invalidation implementation.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. logWarning -> logInfo when segment overflow.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. Optimization.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. Style changes.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. buffer_recover test.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8219. 'error:' string removed from log msg.

Signed-off-by: AdolfoMartinez <[email protected]>

* Setting shared memory on by default.

Signed-off-by: Miguel Company <[email protected]>

* Fix build fail in ROS-CI.

Signed-off-by: AdolfoMartinez <[email protected]>

* Fix SHM uncaught exceptions.

Signed-off-by: AdolfoMartinez <[email protected]>

* Refs #8132 Fix DDS unittests to delete entities before exiting

Signed-off-by: Laura Martin <[email protected]>

* Disable PSM unittests

Signed-off-by: Laura Martin <[email protected]>

* Fix boost::interprocess::semaphore initialization.

Signed-off-by: AdolfoMartinez <[email protected]>

* Fix CXX_STANDARD on boost try_compile

Co-authored-by: AdolfoMartinez <[email protected]>
Co-authored-by: AdolfoMartinez <[email protected]>
Co-authored-by: Laura Martin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants