
Parallelize State call in ECM #451


Merged: 35 commits into ign-gazebo3, Jan 15, 2021
Conversation

@JShep1 commented Nov 6, 2020

Ready for review. Below is a plot of 25 randomly sampled PostUpdate profiled points for both before and after this PR.

The average PostUpdate timestep was 33.095 ms before and 22.479 ms after in the case of shapes_population.sdf. This PR is primarily targeted at larger examples and will not significantly affect smaller ones (if there are under 50 entity components, no speedup will occur at all due to the nature of the thread spawning in this PR).

After 2 minutes of running shapes_population.sdf, my RTF before this PR was 0.602%; with the PR I am getting about 0.669%.

[Plot: Multi-Threading Results (shapes_population.sdf)]

I'm going to experiment with the number of threads and the number of components each thread handles to see if I can tune these two parameters for a better RTF. I don't think raising the max thread count above std::thread::hardware_concurrency() (the maximum number of threads the hardware suggests) would speed anything up; it would likely just bottleneck the additional threads and schedule them in whenever other threads are put to sleep. We might see some speedup from tuning the minimum number of entity components each thread handles, but I'm not sure that would make a huge difference either; it would only affect scenes with between componentsPerThread and componentsPerThread * maxThreads entity components.
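
For illustration, here is a minimal sketch of the kind of thread-count calculation described above; componentsPerThread, maxThreads, and ThreadCount are hypothetical names and values, not necessarily the identifiers used in this PR:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Hypothetical tuning parameter: minimum entity components per thread.
constexpr std::size_t componentsPerThread = 50;

// Pick a thread count so that each thread handles at least
// componentsPerThread entries, capped by the hardware's suggestion.
std::size_t ThreadCount(std::size_t _numEntityComponents)
{
  const std::size_t maxThreads =
      std::max<std::size_t>(1, std::thread::hardware_concurrency());

  // With fewer than componentsPerThread entries this stays at 1, so
  // small worlds run single-threaded and see no change.
  const std::size_t wanted =
      std::max<std::size_t>(1, _numEntityComponents / componentsPerThread);

  return std::min(wanted, maxThreads);
}
```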

John Shepherd added 18 commits October 14, 2020 12:22 (all signed off by John Shepherd <[email protected]>)
@github-actions github-actions bot added the 🏰 citadel Ignition Citadel label Nov 6, 2020
@JShep1 JShep1 changed the base branch from ign-gazebo3 to jshep1/scenebroadcaster_optimizations2 November 10, 2020 00:48
@JShep1 JShep1 changed the base branch from jshep1/scenebroadcaster_optimizations2 to ign-gazebo3 November 10, 2020 00:48
@iche033 (Contributor) commented Nov 12, 2020

Here's what I got comparing performance of the State function before (top image) and after (bottom image) the change:

[Profiler screenshot: threaded_state]

This is from running the shapes_population.sdf world. It does look like it shaved a few ms off the State function. When running with the TPE plugin, the RTF before the change is 1.4x% and after it is 1.8x%.

Be careful about using Remotery for profiling, though. I noticed that since new threads are spawned every iteration, it made Remotery a bit laggy and unresponsive at times.

@JShep1 (Author) commented Nov 12, 2020

Good to see that you're seeing improvements too; I guess Remotery was a bit wonky for me given all the threads spawning. I'll clean up the logic, make it more general, and get the PR in for review soon.

@iche033 (Contributor) commented Nov 12, 2020

By the way, I wonder if it's possible to reuse threads, e.g. store them in a member variable, and have them wait on a condition_variable until there's work to do.

@JShep1 (Author) commented Nov 12, 2020

> By the way, I wonder if it's possible to reuse threads, e.g. store them in a member variable, and have them wait on a condition_variable until there's work to do.

That's also what I was thinking and was going to attempt something like that. It would certainly reduce noise in Remotery as well as cut down any overhead from spawning so many threads every iteration.

@JShep1 (Author) commented Nov 18, 2020

> > By the way, I wonder if it's possible to reuse threads, e.g. store them in a member variable, and have them wait on a condition_variable until there's work to do.
>
> That's also what I was thinking and was going to attempt something like that. It would certainly reduce noise in Remotery as well as cut down any overhead from spawning so many threads every iteration.

For the record, I've spent a good amount of time brainstorming and trial-and-erroring the best solution for this case. The main problem with a thread pool plus condition variables is that the pattern ends up not applying to this situation. I could go down this route, but there would be no speedup; it would behave the same as one thread working through all of the data in series. Since each thread must take a std::unique_lock on a shared mutex to wait on a condition variable, only one thread may be inside the execution zone at a time; when it finishes, it releases the lock so that the next thread can go through and process some data. This would certainly be useful if State(...) had other work to do in the meantime, but since we specifically block, waiting for all of the AddEntityToMessage calls to finish, we would simply be waiting for the threads to do their work one after the other, which is equivalent to a single-threaded approach.
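
To illustrate the serialization concern above, here is a minimal sketch (not the PR's code) of a worker that waits on a condition variable and then does its work while still holding the lock, so the workers end up processing their chunks one at a time:

```cpp
#include <condition_variable>
#include <mutex>
#include <vector>

std::mutex mtx;
std::condition_variable cv;
bool workReady = false;

// Each worker thread in the pool runs this with its own chunk of data.
void Worker(std::vector<int> &_chunk)
{
  std::unique_lock<std::mutex> lock(mtx);

  // Wait until the main thread sets workReady and calls cv.notify_all().
  cv.wait(lock, [] { return workReady; });

  // wait() returns with the lock reacquired. Processing the chunk here,
  // before releasing the lock, means only one worker runs at a time.
  for (int &value : _chunk)
    value *= 2;
}
```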

Our desirables for this optimization are:

  1. A pool of threads all waiting to do essentially the same job; they must be triggered all at once and run concurrently
    i. Note that the arguments to the functor each thread executes (the chunk of the data it is responsible for) may change on every iteration
  2. The process that triggers the thread pool must block until all of the threads are done
  3. The process that triggered the thread pool must then piece together the data that the threads just composed.

There are a lot of approaches that could be taken here, so I'll list the basics of what I want to see and go from there:

  1. Being able to trigger all threads instantaneously
  2. Multiple threads all running their chunk of AddEntityToMessage simultaneously
  3. A recalculation of each thread's chunk of data only if the this->dataPtr->entityComponents vector has changed (under the hood, the std::next operation iterates x times; doing this every iteration is work that can be avoided if the entity component vector has not changed since the last iteration)
  4. Blocking until all threads are done with their AddEntityToMessage work

Based on the above desirables, I don't think we will necessarily be able to have a system that reuses worker threads. Condition variables would be the way to do this, but as stated above, the threads would run in series. A C++ std::thread, by definition, is a thread designated to do some operation(s) and then finish, so I think it may be a necessity in this case that we continue to spawn threads every iteration. I know it spams Remotery, but I'm not sure there's another option, unless there's one I'm not thinking of. I've looked a bit into std::async with std::future, but that is intended more for getting the return value of the function that std::async runs; in our case, a reference parameter is modified and there is no return value.
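
As a rough sketch of the spawn-per-iteration approach described above (illustrative names only; AddChunkToMessage stands in for the per-chunk AddEntityToMessage work and is not the PR's actual code):

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the per-chunk work; in the real code each thread would
// run AddEntityToMessage over its share of the entity components and
// fill a per-thread message that gets merged afterwards.
void AddChunkToMessage(std::size_t _begin, std::size_t _end)
{
  (void)_begin;
  (void)_end;
}

void RunStateThreaded(std::size_t _numEntityComponents, std::size_t _numThreads)
{
  const std::size_t chunkSize =
      (_numEntityComponents + _numThreads - 1) / _numThreads;

  std::vector<std::thread> workers;
  workers.reserve(_numThreads);

  // Spawn a fresh set of threads this iteration; each handles a
  // contiguous chunk of the entity components.
  for (std::size_t i = 0; i < _numThreads; ++i)
  {
    const std::size_t begin = i * chunkSize;
    const std::size_t end = std::min(begin + chunkSize, _numEntityComponents);
    if (begin >= end)
      break;
    workers.emplace_back(AddChunkToMessage, begin, end);
  }

  // Block until every thread has finished its AddEntityToMessage work;
  // the caller then pieces the per-chunk results into the final message.
  for (auto &worker : workers)
    worker.join();
}
```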

I'll implement this approach... which in the end is basically the same approach as the current prototype implementation but with a few more optimizations, and we can go from there.

@iche033 (Contributor) commented Nov 19, 2020

Thanks for looking into it. I recently ran into this bit of code in ign-gazebo that reminded me of WorkerPool in ign-common. Not sure if that will help with the issue you're running into. Either way, going forward with the original approach of spawning threads also sounds good to me.

@codecov bot commented Nov 19, 2020

Codecov Report

Merging #451 (7343281) into ign-gazebo3 (4124dea) will increase coverage by 0.03%.
The diff coverage is 100.00%.


```
@@               Coverage Diff               @@
##           ign-gazebo3     #451      +/-   ##
===============================================
+ Coverage        77.62%   77.66%   +0.03%
===============================================
  Files              208      208
  Lines            11527    11574      +47
===============================================
+ Hits              8948     8989      +41
- Misses            2579     2585       +6
```
| Impacted Files | Coverage Δ |
| --- | --- |
| src/EntityComponentManager.cc | 83.46% <100.00%> (+1.62%) ⬆️ |
| src/SimulationRunner.cc | 93.40% <0.00%> (-1.13%) ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@JShep1 JShep1 added the enhancement New feature or request label Nov 20, 2020
@iche033 (Contributor) left a comment

I tested by adding and deleting objects in shapes.sdf and spawning objects from Fuel, and it seems to work fine for the single-thread case.

Interestingly, I tried shapes_population.sdf and noticed that a sphere is missing (sphere_534):

[Screenshot: missing_sphere]

I also tried generating a shapes_population.sdf world with n set to 100 and I see a couple of models missing. Do you see the same issue?

@JShep1 (Author) commented Dec 17, 2020

> I also tried generating a shapes_population.sdf world with n set to 100 and I see a couple of models missing. Do you see the same issue?

Good catch; it took me a little bit to find the bug, but I resolved it here: 0ba4684

@JShep1 JShep1 requested review from iche033 and chapulina December 17, 2020 04:45
@iche033 (Contributor) left a comment

The number of objects in shapes_population.sdf is now correct for me.

I'll leave it to you and @chapulina to decide on the number of threads to use. Otherwise the PR looks good.

@chapulina chapulina added the performance Runtime performance label Jan 5, 2021
@JShep1 JShep1 requested a review from chapulina January 8, 2021 01:45
chapulina added a commit that referenced this pull request Jan 14, 2021
Signed-off-by: Louise Poubel <[email protected]>
@chapulina chapulina mentioned this pull request Jan 14, 2021
@chapulina (Contributor) left a comment

Thanks for the improvements!

@chapulina (Contributor) commented

The ABI checker is having the same old false positive. All other CI looks good. Merging!

@chapulina chapulina merged commit ddd7fcd into ign-gazebo3 Jan 15, 2021
@chapulina chapulina deleted the jshep1/parallelize_ecm branch January 15, 2021 17:26