Parallelize State call in ECM #451
Conversation
Signed-off-by: John Shepherd <[email protected]>
…b.com/ignitionrobotics/ign-gazebo into jshep1/scenebroadcaster_optimizations Signed-off-by: John Shepherd <[email protected]>
Here's what I got comparing performance before and after. This is from running the shapes_population.sdf world. It does look like it shaved a few ms off. Note: be careful about using Remotery for profiling. I noticed that since new threads are spawned every iteration, it made Remotery a bit laggy and unresponsive sometimes.
Good to see that you're seeing improvements... I guess Remotery was a bit wonky for me given all the thread spawning. I'll clean up the logic, make it more generalizable, and get the PR in for review soon.
By the way, I wonder if it's possible to reuse threads (e.g., store them in a member variable) and have them wait on a condition_variable until there's work to do.
That's also what I was thinking, and I was going to attempt something like that. It would certainly reduce noise in Remotery as well as cut down any overhead from spawning so many threads every iteration.
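The reuse idea described above — worker threads parked on a condition variable instead of being respawned each iteration — could look roughly like this minimal sketch. The class name `ReusableWorkers` and its methods are illustrative, not part of the ign-gazebo or ign-common API:

```cpp
// Sketch: a pool of reusable worker threads that sleep on a
// condition_variable until tasks are queued, avoiding per-iteration
// thread spawning. Names here are hypothetical.
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ReusableWorkers
{
public:
  explicit ReusableWorkers(std::size_t n)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      this->threads.emplace_back([this]
      {
        for (;;)
        {
          std::function<void()> task;
          {
            std::unique_lock<std::mutex> lock(this->mtx);
            // Park until there is work or the pool is shutting down.
            this->cv.wait(lock, [this]
                { return this->stop || !this->tasks.empty(); });
            if (this->stop && this->tasks.empty())
              return;
            task = std::move(this->tasks.front());
            this->tasks.pop();
          }
          task();
          {
            std::lock_guard<std::mutex> lock(this->mtx);
            if (--this->pending == 0)
              this->doneCv.notify_all();
          }
        }
      });
    }
  }

  void Enqueue(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(this->mtx);
      this->tasks.push(std::move(task));
      ++this->pending;
    }
    this->cv.notify_one();
  }

  // Block until every queued task has run (one "iteration").
  void WaitAll()
  {
    std::unique_lock<std::mutex> lock(this->mtx);
    this->doneCv.wait(lock, [this] { return this->pending == 0; });
  }

  ~ReusableWorkers()
  {
    {
      std::lock_guard<std::mutex> lock(this->mtx);
      this->stop = true;
    }
    this->cv.notify_all();
    for (auto &t : this->threads)
      t.join();
  }

private:
  std::vector<std::thread> threads;
  std::queue<std::function<void()>> tasks;
  std::mutex mtx;
  std::condition_variable cv, doneCv;
  int pending{0};
  bool stop{false};
};
```

As the later comments note, whether this actually parallelizes the state call depends on whether the per-task work can run without serializing on shared state.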
For the record, I've spent a good amount of time brainstorming and trial-and-erroring the best solution for this case. I've found that the main problem with using a thread pool + condition variables is that that pattern ends up not applying to this situation. I could go down that route; however, there would be no speedup, and it would behave the same as one thread working through all of the data in series, since each thread must take a ….

Our desirables for this optimization are as follows; there are a lot of approaches that could be taken here, so I'll list the basics that I wish to see and go from there.
Based on the above desirables, I don't think we will necessarily be able to have a system which reuses worker threads. Condition variables would be the way we could do this, but as stated above, they would run in series. I'll implement this approach... which in the end is basically the same as the current prototype implementation but with a few more optimizations, and we can go from there.
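The spawn-per-iteration approach being described — splitting the component range across a bounded number of short-lived threads, with a minimum chunk size so small scenes stay single-threaded — can be sketched as below. The function name `ParallelApply` and the default of 50 components per thread are assumptions for illustration, not the PR's actual API:

```cpp
// Sketch: partition [0, count) across at most hardware_concurrency()
// threads spawned for this iteration only. Scenes with fewer than
// componentsPerThread items run on the calling thread (no speedup
// expected there, per the discussion above).
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void ParallelApply(std::size_t count,
                   const std::function<void(std::size_t, std::size_t)> &work,
                   std::size_t componentsPerThread = 50)
{
  std::size_t maxThreads =
      std::max(1u, std::thread::hardware_concurrency());
  std::size_t nThreads = std::min<std::size_t>(
      maxThreads,
      std::max<std::size_t>(1, count / componentsPerThread));

  if (nThreads <= 1)
  {
    // Too little work to justify spawning threads: run in place.
    work(0, count);
    return;
  }

  std::vector<std::thread> threads;
  std::size_t chunk = count / nThreads;
  std::size_t begin = 0;
  for (std::size_t i = 0; i < nThreads; ++i)
  {
    // Last thread absorbs the remainder of the division.
    std::size_t end = (i + 1 == nThreads) ? count : begin + chunk;
    threads.emplace_back(work, begin, end);
    begin = end;
  }
  for (auto &t : threads)
    t.join();
}
```

Each thread receives a disjoint `[begin, end)` range, so as long as `work` only touches its own range no locking is needed — which is what makes this faster than the condition-variable pool in this situation.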
Thanks for looking into it. I recently ran into a bit of code in ign-gazebo that reminded me of WorkerPool in ign-common. Not sure if that'll help with the issue you're running into. Either way, going forward with the original approach of spawning threads also sounds good to me.
Codecov Report
@@ Coverage Diff @@
## ign-gazebo3 #451 +/- ##
===============================================
+ Coverage 77.62% 77.66% +0.03%
===============================================
Files 208 208
Lines 11527 11574 +47
===============================================
+ Hits 8948 8989 +41
- Misses 2579 2585 +6
Continue to review full report at Codecov.
I tested by adding and deleting objects in shapes.sdf and spawning objects from Fuel, and it seems to work fine in the single-thread case.
Interestingly, I tried shapes_population.sdf and noticed that a sphere is missing (sphere_534).
I also tried generating a shapes_population.sdf world with n set to 100 and I see a couple of models missing. Do you see the same issue?
Signed-off-by: John Shepherd <[email protected]>
Good catch, it took me a little bit to find the bug, but I resolved it here: 0ba4684
The number of objects in shapes_population.sdf is now correct for me.
I'll leave it to you and @chapulina to decide on the number of threads to use. Otherwise the PR looks good.
Signed-off-by: John Shepherd <[email protected]>
Signed-off-by: Louise Poubel <[email protected]>
Thanks for the improvements!
The ABI checker is having the same old false positive. All other CI looks good. Merging!
Ready for review. Below is a plot of 25 randomly sampled PostUpdate profiled points for both before and after this PR. The before average timestep of PostUpdate was 33.095 ms and the after average time for PostUpdate is 22.479 ms in the case of shapes_population.sdf. This PR is primarily targeted at larger examples and will not significantly affect smaller ones (if there are under 50 entity components, no speedup will occur at all due to the nature of the thread spawning in this PR). My RTF before this PR after 2 minutes of running shapes_population.sdf was 0.602%, and after the PR I am getting about 0.669% RTF.

I'm going to experiment with the number of threads and the number of components each thread handles to see if I can tune these two parameters to get a more optimized RTF. I don't think changing the max thread count to anything higher than std::thread::hardware_concurrency() (the max suggested thread count the hardware supports) would speed anything up; this would likely just bottleneck the additional threads, scheduling them in only when other threads are put to sleep. Perhaps we could see some speedup from tuning the minimum number of entity components each thread runs, but I'm not so sure this would make a huge difference either; it would only affect scenes with between componentsPerThread and componentsPerThread * maxThreads entity components.
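To illustrate the tuning range described above, here is a small sketch of the thread-count arithmetic. The function name `NumThreads` and the values componentsPerThread = 50 and maxThreads = 8 are assumptions for illustration, not the PR's actual parameters:

```cpp
// Sketch of the thread-count calculation implied above: the number of
// threads only varies for component counts between componentsPerThread
// and componentsPerThread * maxThreads; below that range one thread is
// used, above it the count saturates at maxThreads.
#include <algorithm>
#include <cstddef>

std::size_t NumThreads(std::size_t components,
                       std::size_t componentsPerThread,
                       std::size_t maxThreads)
{
  return std::clamp<std::size_t>(components / componentsPerThread,
                                 1, maxThreads);
}
```

With componentsPerThread = 50 and maxThreads = 8, scenes under 50 components stay single-threaded, scenes over 400 components always use 8 threads, and only the 50–400 range is affected by tuning either parameter.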