feat: eliminate unnecessary memory allocations #2892
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2892      +/-   ##
==========================================
- Coverage   80.90%   80.90%   -0.01%
==========================================
  Files         872      872
  Lines      121039   121066      +27
==========================================
+ Hits        97932    97953      +21
- Misses      23107    23113       +6
```
Force-pushed from 10ebf1b to e24eb4c (compare)
After hours of testing to figure out how much memory this PR saves, the results are inconclusive. This doesn't make any sense to me unless the real issue with the model not fitting is actually the states or something similar. @HerrMuellerluedenscheid, I would greatly appreciate it if you could try out this fork on the esp32. Does anyone else have any ideas about what the actual issue could be, or something else I'm missing?
I quickly gave it a try, running the squeezenet example with the changes from your main branch on an esp32. It did not work out of the box, as the API changed and the example has to be modified. As soon as I have a little time I will dive deeper into that, but I think this is a very promising approach 👍
Force-pushed from f2b5fd5 to 0bbde7b (compare)
After your last force push @BjornTheProgrammer, I tried running squeezenet on the esp32 again. It compiles, but panics on a memory allocation of a stunning 700 MB, apparently rooted in
700 MB?! Something is really wrong then. I'll have to debug it alongside you. What do the allocations look like on v0.16?
To be honest, I think this is probably not an issue with this PR. After triple-checking my changes, I can't see anything that would possibly allocate more memory other than the struct used to pass back the args, and even then it would only be a couple of bytes at a time. Maybe it has something to do with how burn uses the loaded model. I should mention that in my testing the model I was using was 4.3 kB, and it only started working without panicking after allowing around 40 kB to be allocated on the heap. That's roughly a 1:10 ratio, and it was the same for both my changes and v0.16.
I'm also pretty sure the problem is rooted elsewhere. I will try to evaluate the memory footprint with a smaller example as soon as possible.
Just a suggestion: the changes can be simplified and reduced to just a few lines. For more details, you can refer to ivila/burn@main...ivila:burn:feat_memory_zc. Feel free to ignore it if you think it is unnecessary.
Thank you for the suggestions! I'll be implementing this and some other changes to this PR soon! |
I think this is ready for a merge. I'm not super happy with how little this reduces memory usage, but other approaches in separate PRs would likely be better suited to address that.
> I'm not super happy with how this doesn't reduce memory usage much

In your case, the difference is essentially going from the bytes record to a vec when `.to_vec()` was called? Might not be a huge difference if the record is small, but still a nice little improvement (a short sketch of the allocation in question follows this comment).
I have a couple of minor comments 🙂
Also, the CI failure should be fixed if you merge main into your branch!
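As an illustrative aside (the names below are placeholders, not burn's actual recorder types): the difference discussed above is between copying a borrowed slice with `.to_vec()` before loading, versus moving the owned bytes in and getting them back in the error variant on failure.

```rust
// Illustrative sketch only; `Model`, `DeserializeError`, and `deserialize`
// stand in for the real recorder machinery and are not burn's API.

struct Model;
struct DeserializeError;

fn deserialize(bytes: Vec<u8>) -> Result<Model, (DeserializeError, Vec<u8>)> {
    if bytes.is_empty() {
        // Hand the bytes back to the caller instead of dropping them.
        Err((DeserializeError, bytes))
    } else {
        Ok(Model)
    }
}

// Before: loading from a borrowed slice forces a copy into an owned Vec.
fn load_copying(bytes: &[u8]) -> Result<Model, DeserializeError> {
    let owned = bytes.to_vec(); // extra allocation + memcpy
    deserialize(owned).map_err(|(err, _)| err)
}

// After: the caller passes ownership and gets the bytes back on error,
// so no defensive clone is needed to keep them usable after a failure.
fn load_moving(bytes: Vec<u8>) -> Result<Model, (DeserializeError, Vec<u8>)> {
    deserialize(bytes)
}

fn main() {
    let data = vec![1u8, 2, 3];
    let _ = load_copying(&data);
    match load_moving(data) {
        Ok(_) => println!("loaded"),
        Err((_, recovered)) => println!("recovered {} bytes", recovered.len()),
    }
}
```

The saving is just the avoided copy of the record bytes, which is why it only becomes noticeable for larger records.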
Force-pushed from 05daaaf to 897e528 (compare)
That's true, but the larger issue is that on embedded devices the peak RAM usage matters far more than fluctuations in RAM usage. I think my next PR will create some tooling to inspect memory usage more in-depth to find areas of optimization.
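For that kind of tooling, one common building block (a generic Rust sketch, not the tooling this PR proposes) is a wrapping global allocator that tracks the high-water mark of live heap allocations:

```rust
// Minimal peak-heap tracker: wraps the system allocator and records the
// largest amount of live heap memory observed at any point.
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

struct PeakAlloc;

static CURRENT: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for PeakAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let now = CURRENT.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(now, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        CURRENT.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static ALLOC: PeakAlloc = PeakAlloc;

fn main() {
    let buffer = vec![0u8; 64 * 1024]; // allocate something sizeable
    drop(buffer);
    println!("peak heap usage: {} bytes", PEAK.load(Ordering::Relaxed));
}
```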
I don't think it's related to this PR, but some tests are failing when running
Ahh yes, the remainder issue. IIRC this has been broken at the driver level for some time. The quantization test failure is probably due to a precision or rounding issue, but it's totally unrelated to this PR.
LGTM! Thanks for addressing the changes 🙂
Cleaned up some minor leftovers
Pull Request Template
Checklist
- The `run-checks all` script has been executed.

Related Issues/PRs
This PR builds off of @HerrMuellerluedenscheid's fantastic work in pull request #2881, although it has a slightly different approach.
#2871
Changes
This PR changes things so that args are passed back by the Recorder when there is an error, so there is no need to clone the data unnecessarily. Additionally, this PR adds the ability to have different LoadArgs, at least for the BinBytesRecorder.
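A minimal sketch of the general pattern, with hypothetical names and signatures rather than burn's real Recorder trait: the load args are returned alongside the error, so a failed load hands the data back instead of forcing the caller to clone it up front.

```rust
// Hypothetical trait shape for illustration; not burn's actual API.
trait Recorder {
    type LoadArgs;
    type Record;
    type Error;

    /// On failure, the args come back with the error so the caller can
    /// retry or reuse the buffer without having cloned it beforehand.
    fn load(&self, args: Self::LoadArgs)
        -> Result<Self::Record, (Self::Error, Self::LoadArgs)>;
}

struct BytesRecorder;

impl Recorder for BytesRecorder {
    type LoadArgs = Vec<u8>;
    type Record = Vec<f32>;
    type Error = String;

    fn load(&self, args: Vec<u8>) -> Result<Vec<f32>, (String, Vec<u8>)> {
        if args.len() % 4 != 0 {
            // Return the untouched bytes together with the error.
            return Err(("length is not a multiple of 4".into(), args));
        }
        Ok(args
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
}

fn main() {
    let recorder = BytesRecorder;
    match recorder.load(vec![0, 0, 128, 63]) {
        Ok(values) => println!("loaded {values:?}"),
        Err((err, args)) => println!("failed ({err}); got back {} bytes", args.len()),
    }
}
```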
Testing
Compiled all of the examples, and added some additional tests for other LoadArgs like `&'static [u8]`.
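As a small illustration of why a `&'static [u8]` LoadArgs is useful on embedded targets (generic Rust; the file name and helper function are placeholders, not burn's API): `include_bytes!` embeds the model in the binary and yields a `&'static [u8]` that can be read directly, without first copying it into a heap-allocated `Vec<u8>`.

```rust
// "model.bin" is a placeholder path; the macro resolves it relative to
// this source file at compile time.
static MODEL_BYTES: &[u8] = include_bytes!("model.bin");

// Placeholder for a recorder that accepts a static byte slice as LoadArgs.
fn load_from_slice(bytes: &'static [u8]) -> usize {
    // A real recorder would deserialize here; we just report the size.
    bytes.len()
}

fn main() {
    println!("embedded model is {} bytes", load_from_slice(MODEL_BYTES));
}
```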