fix(pubsub): Event queue becoming too long #886
Conversation
Force-pushed from a8ddf3d to 2fb2ba0
The PR looks good to me, but some tests in `tests/event/test_event_manager.py` seem to break.
It ran much better and the node got further with less memory use: it reached about 1.9M blocks with ~10 GB of memory, and it didn't die, but it's stuck trying to download some transactions. This is what the log looks like:

This is another relevant exception that happened, though I'm not sure it's related to the changes in this PR:

I think it happened because I forced a p2p reconnect with a USR1 signal, but that shouldn't cause any exceptions.
Force-pushed from 5691806 to 8f7c55b
Codecov Report

```
@@            Coverage Diff             @@
##           master     #886      +/-   ##
==========================================
- Coverage   85.42%   85.29%   -0.13%
==========================================
  Files         282      283       +1
  Lines       22269    22343      +74
  Branches     3365     3382      +17
==========================================
+ Hits        19023    19058      +35
- Misses       2580     2605      +25
- Partials      666      680      +14
```

View full report in Codecov by Sentry.
Force-pushed from 996132a to 8f7c55b
Force-pushed from c119d99 to 068201b
Force-pushed from 2d7d58f to 519d92c
Force-pushed from fc97e7b to c4bd15a
Motivation
In the sync-v1 process, there's a bottleneck caused by the differing speeds of data processing and data arrival. First, the `sync_v1_agent.send_data_queue.priority_queue` is unable to process transactions as quickly as they arrive. This discrepancy makes the queue expand continuously throughout the synchronization process, keeping references to `BaseTransaction` objects.

In addition, each set of blocks received by sync-v1 triggers approximately 200 events in the pubsub system. As with the first issue, the event queue processes these events at a slower rate than their arrival rate, so this queue also grows continually during synchronization, holding references to `BaseTransaction` objects.

Both issues contribute to a critical memory management problem: since these queues hold references to vertices, the garbage collector is unable to free the memory those objects occupy.

For context, when I ran a full node on my local computer (a MacBook), it initially consumed 3.19 GB of memory while processing 450,000 vertices. After this fix, memory usage decreased to just 0.62 GB. I also checked the number of blocks in memory every 25,000 blocks, and it was always between 800 and 2,000.
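The retention problem described here can be illustrated with a minimal sketch (the `Vertex` class and the event-tuple shape are hypothetical stand-ins, not the actual hathor-core types): an event that carries the object itself keeps it alive for as long as it sits in the queue, while an event that carries only an identifier lets the garbage collector reclaim the object as soon as nothing else references it.

```python
import gc
import weakref
from collections import deque

class Vertex:
    """Hypothetical stand-in for a BaseTransaction-like object."""
    def __init__(self, vertex_id: int) -> None:
        self.vertex_id = vertex_id

# Unbounded event queue holding the object itself: every enqueued
# event keeps its vertex alive, so the garbage collector cannot
# reclaim vertices that have already been processed elsewhere.
strong_queue: deque = deque()
v = Vertex(1)
tracker = weakref.ref(v)
strong_queue.append(("NEW_VERTEX", v))
del v
gc.collect()
assert tracker() is not None  # still alive: the queue holds a reference

# If the event carries only an identifier instead of the object,
# the vertex can be collected as soon as no other code needs it.
id_queue: deque = deque()
v = Vertex(2)
tracker = weakref.ref(v)
id_queue.append(("NEW_VERTEX", v.vertex_id))
del v
gc.collect()
assert tracker() is None  # collected: only the id was queued
```

This is also why keeping the queue short matters: the number of live vertex objects becomes proportional to the queue depth rather than to the total history synced so far.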
Useful commands:
Acceptance Criteria
Checklist
- [ ] If the base branch is `master`, confirm this code is production-ready and can be included in future releases as soon as it gets merged