Fix 6444 #6470
Conversation
Test + minimal fix is great! I'm just worried about a potentially big performance impact.
raiden/raiden_service.py (outdated)
@@ -1126,12 +1130,12 @@ def _best_effort_synchronize_with_confirmed_head(
            # of reorgs, and older blocks are not sufficient to fix
            # problems with pruning, the `SyncTimeout` is used to ensure
            # the `current_confirmed_head` stays valid.
            state_changes.extend(
                self.handle_and_track_state_changes(
Does this degrade the performance a lot? Does this mean that the whole state is written to the DB for every single state change?
Do we have any benchmark to quantify this? I guess during initial sync it has an impact, but during normal runs it shouldn't hurt much.
Does this mean that the whole state is written to the DB for every single state change?
AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.
Do we have any benchmark to quantify this?
I would try the stress test script or the bf6_stress_hub_node.yaml scenario.
AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.
I was assuming that saved_state was actually saved to disk, but apparently it isn't. So it's not that bad.
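For context, here is a minimal sketch of that write-ahead-log pattern, assuming (not Raiden's actual implementation) that each state change is appended to the DB while the full state is only kept as an in-memory copy and snapshotted on demand:

```python
import copy
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SavedState:
    """In-memory copy of the latest state; not persisted on every dispatch."""
    state_change_id: int
    state: Any


class WriteAheadLogSketch:
    """Hypothetical WAL: appends each state change, snapshots the state only on demand."""

    def __init__(self, state: Any, conn: sqlite3.Connection, apply_fn: Callable[[Any, Any], Any]):
        self.conn = conn
        self.apply_fn = apply_fn
        self.saved_state = SavedState(state_change_id=0, state=state)
        self.conn.execute("CREATE TABLE IF NOT EXISTS state_changes (id INTEGER PRIMARY KEY, data TEXT)")
        self.conn.execute("CREATE TABLE IF NOT EXISTS snapshots (id INTEGER PRIMARY KEY, data TEXT)")

    def dispatch(self, state_changes: List[Any]) -> Any:
        state = self.saved_state.state
        for change in state_changes:
            # Only the (small) state change is written to the DB here ...
            cursor = self.conn.execute("INSERT INTO state_changes (data) VALUES (?)", (repr(change),))
            state = self.apply_fn(state, change)
            # ... while the full state is merely deep-copied in memory.
            self.saved_state = SavedState(state_change_id=cursor.lastrowid, state=copy.deepcopy(state))
        self.conn.commit()
        return state

    def snapshot(self) -> None:
        # The full state only hits disk when a snapshot is taken explicitly.
        self.conn.execute(
            "INSERT INTO snapshots (id, data) VALUES (?, ?)",
            (self.saved_state.state_change_id, repr(self.saved_state.state)),
        )
        self.conn.commit()
```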
I tested syncing on goerli with this branch and develop.
Before:
num_state_changes=2 time_per_sc=0.011809587478637695
num_state_changes=1 time_per_sc=0.003674030303955078
num_state_changes=1 time_per_sc=0.004012107849121094
num_state_changes=437 time_per_sc=0.00027565661502375484
num_state_changes=425 time_per_sc=0.0003503737730138442
num_state_changes=1625 time_per_sc=0.00023894940889798678
num_state_changes=2468 time_per_sc=0.00025791702626010393
num_state_changes=3903 time_per_sc=0.000250094858707601
num_state_changes=4982 time_per_sc=0.000257042314766019
num_state_changes=2229 time_per_sc=0.00025390305825015685
num_state_changes=132 time_per_sc=0.00040177323601462626
num_state_changes=55 time_per_sc=0.0006727825511585583
After:
num_state_changes=2 time_per_sc=0.013517498970031738
num_state_changes=1 time_per_sc=0.005203962326049805
num_state_changes=1 time_per_sc=0.005265951156616211
num_state_changes=437 time_per_sc=0.0021587880306985887
num_state_changes=425 time_per_sc=0.0030157061184153838
num_state_changes=1625 time_per_sc=0.004215660241933969
num_state_changes=2468 time_per_sc=0.007394305794126983
num_state_changes=3903 time_per_sc=0.013630374590311483
num_state_changes=4982 time_per_sc=0.021643087357605666
num_state_changes=2229 time_per_sc=0.026860458105334578
num_state_changes=132 time_per_sc=0.028242356849439217
num_state_changes=55 time_per_sc=0.02827532941644842
So it became roughly 20 to 100 times slower. I guess we can recover some of these losses, but not all of them, since we have to interleave the creation and processing of state changes.
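For reference, the time_per_sc figures above were presumably produced along these lines (hypothetical helper, not the actual benchmark code): the wall-clock time for one batch divided by the number of state changes in that batch.

```python
import time


def timed_dispatch(dispatch, state_changes):
    # Hypothetical helper mirroring the log lines above.
    started = time.monotonic()
    dispatch(state_changes)
    elapsed = time.monotonic() - started
    print(f"num_state_changes={len(state_changes)} time_per_sc={elapsed / len(state_changes)}")
```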
I assume the state copying is the main reason for the slowdown (it might make sense to verify that). Since we have to run them sequentially anyway, would it be safe to skip the state copying?
Or is the slowdown due to latencies that were previously hidden by parallel execution?
Since we have to run them sequentially anyway, would it be safe to skip the state copying?
Yes, this is one optimization that we should do. This matches well with #6474.
Or is the slow down due to latencies that have been hidden by parallel execution before?
Which latencies do you mean?
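To illustrate the skip-the-copy idea discussed above, here is a rough sketch with hypothetical names (not the actual Raiden API): deep-copy the state before each dispatch only when other code might hold on to intermediate states, and skip the copy when the changes are processed strictly sequentially.

```python
import copy
from typing import Any, Callable, Iterable


def dispatch_sequentially(
    state: Any,
    state_changes: Iterable[Any],
    apply_fn: Callable[[Any, Any], Any],
    copy_state: bool = True,
) -> Any:
    """Apply state changes one after the other.

    With copy_state=False the per-dispatch deepcopy is skipped. That is only
    safe if no other code keeps references to intermediate states, which is
    the situation described above: the changes run sequentially anyway.
    """
    for change in state_changes:
        working_state = copy.deepcopy(state) if copy_state else state
        state = apply_fn(working_state, change)
    return state
```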
Then let's see if #6474 helps. Feel free to merge this in the meantime if it helps.
Which latencies do you mean?
We're still outside the state machine, so something could do RPC requests that add up. It's hard to tell without looking deeper into the code.
Description
Fixes: #6444
When receiving events from the blockchain, those events are transformed into state changes which are processed by the state machine.
In the past we batched the transformation and the processing in order to speed up execution. As we found out in #6444, this doesn't work in general, as the transformation from blockchain event to state changes might require prior state to be available in the state machine.
An example of this is the ChannelClosed event, which creates different state changes depending on whether or not the channel is found in the state. This bug is solved by interleaving the creation and processing of the state changes.
This is the minimal change necessary to fix this bug; further improvements should be done subsequently. I'll link the issues here.
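Conceptually, the change can be sketched as follows. This is a simplified illustration with hypothetical signatures standing in for the real helpers, not the actual Raiden code: in the old flow, every event is converted against one state snapshot and the results are dispatched as a batch; in the new flow, conversion and dispatch alternate per event.

```python
from typing import Any, Callable, Iterable, List

# Hypothetical signatures standing in for the real helpers
# (e.g. blockchainevent_to_statechange, handle_and_track_state_changes).
Convert = Callable[[Any, Any], List[Any]]
Dispatch = Callable[[List[Any]], Any]
GetState = Callable[[], Any]


def process_batched(events: Iterable[Any], convert: Convert, dispatch: Dispatch, get_state: GetState) -> Any:
    # Old behaviour: convert every event against the same snapshot of the state,
    # then dispatch everything at once. Fast, but wrong whenever a conversion
    # (e.g. for ChannelClosed) needs state produced by an earlier event.
    state = get_state()
    state_changes: List[Any] = []
    for event in events:
        state_changes.extend(convert(state, event))
    return dispatch(state_changes)


def process_interleaved(events: Iterable[Any], convert: Convert, dispatch: Dispatch, get_state: GetState) -> Any:
    # Fixed behaviour: convert and dispatch one event at a time, so each
    # conversion sees the state that resulted from the previous dispatch.
    result = None
    for event in events:
        result = dispatch(convert(get_state(), event))
    return result
```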
Follow-ups
pendingtokenregistration from blockchainevent_to_statechange: #6473