Fix 6444 #6470
Merged · 5 commits · Aug 27, 2020
Conversation

@palango (Contributor) commented Aug 24, 2020:

Description

Fixes: #6444

When events are received from the blockchain, they are transformed into state changes, which are then processed by the state machine.

In the past we batched the transformation and the processing in order to speed up execution. As we found out in #6444, this doesn't work in general, because the transformation from blockchain event to state change may require prior state to already be available in the state machine.

An example of this is the ChannelClosed event, which is turned into different state changes depending on whether or not the channel is found in the state.

This bug is solved by interleaving creation and processing of the state changes.
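
For illustration, here is a self-contained toy sketch of the difference. This is not Raiden's actual code; the event names, state-change dictionaries, and helper functions are simplified stand-ins for `blockchainevent_to_statechange` and `handle_and_track_state_changes`:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChainState:
    # channel_id -> status
    channels: Dict[str, str] = field(default_factory=dict)

def event_to_state_changes(state: ChainState, event: dict) -> List[dict]:
    # Stand-in for blockchainevent_to_statechange: converting ChannelClosed
    # depends on whether the channel is already present in the state.
    if event["name"] == "ChannelOpened":
        return [{"type": "open", "channel": event["channel"]}]
    if event["name"] == "ChannelClosed" and event["channel"] in state.channels:
        return [{"type": "close", "channel": event["channel"]}]
    return []  # ChannelClosed for an unknown channel is dropped

def apply_state_changes(state: ChainState, changes: List[dict]) -> None:
    # Stand-in for handle_and_track_state_changes / the state machine.
    for change in changes:
        state.channels[change["channel"]] = (
            "opened" if change["type"] == "open" else "closed"
        )

events = [
    {"name": "ChannelOpened", "channel": "0xabc"},
    {"name": "ChannelClosed", "channel": "0xabc"},
]

# Batched (previous behaviour): ChannelClosed is converted before the
# ChannelOpened state change has been applied, so the close gets lost.
batched = ChainState()
pending: List[dict] = []
for event in events:
    pending.extend(event_to_state_changes(batched, event))
apply_state_changes(batched, pending)
print(batched.channels)  # {'0xabc': 'opened'}  <- wrong, channel stays open

# Interleaved (this PR): each conversion sees the state left by the
# previous event, so the channel ends up closed as expected.
interleaved = ChainState()
for event in events:
    apply_state_changes(interleaved, event_to_state_changes(interleaved, event))
print(interleaved.channels)  # {'0xabc': 'closed'}  <- correct
```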

This is the minimal change necessary to fix the bug; further improvements should follow in subsequent PRs. I'll link the corresponding issues here.

Follow-ups

@karlb (Contributor) left a comment:
Test + minimal fix is great! I'm just worried about a potentially big performance impact.

```diff
@@ -1126,12 +1130,12 @@ def _best_effort_synchronize_with_confirmed_head(
             # of reorgs, and older blocks are not sufficient to fix
             # problems with pruning, the `SyncTimeout` is used to ensure
             # the `current_confirmed_head` stays valid.
-            state_changes.extend(
+            self.handle_and_track_state_changes(
```
@karlb (Contributor) commented on the diff:

Does this degrade the performance a lot? Does this mean that the whole state is written to the DB for every single state change?

@palango (Contributor Author) commented Aug 25, 2020:

Do we have any benchmark to quantify this? I guess during initial sync it has an impact, but during normal runs it shouldn't hurt much.

> Does this mean that the whole state is written to the DB for every single state change?

AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.

@karlb (Contributor):

> Do we have any benchmark to quantify this?

I would try the stress test script or the bf6_stress_hub_node.yaml scenario.

> AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.

I was assuming that saved_state was actually saved to disk, but apparently it isn't. So it's not that bad.

@palango (Contributor Author):

I tested syncing on goerli with this branch and develop.

Before:

num_state_changes=2 time_per_sc=0.011809587478637695
num_state_changes=1 time_per_sc=0.003674030303955078
num_state_changes=1 time_per_sc=0.004012107849121094
num_state_changes=437 time_per_sc=0.00027565661502375484
num_state_changes=425 time_per_sc=0.0003503737730138442
num_state_changes=1625 time_per_sc=0.00023894940889798678
num_state_changes=2468 time_per_sc=0.00025791702626010393
num_state_changes=3903 time_per_sc=0.000250094858707601
num_state_changes=4982 time_per_sc=0.000257042314766019
num_state_changes=2229 time_per_sc=0.00025390305825015685
num_state_changes=132 time_per_sc=0.00040177323601462626
num_state_changes=55 time_per_sc=0.0006727825511585583

After:

num_state_changes=2 time_per_sc=0.013517498970031738
num_state_changes=1 time_per_sc=0.005203962326049805
num_state_changes=1 time_per_sc=0.005265951156616211
num_state_changes=437 time_per_sc=0.0021587880306985887
num_state_changes=425 time_per_sc=0.0030157061184153838
num_state_changes=1625 time_per_sc=0.004215660241933969
num_state_changes=2468 time_per_sc=0.007394305794126983
num_state_changes=3903 time_per_sc=0.013630374590311483
num_state_changes=4982 time_per_sc=0.021643087357605666
num_state_changes=2229 time_per_sc=0.026860458105334578
num_state_changes=132 time_per_sc=0.028242356849439217
num_state_changes=55 time_per_sc=0.02827532941644842

So it became roughly 20 to 100 times slower. I guess we can recover some of these losses, but not all of them, as we have to interleave the creation and processing of the state changes.
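
For reference, the per-batch slowdown can be computed directly from the figures above (the lists below are copied verbatim from the two runs; this is just a quick check of the "20 to 100 times" estimate):

```python
# Per-batch slowdown (time_per_sc after / before), batches in the order above.
before = [
    0.011809587478637695, 0.003674030303955078, 0.004012107849121094,
    0.00027565661502375484, 0.0003503737730138442, 0.00023894940889798678,
    0.00025791702626010393, 0.000250094858707601, 0.000257042314766019,
    0.00025390305825015685, 0.00040177323601462626, 0.0006727825511585583,
]
after = [
    0.013517498970031738, 0.005203962326049805, 0.005265951156616211,
    0.0021587880306985887, 0.0030157061184153838, 0.004215660241933969,
    0.007394305794126983, 0.013630374590311483, 0.021643087357605666,
    0.026860458105334578, 0.028242356849439217, 0.02827532941644842,
]
ratios = [a / b for a, b in zip(after, before)]
print(f"min={min(ratios):.1f}x max={max(ratios):.1f}x")
# min=1.1x (the tiny initial batches), max≈105.8x (the 2229-state-change batch)
```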

@karlb (Contributor):

I assume the state copying is the main reason for the slowdown (might make sense to verify). Since we have to run them sequentially anyway, would it be safe to skip the state copying?

@karlb (Contributor):

Or is the slowdown due to latencies that have been hidden by parallel execution before?

@palango (Contributor Author):

> Since we have to run them sequentially anyway, would it be safe to skip the state copying?

Yes, this is one optimization that we should do. This matches well with #6474

> Or is the slowdown due to latencies that have been hidden by parallel execution before?

Which latencies do you mean?
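
To illustrate the optimization discussed here, a minimal sketch assuming a simplified state machine (the class and parameter names are hypothetical, not Raiden's actual API): skip the defensive deep copy of the state when state changes are dispatched strictly sequentially and no caller needs a pre-dispatch snapshot.

```python
import copy
from typing import Any, Callable

class ToyStateMachine:
    """Hypothetical stand-in for a state machine that normally deep-copies
    its state on every dispatch."""

    def __init__(self, state: Any, transition: Callable[[Any, Any], Any]):
        self.state = state
        self.transition = transition

    def dispatch(self, state_change: Any, copy_state: bool = True) -> Any:
        # copy_state=True pays for a full deepcopy on every state change,
        # which is the cost suspected to dominate the numbers above.
        working_state = copy.deepcopy(self.state) if copy_state else self.state
        self.state = self.transition(working_state, state_change)
        return self.state

# Example: integers as stand-in state, addition as the transition function.
machine = ToyStateMachine(state=0, transition=lambda s, c: s + c)
for change in (1, 2, 3):
    machine.dispatch(change, copy_state=False)  # skip the copy on the hot path
print(machine.state)  # 3
```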

@karlb (Contributor):

Then let's see if #6474 helps. Feel free to merge this in the meantime if it helps.

> Which latencies do you mean?

We're still outside the state machine, so something could do RPC requests that add up. It's hard to tell without looking deeper into the code.

Fixes #6473

The `pendingtokenregistration` dictionary for `blockchainevent_to_statechange` is no longer necessary, because it fixed the same bug that caused #6444.
Successfully merging this pull request may close these issues.

Settled channels are shown in list channels API