Fix 6444 #6470
Merged · 5 commits · Aug 27, 2020
Conversation

@palango (Contributor) commented Aug 24, 2020:

Description

Fixes: #6444

When events are received from the blockchain, they are transformed into state changes, which are then processed by the state machine.

In the past we batched the transformation and the processing in order to speed up execution. As we found out in #6444, this doesn't work in general, because the transformation from blockchain event to state change may require prior state to already be available in the state machine.

An example of this is the ChannelClosed event, which is turned into different state changes depending on whether or not the channel is found in the state.

This bug is solved by interleaving creation and processing of the state changes.
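
For illustration, here is a self-contained toy sketch of the difference. This is not Raiden's actual code; the event names, state-change dictionaries, and helper functions are simplified stand-ins for `blockchainevent_to_statechange` and `handle_and_track_state_changes`:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ChainState:
    # channel_id -> status
    channels: Dict[str, str] = field(default_factory=dict)

def event_to_state_changes(state: ChainState, event: dict) -> List[dict]:
    # Stand-in for blockchainevent_to_statechange: converting ChannelClosed
    # depends on whether the channel is already present in the state.
    if event["name"] == "ChannelOpened":
        return [{"type": "open", "channel": event["channel"]}]
    if event["name"] == "ChannelClosed" and event["channel"] in state.channels:
        return [{"type": "close", "channel": event["channel"]}]
    return []  # ChannelClosed for an unknown channel is dropped

def apply_state_changes(state: ChainState, changes: List[dict]) -> None:
    # Stand-in for handle_and_track_state_changes / the state machine.
    for change in changes:
        state.channels[change["channel"]] = (
            "opened" if change["type"] == "open" else "closed"
        )

events = [
    {"name": "ChannelOpened", "channel": "0xabc"},
    {"name": "ChannelClosed", "channel": "0xabc"},
]

# Batched (previous behaviour): ChannelClosed is converted before the
# ChannelOpened state change has been applied, so the close gets lost.
batched = ChainState()
pending: List[dict] = []
for event in events:
    pending.extend(event_to_state_changes(batched, event))
apply_state_changes(batched, pending)
print(batched.channels)  # {'0xabc': 'opened'}  <- wrong, channel stays open

# Interleaved (this PR): each conversion sees the state left by the
# previous event, so the channel ends up closed as expected.
interleaved = ChainState()
for event in events:
    apply_state_changes(interleaved, event_to_state_changes(interleaved, event))
print(interleaved.channels)  # {'0xabc': 'closed'}  <- correct
```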

This is the minimal change necessary to fix the bug; further improvements should follow in subsequent PRs. I'll link the corresponding issues here.

Follow-ups

@karlb (Contributor) left a comment:
Test + minimal fix is great! I'm just worried about a potentially big performance impact.

```diff
@@ -1126,12 +1130,12 @@ def _best_effort_synchronize_with_confirmed_head(
             # of reorgs, and older blocks are not sufficient to fix
             # problems with pruning, the `SyncTimeout` is used to ensure
             # the `current_confirmed_head` stays valid.
-            state_changes.extend(
+            self.handle_and_track_state_changes(
```
@karlb (Contributor) commented on the diff:

Does this degrade the performance a lot? Does this mean that the whole state is written to the DB for every single state change?

@palango (Contributor Author) commented Aug 25, 2020:

Do we have any benchmark to quantify this? I guess during initial sync it has an impact, but during normal runs it shouldn't hurt much.

> Does this mean that the whole state is written to the DB for every single state change?

AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.

@karlb (Contributor):

> Do we have any benchmark to quantify this?

I would try the stress test script or the bf6_stress_hub_node.yaml scenario.

> AFAIU it's mostly more deep-copies, but the whole state shouldn't be written on every call of handle_and_track_state_changes.

I was assuming that saved_state was actually saved to disk, but apparently it isn't. So it's not that bad.

@palango (Contributor Author):

I tested syncing on goerli with this branch and develop.

Before:

num_state_changes=2 time_per_sc=0.011809587478637695
num_state_changes=1 time_per_sc=0.003674030303955078
num_state_changes=1 time_per_sc=0.004012107849121094
num_state_changes=437 time_per_sc=0.00027565661502375484
num_state_changes=425 time_per_sc=0.0003503737730138442
num_state_changes=1625 time_per_sc=0.00023894940889798678
num_state_changes=2468 time_per_sc=0.00025791702626010393
num_state_changes=3903 time_per_sc=0.000250094858707601
num_state_changes=4982 time_per_sc=0.000257042314766019
num_state_changes=2229 time_per_sc=0.00025390305825015685
num_state_changes=132 time_per_sc=0.00040177323601462626
num_state_changes=55 time_per_sc=0.0006727825511585583

After:

num_state_changes=2 time_per_sc=0.013517498970031738
num_state_changes=1 time_per_sc=0.005203962326049805
num_state_changes=1 time_per_sc=0.005265951156616211
num_state_changes=437 time_per_sc=0.0021587880306985887
num_state_changes=425 time_per_sc=0.0030157061184153838
num_state_changes=1625 time_per_sc=0.004215660241933969
num_state_changes=2468 time_per_sc=0.007394305794126983
num_state_changes=3903 time_per_sc=0.013630374590311483
num_state_changes=4982 time_per_sc=0.021643087357605666
num_state_changes=2229 time_per_sc=0.026860458105334578
num_state_changes=132 time_per_sc=0.028242356849439217
num_state_changes=55 time_per_sc=0.02827532941644842

So it became roughly 20 to 100 times slower. I guess we can recover some of these losses, but not all of them, as we have to interleave the creation and processing of the state changes.
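
For reference, the per-batch slowdown can be computed directly from the figures above (the lists below are copied verbatim from the two runs; this is just a quick check of the "20 to 100 times" estimate):

```python
# Per-batch slowdown (time_per_sc after / before), batches in the order above.
before = [
    0.011809587478637695, 0.003674030303955078, 0.004012107849121094,
    0.00027565661502375484, 0.0003503737730138442, 0.00023894940889798678,
    0.00025791702626010393, 0.000250094858707601, 0.000257042314766019,
    0.00025390305825015685, 0.00040177323601462626, 0.0006727825511585583,
]
after = [
    0.013517498970031738, 0.005203962326049805, 0.005265951156616211,
    0.0021587880306985887, 0.0030157061184153838, 0.004215660241933969,
    0.007394305794126983, 0.013630374590311483, 0.021643087357605666,
    0.026860458105334578, 0.028242356849439217, 0.02827532941644842,
]
ratios = [a / b for a, b in zip(after, before)]
print(f"min={min(ratios):.1f}x max={max(ratios):.1f}x")
# min=1.1x (the tiny initial batches), max≈105.8x (the 2229-state-change batch)
```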

@karlb (Contributor):

I assume the state copying is the main reason for the slowdown (might make sense to verify). Since we have to run them sequentially anyway, would it be safe to skip the state copying?

@karlb (Contributor):

Or is the slowdown due to latencies that have been hidden by parallel execution before?

@palango (Contributor Author):

> Since we have to run them sequentially anyway, would it be safe to skip the state copying?

Yes, this is one optimization that we should do. This matches well with #6474

> Or is the slowdown due to latencies that have been hidden by parallel execution before?

Which latencies do you mean?
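
To illustrate the optimization discussed here, a minimal sketch assuming a simplified state machine (the class and parameter names are hypothetical, not Raiden's actual API): skip the defensive deep copy of the state when state changes are dispatched strictly sequentially and no caller needs a pre-dispatch snapshot.

```python
import copy
from typing import Any, Callable

class ToyStateMachine:
    """Hypothetical stand-in for a state machine that normally deep-copies
    its state on every dispatch."""

    def __init__(self, state: Any, transition: Callable[[Any, Any], Any]):
        self.state = state
        self.transition = transition

    def dispatch(self, state_change: Any, copy_state: bool = True) -> Any:
        # copy_state=True pays for a full deepcopy on every state change,
        # which is the cost suspected to dominate the numbers above.
        working_state = copy.deepcopy(self.state) if copy_state else self.state
        self.state = self.transition(working_state, state_change)
        return self.state

# Example: integers as stand-in state, addition as the transition function.
machine = ToyStateMachine(state=0, transition=lambda s, c: s + c)
for change in (1, 2, 3):
    machine.dispatch(change, copy_state=False)  # skip the copy on the hot path
print(machine.state)  # 3
```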

@karlb (Contributor):

Then let's see if #6474 helps. Feel free to merge this in the meantime if it helps.

> Which latencies do you mean?

We're still outside the state machine, so something could do RPC requests that add up. It's hard to tell without looking deeper into the code.

Fixes #6473

The `pendingtokenregistration` dictionary for `blockchainevent_to_statechange` is no longer necessary, because it fixed the same bug that caused #6444.
Successfully merging this pull request may close these issues.

Settled channels are shown in list channels API