Skip to content

bug(tests): investigate need for delays during node setup #2264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
danisharora099 opened this issue Feb 17, 2025 · 6 comments
Open

bug(tests): investigate need for delays during node setup #2264

danisharora099 opened this issue Feb 17, 2025 · 6 comments
Labels
debt Technical debt marker test Issue related to the test suite with no expected consequence to production code

Comments

@danisharora099
Copy link
Collaborator

danisharora099 commented Feb 17, 2025

Description

Currently in packages/tests/src/utils/nodes.ts, we're using arbitrary delays (await delay(2000)) to ensure proper node setup and connection. These delays were added as a temporary fix to make tests pass, but they are not ideal as:

  1. They make tests slower than necessary
  2. They don't guarantee that the system is actually ready (could be too short in some cases)
  3. They might be unnecessarily long in other cases
  4. They make the tests less deterministic and reliable

Instead of using arbitrary delays, we should investigate implementing proper wait mechanisms based on specific log lines or node states.

Expected Behavior

The node setup and connection process should:

  1. Use deterministic waiting mechanisms based on actual node states or specific log lines
  2. Only wait as long as necessary for each operation
  3. Fail fast with clear error messages when expected states aren't reached
  4. Be reliable across different environments and network conditions

Steps to Reproduce

Current problematic code locations:

  1. In packages/tests/src/utils/nodes.ts:

    // First delay after waku node creation
    await delay(2000);
    
    for (const node of serviceNodes.nodes) {
      await waku.dial(await node.getMultiaddrWithId());
      await waku.waitForPeers([Protocols.Filter, Protocols.LightPush]);
      const success = await node.ensureSubscriptions(...);
      
      // Second delay before waiting for log
      await delay(2000);
      
      await node.waitForLog(waku.libp2p.peerId.toString(), 100);
    }
  2. Run tests with npm test in the packages/tests directory

  3. Tests pass but take longer than necessary due to fixed delays

Environment Details

  • js-waku packages version: latest
  • nwaku version: 0.34.0

Investigation Points

  1. After waku node creation:

    • What specific state/log lines indicate the node is fully ready?
    • Can we use waitForLog with specific startup completion indicators?
  2. Before waiting for peer logs:

    • What conditions need to be met before checking for peer logs?
    • Are there specific log lines or node states we can wait for?
    • Why is the current 100ms timeout for waitForLog insufficient without the delay?
  3. Potential solutions to investigate:

    • Extend waitForLog to support multiple log lines in sequence
    • Add new wait mechanisms based on node state rather than logs
    • Implement proper retry mechanisms with backoff instead of fixed delays
    • Add more detailed logging to help identify exact ready states
@chair28980 chair28980 added this to Waku Feb 17, 2025
@danisharora099 danisharora099 moved this to Triage in Waku Feb 17, 2025
@weboko weboko added test Issue related to the test suite with no expected consequence to production code debt Technical debt marker labels Feb 19, 2025
@weboko
Copy link
Collaborator

weboko commented Feb 19, 2025

Adding await for a log line from nwaku would be a good start.

@weboko weboko moved this from Triage to To Do in Waku Feb 19, 2025
@weboko
Copy link
Collaborator

weboko commented Mar 4, 2025

Attack plan:

  • categorize needed awaitFor* methods
  • create functions for them
  • replace await timeout with these methods

Main methods we foresee being created are:

  • waitForAmountOfPeers(n) -> waits for some number of peers nwaku would be connected to;
  • waitForPeer(PeerID?) -> waits for a peer with ID if provided otherwise just any peer
  • ???

cc @danisharora099

@weboko
Copy link
Collaborator

weboko commented Mar 4, 2025

Assigning to @danisharora099 as we discussed at PM

@danisharora099 danisharora099 moved this from To Do to In Progress in Waku Mar 7, 2025
@weboko weboko moved this from In Progress to To Do in Waku Mar 10, 2025
@weboko
Copy link
Collaborator

weboko commented Mar 10, 2025

Moved back to TODO to prioritize RLN/presentation this week.

@weboko weboko moved this from To Do to Triage in Waku Mar 24, 2025
@weboko
Copy link
Collaborator

weboko commented Apr 10, 2025

I think delay is not our main problematic point but docker allocation.

It takes ~6s to allocate nodes:
Image

Upon closer look, majority of this time goes to docker:
Image

So, on each it we spend 6s at least, and though it might be less for those that do not use runWithRedundancy it is still majority of our test cases.

I think next steps that we should do are following:

  • implement docker container pool from which nodes will be taken if needed AND reused;
  • identify test cases that need independent fresh nodes and spawn them per case with documented exception;

cc @waku-org/js-waku

@weboko
Copy link
Collaborator

weboko commented Apr 11, 2025

In todo for now due to previously mentioned numbers that delay is not biggest problem.

@weboko weboko moved this from Triage to To Do in Waku Apr 11, 2025
@danisharora099 danisharora099 removed their assignment May 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
debt Technical debt marker test Issue related to the test suite with no expected consequence to production code
Projects
Status: To Do
Development

No branches or pull requests

2 participants