feat: Run monitoring follower with acceptance testing #10816

usmanmani1122 · 2025-01-08T09:29:11Z

closes: #XXXX
refs: #XXXX

Description

This PR runs a loadgen follower in the acceptance proposal's pre-test hook and monitors the chain while the acceptance proposal is executing.
The generated artifacts are also uploaded to GitHub.

Security Considerations

None

Scaling Considerations

None

Documentation Considerations

None

Testing Considerations

None

Upgrade Considerations

None


cloudflare-workers-and-pages · 2025-01-08T09:29:26Z

Deploying agoric-sdk with Cloudflare Pages

Latest commit:	`b3be5a1`
Status:	✅ Deploy successful!
Preview URL:	https://01583980.agoric-sdk.pages.dev
Branch Preview URL:	https://usman-acceptance-pre-test.agoric-sdk.pages.dev


mhofman

This looks great. Besides a couple of requested optimizations, I'd also like @michaelfig to take a look, as he's more familiar with shell scripting than I am and has some historical interest in the integration test.

.github/workflows/integration.yml

mhofman · 2025-03-04T03:16:41Z

.github/workflows/integration.yml

+      - id: build-cosmic-swingset
+        name: Build cosmic-swingset dependencies
+        run: |
+          set -o errexit
+
+          yarn install
+          make --directory packages/cosmic-swingset all
+        working-directory: agoric-sdk


Let's move this before the "verify SDK image didn't change" step, as a build of the cosmos side of the SDK should not cause the Docker image to rebuild either. If it does, that's a bug in the dockerignore file / Dockerfile.

a3p-integration/proposals/z:acceptance/host/before-test-run.sh

mhofman · 2025-03-04T03:51:28Z

a3p-integration/proposals/z:acceptance/test.sh

+if ! test -z "$MESSAGE_FILE_PATH"; then
+  echo "Waiting for 'ready' message from follower"
+  # make sure the follower has not crashed
+  node "$DIRECTORY_PATH/wait-for-follower.mjs" '^(ready)|(exit code \d+)$' | grep --extended-regexp --silent "^ready$"
+  echo "Follower is ready"
+fi


Given that we're spending a significant amount of time waiting for the follower to be ready, I am wondering if we shouldn't move this test lower in the suite. There isn't really a drawback as the follower will still execute all blocks the same, just some of them as catchup instead.

Currently it takes ~250 empty blocks for the follower to be ready, and "ACCEPTANCE TESTING wallet" begins about 300 blocks after the follower is ready, so I'd move the ready test at least there or later.
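The readiness gate in the quoted `test.sh` lines can be sketched in isolation as follows. This is only an illustration: `first_follower_message` is a hypothetical stand-in for `wait-for-follower.mjs` (whose actual behavior is not shown in this thread), it reads a message file once instead of waiting, and the message file path argument is an assumption. Note that the sketch groups the alternation as `^(ready|exit code [0-9]+)$` so that both anchors apply to both alternatives; in the quoted pattern `^(ready)|(exit code \d+)$`, `^` binds only to the first alternative and `$` only to the second.

```shell
# Hypothetical stand-in for wait-for-follower.mjs: print the first line of the
# message file (path in $1) that is either "ready" or "exit code <n>".
# Unlike the real script, this does not wait; it inspects the file once.
first_follower_message() {
  grep -E -m 1 '^(ready|exit code [0-9]+)$' "$1"
}

# Succeeds only if that first message is exactly "ready". An "exit code <n>"
# message (the follower crashed) or no message at all makes it fail.
follower_is_ready() {
  first_follower_message "$1" | grep -Eqx 'ready'
}
```

Because the second `grep` only accepts `ready`, a crashed follower surfaces as a failing pipeline rather than as a silent pass, which is the property the "make sure the follower has not crashed" comment relies on.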


a3p-integration/proposals/z:acceptance/test.sh

mhofman

Awesome, thanks for working through this.

One final nit, but feel free to merge after that. Would still love to get @michaelfig's eyes on it, but we can always revisit.

a3p-integration/proposals/z:acceptance/host/before-test-run.sh

mhofman · 2025-03-04T23:56:11Z

a3p-integration/proposals/z:acceptance/test.sh

+if ! test -z "$MESSAGE_FILE_PATH"; then
+  echo "Waiting for 'ready' message from follower"
+  # make sure the follower has not crashed
+  node "$DIRECTORY_PATH/wait-for-follower.mjs" '^(ready)|(exit code \d+)$' | grep --extended-regexp --silent "^ready$"
+  echo "Follower is ready"
+fi


Thanks. I can confirm that it now runs in parallel and the test suite didn't block on the follower being ready. It looks like it didn't reduce the overall test time, but that might be due to the runner being slow in general (which would likely be alleviated by #11003)

mhofman · 2025-03-05T00:00:22Z

.github/workflows/integration.yml

+          yarn install
+          make --directory packages/cosmic-swingset all


I didn't pay close attention to the commands here, where do these come from?
I'd expect the following:

Suggested change

-          yarn install
-          make --directory packages/cosmic-swingset all
+          cd packages/cosmic-swingset
+          make install

In particular, there should be no need to redo yarn install, as that was done as part of the restore-node step. Given caches, these are mostly equivalent; I'm just surprised to see something different from deployment-test.

michaelfig

Consider cribbing from an existing wait_for_rpc check instead of implementing it from scratch.

michaelfig · 2025-03-05T17:09:22Z

a3p-integration/proposals/z:acceptance/host/before-test-run.sh

+    curl "$rpc_address" --max-time "5" --silent > /dev/null 2>&1
+    status_code="$?"
+
+    echo "rpc '$rpc_address' responded with '$status_code'"
+
+    if ! test "$status_code" -eq "0"; then
+      sleep 5
+    else
+      break
+    fi


This wait_for_rpc will return success even if the node has not caught up with the rest of the chain. Instead, take inspiration from instagoric's readinessProbe

    json=$(curl "$rpc_address/status" --max-time 5 --silent 2>/dev/null | jq .result.sync_info.catching_up)
    echo "rpc '$rpc_address' responded with 'catching_up=$json'"
    test "$json" != false || break
    sleep 5

> This wait_for_rpc will return success even if the node has not caught up with the rest of the chain.

Well, this is a single-node chain, and we're using the validator as the RPC node in this case, so it is sufficient.

Maybe I put that incorrectly. There is a situation where when the chain is booting for the first time, the RPC port becomes available, but it will fail queries until the first block is committed. I don't know if the rest of this feature is tolerant of that situation.

Maybe this is a case where we can rely on already having committed blocks available in state (from the history of the a3p-integration upgrades). I'm wary because I've struggled with flakes in other tests where the "wait-for-chain" function didn't actually wait for both RPC to start listening as well as genesis to be committed.

I'm just concerned that this function might get cargo-culted without a caveat that it is brittle.

Gotcha. Yeah we know for a fact the chain has already bootstrapped in this case. It might be good to add a disclaimer about this assumption.
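As a rough illustration of the stricter check discussed in this thread, the loop below polls `/status` and only succeeds once the response reports `catching_up` as `false`. It is a sketch under assumptions, not the script merged in this PR: the helper name `is_caught_up`, the attempt and delay parameters, and the use of `grep` instead of `jq` are all choices made here for illustration. Whether this also covers the window before the first block is committed is not confirmed, so the disclaimer about relying on an already bootstrapped chain would still apply.

```shell
# Hypothetical helper: succeeds only when a /status response read from stdin
# reports catching_up=false (uses grep so that jq is not required).
is_caught_up() {
  grep -Eq '"catching_up"[[:space:]]*:[[:space:]]*false'
}

# Hypothetical wait loop. ASSUMPTION: the chain has already bootstrapped, as is
# the case for a3p-integration; this has not been validated against a node
# that is booting for the first time.
# $1 = RPC address, $2 = max attempts (default 60), $3 = delay in seconds (default 5)
wait_for_rpc() {
  wfr_address="$1"
  wfr_attempts="${2:-60}"
  wfr_delay="${3:-5}"
  wfr_i=0
  while [ "$wfr_i" -lt "$wfr_attempts" ]; do
    if curl "$wfr_address/status" --max-time 5 --silent 2>/dev/null | is_caught_up; then
      echo "rpc '$wfr_address' is caught up"
      return 0
    fi
    echo "rpc '$wfr_address' not ready yet"
    wfr_i=$((wfr_i + 1))
    sleep "$wfr_delay"
  done
  return 1
}
```

Bounding the number of attempts means a node that never becomes ready fails the step instead of hanging until the workflow timeout.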

usmanmani1122 and others added 23 commits December 26, 2024 22:10

pre test hook for acceptance proposal

5e0ac5f

changes

d78e4c6

Merge branch "usman/a3p-prepare-test-script" into branch "usman/acceptance-pre-test"

400c64f

Merge branch 'master' into usman/acceptance-pre-test

7455a3a

use custom cli

abb8181

volume instead of mount

47ffb80

Empty

c04c8dd

temporary log

9571d66

Merge branch 'master' into usman/acceptance-pre-test

c0156af

Empty

4fcc7bd

oopsie

3bf08ef

script tracing

797ec7b

oopsie

38b3e6a

supress script sourcing logs

98eb90d

omfg

1df7749

upload artifacts

45c7e3a

timeout on step

ea008d0

oopsie daisy

a8106ca

should be it

fe055f0

test

106afdf

reduce timeout

b211f2a

disable deployment-test

88dd181

remove temp changes

355ac46

usmanmani1122 self-assigned this Jan 8, 2025

Merge branch 'master' into usman/acceptance-pre-test

0054c0b

Merge branch 'master' into usman/acceptance-pre-test

b26112b

usmanmani1122 added the force:integration Force integration tests to run on PR label Jan 22, 2025

usmanmani1122 added 2 commits January 23, 2025 11:08

Merge branch 'master' into usman/acceptance-pre-test

250c420

update scripts name

15a91b6

usmanmani1122 and others added 13 commits March 1, 2025 10:12

fix: path extraction

142f23d

fix: remove setup to workflow

d4d2db9

fix: directory path

748b939

fix: more paths

3aa695b

fix: restore node path

cb63506

fix

716ecec

testing

e12b01f

revert

8c78d8b

Merge branch 'master' into usman/acceptance-pre-test

d8898e9

nit

f9ee4fd

fix: install golang

0239cd9

fix: use default home

903418c

fix: wait for rpc

3d59508

usmanmani1122 requested a review from mhofman March 3, 2025 08:56

mhofman reviewed Mar 4, 2025


mhofman requested a review from michaelfig March 4, 2025 04:01

usmanmani1122 added 2 commits March 4, 2025 11:13

Merge branch 'master' into usman/acceptance-pre-test

c249a0e

address comments

866996e

usmanmani1122 requested a review from mhofman March 4, 2025 11:37

mhofman approved these changes Mar 5, 2025


usmanmani1122 added 4 commits March 5, 2025 05:23

Merge branch 'master' into usman/acceptance-pre-test

1990c6c

fix: make all -> make install

359a1d0

add timeout to test phase to get logs

5cd4461

fix: always run artifacts upload

b3be5a1

usmanmani1122 added the automerge:squash Automatically squash merge label Mar 5, 2025

mergify bot merged commit baa7c08 into master Mar 5, 2025
94 checks passed

mergify bot deleted the usman/acceptance-pre-test branch March 5, 2025 10:01

michaelfig reviewed Mar 5, 2025


This was referenced May 8, 2025

Switch loadgen CI to replicate agoric-sdk a3p based testing. Agoric/testnet-load-generator#125

Open

ci: fix notify for integration test-docker-build #11368

Open
