Debugging in FDB
Lukas Joswiak edited this page Aug 19, 2020
·
6 revisions
Here are some helpful debugging tips for when you encounter simulation failures.
-
Why can't I reproduce failures reported by Joshua on my local machine?
- Simulation tests are reproducible only on the same OS type and with the same build environment. A binary from a CMake build in Docker may not behave the same as one from a CMake build in the centos7 VM. The choice of libstdc++ versus libc++ also changes deterministic behavior.
- Make sure your test is deterministic by checking that the unseed is the same across multiple runs. (The seed initializes the random number generator for the test, and the unseed is the last random number generated.)
- Verify that `trace*.xml` is well-formed. If you know your test succeeds on your local machine, check the last `trace*.xml` file using a command like `mono ./correctness/bin/TestHarness.exe summarize trace.0.0.0.0.0.1595012319.MJbTZt.0.1.xml summary.xml "" "" false`. See whether summary.xml contains the same failure as your Joshua report. (Opening the XML file directly in a browser may also help.)
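The determinism check in the second bullet can be scripted. The sketch below assumes each run's stdout was saved to a file and that the unseed appears on a line of the form `Unseed: <number>`. That output format and the helper names `get_unseed` and `same_unseed` are assumptions to verify against your build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last "Unseed: N" value found in a saved run log.
# The "Unseed: N" line format is an assumption about fdbserver's stdout.
get_unseed() {
    grep -Eo 'Unseed: *-?[0-9]+' "$1" | tail -n 1 | grep -Eo -- '-?[0-9]+$'
}

# Succeeds (exit status 0) only if both logs contain an unseed and the values match.
same_unseed() {
    local a b
    a=$(get_unseed "$1")
    b=$(get_unseed "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, run the same test twice with the same `-s` seed, redirect each run's stdout to `run1.log` and `run2.log`, then call `same_unseed run1.log run2.log && echo deterministic`.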
-
How do I use `valgrind` to help diagnose memory problems?
- Edit `CMakeCache.txt`: set `USE_VALGRIND=ON`, then recompile fdbserver. After that, `valgrind --log-file=valg.log ./bin/fdbserver -r simulation -s 366751840 -f ../tests/fast/SwizzledRollbackSideband.toml` should work.
- If an illegal instruction error appears when you run `valgrind` (for example, `vex amd64->IR: unhandled instruction bytes...`), set `USE_AVX=OFF`.
- You can also submit `packages/valgrind-7.0.0.tar.gz` to Joshua. A valgrind ensemble takes longer to finish.
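Once a valgrind run finishes, the log named by `--log-file` ends with an `ERROR SUMMARY` line. A minimal sketch for pulling the error count out of `valg.log` follows; the helper name `valgrind_error_count` is hypothetical.

```shell
# Print the error count from the last "ERROR SUMMARY: N errors ..." line
# of a valgrind log file (hypothetical helper, not part of FDB or valgrind).
valgrind_error_count() {
    grep -E 'ERROR SUMMARY: [0-9,]+ errors' "$1" \
        | tail -n 1 \
        | sed -E 's/.*ERROR SUMMARY: ([0-9,]+) errors.*/\1/' \
        | tr -d ','
}
```

Calling `valgrind_error_count valg.log` prints a single number, which is convenient when comparing several seeds.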
-
How do I choose a subset of tests to run on Joshua?
- Edit `TEST_INCLUDE:STRING` and `TEST_EXCLUDE:STRING`. Note that the regex does not include the `.toml` suffix, and the path can be relative to `${FDB_REPO_ROOT}/tests` (e.g. `fast/.*`) or not (e.g. `Swizzled.*`, which means all test files in subfolders starting with Swizzled).
- Only test spec files under `${FDB_REPO_ROOT}/tests/{rare,restarting,slow,fast}` will be chosen. The files directly under ./tests/ are tests that used to work or be useful but no longer are.
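To preview which tests a pattern would select before submitting, one option is to apply the regex to test paths relative to `${FDB_REPO_ROOT}/tests` with the `.toml` suffix stripped. The sketch below is only an approximation: it uses `grep -E` as a stand-in for the build system's matching, which may differ in anchoring and regex dialect, and `preview_tests` is a hypothetical helper name.

```shell
# Read test paths (relative to the tests directory) on stdin, strip the .toml
# suffix, and print those matching the include regex but not the exclude regex.
# Usage: preview_tests INCLUDE_REGEX [EXCLUDE_REGEX]
preview_tests() {
    local include=$1 exclude=${2:-}
    sed -E 's/\.toml$//' \
        | grep -E -- "$include" \
        | { if [ -n "$exclude" ]; then grep -Ev -- "$exclude"; else cat; fi; }
}
```

For example, from the repo root: `(cd tests && find fast slow rare restarting -name '*.toml') | preview_tests 'fast/.*'`.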
-
How do I get useful information from `trace*.xml`, given how many lines it contains?
- Start by grepping for the Trace events emitted from `fdbserver/tester.actor.cpp` to locate the test phase in which you failed.
- The function `waitForQuietDatabase` in `QuietDatabase.actor.cpp` also emits many useful trace messages in simulation tests.
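When a trace file is too long to read, a quick per-type event count can show where to start grepping. The sketch below assumes each event is an `<Event ... Type="..." .../>` element carrying a `Type` attribute, which is an assumption about the trace layout; `count_event_types` is a hypothetical helper.

```shell
# Print "count Type" pairs for a trace file, most frequent first.
# The leading space in the pattern avoids matching attributes whose
# names merely end in "Type".
count_event_types() {
    grep -Eo ' Type="[^"]*"' "$1" \
        | sed -E 's/^ Type="([^"]*)"$/\1/' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

After picking an interesting type from the list, `grep 'Type="<name>"' trace*.xml` narrows the file down to just those events.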
-
How do I find out which line causes the problem?
- Use coredump files (e.g. `gdb ./bin/fdbserver core.17426`).
- Use addr2line. The stack trace of the failure is logged in the trace files and on stdout (e.g. `addr2line -e ./bin/fdbserver -p -C -f -i 0x7fae93006630 0x26b5f98 0x26eff9b 0x26dc132 0x26df565 0x26dfaea 0x25c8798 0x25c4f45 0x25c20d8 0x25c1e0e 0x25c0118 0x25b816c 0x25bf888 0x25bf4ba 0x25bc968 0x25bc27a 0x1eb6148 0xf34538 0xf34316 0x287a431 0x287a1f1 0xfc6c78 0x28edc37 0x28eda96 0x28ed731 0x28edae2 0x28ee16c 0xfc6c78 0x29d1016 0x29c5bfe 0x28df844 0x13f3c18 0x7fae92745555`).
-
A restart test failed. How do I reproduce it locally?
- Restart tests, located in `tests/restarting`, test FDB's ability to function correctly after upgrades. Reproducing them manually requires two versions of FDB on your machine.
- Create a new build directory named after the version of FDB you will be building. For example, if you are failing a restarting test in the `from_6.2.0` folder, create a new build directory named `build6.2` in your home directory according to the build instructions.
- Check out the 6.2 release in your main FDB repo with `git checkout -b 6.2 origin/release-6.2`. This creates a new local branch `6.2` for the 6.2 release.
- Build the 6.2 binary in your `build6.2` folder with CMake and Ninja.
- Change directory back to your current build directory. Use the old `fdbserver` binary to run the restarting test that failed.
- Now use your up-to-date `fdbserver` binary to run the restarting test that failed, and append `--restarting` to the command.
- Example:
$ ../build6.2/bin/fdbserver -r simulation -f ../foundationdb/tests/restarting/from_6.2.0/SnapTestSimpleRestart-1.txt -s 523887594 -b on
$ ./bin/fdbserver -r simulation -f ../foundationdb/tests/restarting/from_6.2.0/SnapTestSimpleRestart-1.txt -s 523887594 -b on --restarting
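The two-step procedure can be wrapped in a small function so that the same test file and seed are reused for both runs. This is a sketch only: `restart_test_cmds` is a hypothetical helper that prints the commands rather than running them, and the flags are the ones used in the example above.

```shell
# Print the two fdbserver invocations for a restarting test.
# Usage: restart_test_cmds OLD_FDBSERVER NEW_FDBSERVER TEST_FILE SEED
restart_test_cmds() {
    local old_bin=$1 new_bin=$2 test_file=$3 seed=$4
    # First run: the old release binary.
    printf '%s -r simulation -f %s -s %s -b on\n' "$old_bin" "$test_file" "$seed"
    # Second run: the current binary, with --restarting appended.
    printf '%s -r simulation -f %s -s %s -b on --restarting\n' "$new_bin" "$test_file" "$seed"
}
```

Review the printed commands first, then pipe them to `sh -e` to run them in order, stopping if the first run fails. Paths containing spaces would need extra quoting.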