-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[ARM] Possible race condition in std::atomic_compare_exchange_strong #29793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@Renato Our header simply forwards to clang's __c11_atomic_foo builtins. The bug exhibited here is almost certainly in the builtins, not libc++. I'm rebinning this under Clang. Feel free to move it back if you disagree. |
I'll take a look at this. |
@Renato: Would you mind testing this small reproducer for me please? ---------->8----------- void testme() int main() If it asserts (I expect it to), could you please send me over the binaries (image + object) as before? The original test is templated and produces a lot of output, this should make it easy for me to look at the codegen. Also, what options are you using with this run? Your linked bot seems to be down. I presume the options are same as those on http://lab.llvm.org:8011/builders/libcxx-libcxxabi-libunwind-arm-linux? I can still work on the original binaries somewhat, but the above reproducer should make it easier. |
Hi Asiri, Sorry for the delay, I was at Connect. The reproducer does assert: atomic_compare_exchange_strong.pass.cpp:16: void testme(): Assertion `a == (long long)(2)' failed. Program received signal SIGABRT, Aborted. attaching the source/objects. |
Asiri, Just FYI, I fixed our bot, which is back exhibiting this bug: http://buildmaster.tcwglab.linaro.org/builders/libcxx-libcxxabi-libunwind-arm-linux/builds/101 So we can monitor again if commits fix it before moving the bot to the upstream master. cheers, |
Hi Rento, How urgent is this btw? I'm a bit stuck with some work. I can ask someone else to have a peak if this is quite urgent. Cheers, / Asiri |
Hi Asiri, We're just waiting on this bug to move the new libcxx builder to production and de-phase the old Chromebook-based ones. We still have other bugs to look at [1] and since you know libc++ better than we do, I'll let you fix this one. :) The libc++ bot is not high traffic, so it should be fine keeping it downstream, since it's already tracking trunk and is publicly accessible (but not emailing anyone). cheers, [1] http://buildmaster.tcwglab.linaro.org/builders/clang-cmake-armv7-a15/builds/693 |
And, just like that, it stopped happening... :/ If it happens again I'll re-open this bug. |
Sorry, I didn't find any time to look into this :( Glad it went away on its own :P |
We're seeing this on Android too. Definitely not an issue with libc++ because you can reduce the test case down to just the atomic intrinsics and it still fails: #include int main() {
} Pirama just tried with a current Clang and said he's still seeing it. Not sure why Renato's failure disappeared. Some other notes:
|
Hi Dan, I'm using Xenial's default Clang: 3.8.0. We have no way of fixing GCC 5.3, at least not in time to backport it to Ubuntu's updates, which will be in their own time. We have found a number of problems with that version of GCC, so using Clang was the only solution. I do have a list of all the problems we found, if you're interested. But regarding Clang, which version are you using? 3.9.0? Some trunk? If Clang still fails and our 3.8 passes by luck, I very much want to know about it! :) cheers, |
Looks like we're currently on r275480. I think Pirama did his test with ToT. |
Oh, the other important repro piece: it only shows up at -O0. That's probably why Renato isn't seeing it any more. |
This is not good. I'm reopening this. Could this be the result of Clang optimising (and eliding the problem) with the code? Or undefined behaviour? I don't know enough about the new standards (or atomics) to identify undefined behaviour in there. cheers, |
I don't have access to a board / model, I'm looking at the assembly generated for the following program at -O3: int foo() { __c11_atomic_store(&a, 2LL, __ATOMIC_SEQ_CST); __c11_atomic_compare_exchange_strong(&a, return __c11_atomic_load(&a, __ATOMIC_SEQ_CST); I can't see a problem. Will try trunk. |
(Right, missed the comment, will check -O0 too) |
I can't see an obvious error at -O0 for clang 3.8.0, but the assembly generated is quite messy, so I could be missing something. I will load this onto a model on Monday and look at the trace to make sure. @Renato: could you check if my code snippet, that is: int foo() { __c11_atomic_store(&a, 2LL, __ATOMIC_SEQ_CST); __c11_atomic_compare_exchange_strong(&a, return __c11_atomic_load(&a, __ATOMIC_SEQ_CST); Produces the wrong result (i.e. returns anything but 2) for you on 3.8.0 at -O0? Just to double check if my simplification has not removed the codegen fault. Thanks. |
Trunk codegen has changed a fair bit; both r11 and sp are used as base registers for accessing the stack, which makes tracing the assembly output quite hard (at -O0). I will compare the traces on Monday, that should make it easy to pick out the problem. / Asiri |
Attempt reproduce @Dan, Renato: Can one of you please try the attached reproducer (t1.c) and confirm that you can still see the problem (i.e. prints 3 instead of 2)? If so, could you please attach the disassembly that is failing? Just to make absolutely sure we're on the same page. Thanks. / Asiri |
Hi Asiri, Clang 3.8.0 works well (ie, prints 2) with your example. I tried multiple changes to the command line (O0 ~ O3, -g, -std, etc) and they all work. --renato |
I tested Asiri's reproducer (t1.c) on ToT clang (with a slight change of directly returning the value from main instead of printing to stdout). I can confirm that it works correctly on ToT clang (and returns 2). I am attaching the objdump of binaries produced by ToT clang and the one used by Android (which is based on r275480). I couldn't explain why the output from r275480 is broken, but hopefully Asiri or Renato can figure it out. |
So the problem with r275480 is here: 480: e1b18f9f ldrexd r8, [r1] That cmp/sbcs combination does not work for comparing two 64-bit values. In this case, r8=2, r4=1 and r9=r5=0. So the cmp sets C=1 (not borrow) and sbcs performs ip = r9 - r5 - not(C), which evaluates to zero, so the branch is not taken and you get the strexd. On ToT, this is done like so: 494: e1b38f9f ldrexd r8, [r3] Which is more straightforward to understand (and correct!). I could run the bisector and see which commit fixed the issue, but given it is fixed, perhaps we should all move on? ;-) |
r288433 with a pretty on-the-mark commit message. Found by searching 'O0' in 'git log', since I was pretty sure this wasn't fixed in my checkout from late November. |
Pirama, Have you tried reverting it to see if the behaviour goes away? |
@Renato: r288433 is the commit that fixes the problem... I think this issue can now be closed? It would be pretty safe to back-port r288433 if android is stuck on an earlier revision. |
Sorry, I thought you said that was the one that broke it. :) It is safe to close the bug, thanks for the investigations. |
mentioned in issue llvm/llvm-bugzilla-archive#30675 |
Extended Description
The tests for atomic_compare_exchange_strong may be exhibiting a race condition.
Source:
http://buildmaster.tcwglab.linaro.org/builders/libcxx-libcxxabi-libunwind-arm-linux
Environment:
ARMv7 Cortex-A15
Ubuntu Xenial 16.04.2
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2)
GNU ld (GNU Binutils for Ubuntu) 2.26.1
ldd (Ubuntu GLIBC 2.23-0ubuntu3) 2.23
The same tests pass on our production bot:
ARMv7 Cortex-A15
Ubuntu Trusty 14.04
gcc version 4.8.2 (Ubuntu/Linaro 4.8.2-19ubuntu1)
GNU ld (GNU Binutils for Ubuntu) 2.24
ldd (Ubuntu EGLIBC 2.19-0ubuntu6.3) 2.19
I'm building in the exact same configuration (our master is a mirror). I have tested it on two different boards (Chromebooks and Jetson TK1s) with Xenial and the problem is consistent on both with Xenial and neither with Trusty.
When I run the tests directly, I get an assert like this:
atomic_compare_exchange_strong.pass.cpp:41: void TestFn::operator()() const [T = long long]: Assertion `a == T(2)' failed.
The code is:
The first exchange does the right thing, compare, they're equal, copies T(2) into
a
, then the asserts are correct.The second exchange, however, produce
false
(as expected), but nowa
has the value of 3, which is completely unexpected. It shouldn't be writing ontoa
at all!In GDB, when I run get a stack trace, it prints
__a_ = 3
, but when I then set a breakpoint on that line, it keeps repeating (as if looping over, may be the test harness), and always producing the correct, expected results.For completeness, here's the stack trace:
Program received signal SIGABRT, Aborted.
__libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
47 ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S: No such file or directory.
(gdb) bt
#0 __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:47
#1 0xb6d76638 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#2 0xb6d7733a in __GI_abort () at abort.c:89
#3 0xb6d719b2 in __assert_fail_base (fmt=0x1 <error: Cannot access memory at address 0x1>, assertion=0x2a0d8 "a == T(2)", assertion@entry=0x2 <error: Cannot access memory at address 0x2>, file=file@entry=0xb6ff3340 "", line=41,
line@entry=3068375168, function=function@entry=0x2a30f "void TestFn::operator()() const [T = long long]") at assert.c:92
#4 0xb6d71a4a in __GI___assert_fail (assertion=0x2 <error: Cannot access memory at address 0x2>, file=0xb6ff3340 "", line=3068375168, function=0x2a30f "void TestFn::operator()() const [T = long long]") at assert.c:101
#5 0x000239f0 in TestFn::operator() (this=0xbefff1fc) at /home/linaro/devel/llvm/worktree/repos/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_compare_exchange_strong.pass.cpp:41
#6 0x00010700 in TestEachIntegralType::operator() (this=0xbefff240) at /home/linaro/devel/llvm/worktree/repos/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h:39
#7 0x0001067c in TestEachAtomicType::operator() (this=0xbefff254) at /home/linaro/devel/llvm/worktree/repos/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_helpers.h:60
#8 0x00010654 in main () at /home/linaro/devel/llvm/worktree/repos/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_compare_exchange_strong.pass.cpp:61
(gdb) f 5
#5 0x000239f0 in TestFn::operator() (this=0xbefff1fc) at /home/linaro/devel/llvm/worktree/repos/libcxx/test/std/atomics/atomics.types.operations/atomics.types.operations.req/atomic_compare_exchange_strong.pass.cpp:41
41 assert(a == T(2));
(gdb) p a
$1 = {<std::__1::__atomic_base<long long, true>> = {<std::__1::__atomic_base<long long, false>> = {_a = 3, static is_always_lock_free = }, }, }
The text was updated successfully, but these errors were encountered: