Skip to content

AMOCAS recommended compatibility mappings are too weak for the c11 memory model #444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
patrick-rivos opened this issue Jul 10, 2024 · 13 comments · Fixed by #445
Closed

Comments

@patrick-rivos
Copy link
Contributor

patrick-rivos commented Jul 10, 2024

Andrea Parri spotted an issue with the amocas operation with the current seq_cst amo<op> mappings.

The issue stems from this sentence in the ratified zacas extension:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

https://github.com/riscvarchive/riscv-zacas/releases/tag/v1.0

The issue can be demonstrated using this litmus test:

C c-cmpxchg-fail
{ [z] = 1; }

P0(atomic_int *x, atomic_int *y, int *z)
{
        atomic_store_explicit(x, 1, memory_order_seq_cst);
        int r0 = atomic_compare_exchange_strong_explicit(y, z, 2, memory_order_seq_cst, memory_order_seq_cst);
}

P1(atomic_int *x, atomic_int *y)
{
        atomic_store_explicit(y, 1, memory_order_seq_cst);
        int r1 = atomic_load_explicit(x, memory_order_seq_cst);
}

exists (0:r0=0 /\ 1:r1=0)

The atomic_compare_exchange_strong_explicit fails (r0 == 0) meaning it read that y != z (value of 1) aka y = 0.
r1 == 0 ensures that P1 also reads 0 aka x = 0.

herd7 results:

Test c-cmpxchg-fail Allowed
States 3
0:r0=0; 1:r1=1;
0:r0=1; 1:r1=0;
0:r0=1; 1:r1=1;
No

As shown by herd7, the (0:r0=0; 1:r1=0;) behavior is impossible with the c11 memory model.

This is a variant of another common memory-model testcase

This testcase is essentially a variant of:

seq_cst store a, 1   | seq_cst store b, 1
seq_cst load  0x7, b | seq_cst load 0x8, a
exists (0x7 = 0 /\ 0x8 = 0)

We can approximate this mapping into RISC-V asm using the memory model mappings.
Since the mappings are compatible with A.6, we'll use an A.6 seq_cst store.
Following that will be amocas.aqrl, however since it fails it follows the text in the spec:

The memory operation performed by an AMOCAS.W/D/Q, when not successful, has acquire semantics if aq bit is 1 but does not have release semantics, regardless of rl.

We'll model it using lw.aq - a hypothetical insn that is present in herd7.

RISCV rv-cmpxchg-fail

{
0:x1=x; 0:x2=1; 0:x3=y;
1:x1=y; 1:x2=1; 1:x3=x;
}

  P0             |  P1           ;
  fence rw,w     |  sw x2,0(x1)  ;
  sw x2,0(x1)    |  fence rw,rw  ;
  lw.aq x4,0(x3) |  lw x4,0(x3)  ;

exists (0:x4=0 /\ 1:x4=0)

P1's fences are minimized since it's acting as an observer thread. We know the lw/sw can't be reordered with a full fence between them so the interesting behavior is in the P0 hart.

herd7 results:

Test rv-cmpxchg-fail Allowed
States 4
0:x4=0; 1:x4=0;
0:x4=0; 1:x4=1;
0:x4=1; 1:x4=0;
0:x4=1; 1:x4=1;
Ok

This shows the issue: 0:x4=0; 1:x4=0;. This is the same case that causes us to require lr.aqrl for atomic_(memory_order_seq_cst).

Thanks again to Andrea Parri for spotting this and making the litmus tests.

To resolve this for atomic_compare_exchange ops that have the failure argument set to memory_order_seq_cst we may need a full fence before those amocas insns in order to maintain the ordering guarantees in case the cas fails.

Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

cc'ing a few atomics related people, please add anyone I missed:
@hboehm @kito-cheng @aparri @daniellustig @preames @ilovepi @asb

@daniellustig
Copy link

Yes good catch. I think the key here is the failure argument, i.e., the last argument of atomic_compare_exchange_strong_explicit. It's memory_order_seq_cst in the C++ example, but not sequentially consistent in the AMOCAS instruction mapping due to the highlighted sentence saying .rl is ignored on failure. Is that right?

The zacas fast track extension was added recently, after the RVWMO baseline was written, and after the ISA manual appendix was written, and I think after our previous discussions about atomics ABI and C++ mappings. So this litmus test seems to have slipped through.

To resolve this we may need a full fence before each amocas insn in order to maintain the ordering guarantees in case the cas fails.

Specifically when the failure argument is memory_order_seq_cst?

Alternatively (and I know this won't be popular...), one could argue to treat this as a defect in the zacas spec and update it to say rl is respected even in case of failure...

@patrick-rivos
Copy link
Contributor Author

I think the key here is the failure argument, i.e., the last argument of atomic_compare_exchange_strong_explicit. It's memory_order_seq_cst in the C++ example, but not sequentially consistent in the AMOCAS instruction mapping due to the highlighted sentence saying .rl is ignored on failure. Is that right?

Yes that's my understanding as well.

Specifically when the failure argument is memory_order_seq_cst?

Yes, thanks. I've edited the original comment to clarify that statement.

@patrick-rivos
Copy link
Contributor Author

Here's a pointer to the original discussion about a failed AMOCAS' semantics:
riscvarchive/riscv-zacas#20

In C++ a compare_exchange that fails is treated as a load for memory model issues. "If the operation returns true, these operations are atomic read-modify-write operations (6.9.2) on the memory pointed to by this. Otherwise, these operations are atomic load operations on that memory." C++ load operations don't provide release ordering (which wouldn't make sense in that model). Thus there is no requirement that ordinary prior data operations be ordered with respect to even a seq_cst compare_exchange that fails.

I think this particular case is different since it's interacting with a seq_cst store. Hopefully @hboehm can provide more insight here.

@asb
Copy link
Collaborator

asb commented Jul 10, 2024

Thankfully LLVM 18 does not contain zacas but trunk uses this mapping and is not marked as experimental. GCC does not have zacas support yet.

Thanks for tagging me. In terms of timelines, the current plan sees LLVM 19 being branched on 23rd July, with a series of release candidates taking place until 3rd September. Getting a change in sooner is better than later, but of course getting the right fix in is more important. As any such change would be quite limited in scope to the RISC-V backend, we have some flexibility in landing it after the initial branch and backporting it to LLVM 19 if necessary. With that in mind, I'd say there's maybe 1 month to decide what to do here for the LLVM 19 release (which as a backup option, could be to mark Zacas as experimental again because the proper mappings haven't been definitively decided).

@hboehm
Copy link
Contributor

hboehm commented Jul 11, 2024 via email

@patrick-rivos
Copy link
Contributor Author

The issue is that the current PSABI mappings claim to be compatible with A.6 (or at least call out when the mappings aren't compatible). A note-3 disclaimer does not exist for amocas if someone reads the amo<op> suggested mappings. The original intention of A6S was to be compatible with both A6C and A7 and this behavior/mapping breaks that.

One solution would be to add a fenced amocas mapping with an alternative non-fenced version with a note-3 warning so that the meaning of A6S is preserved.

@hboehm
Copy link
Contributor

hboehm commented Jul 12, 2024 via email

patrick-rivos added a commit to patrick-rivos/riscv-elf-psabi-doc that referenced this issue Jul 15, 2024
The Zacas extension defines different ordering behavior when an amocas fails:
> The memory operation performed by an AMOCAS.W/D/Q, when not successful, has
> acquire semantics if aq bit is 1 but does not have release semantics,
> regardless of rl.

This requires a leading fence to maintain A6C compatability for both the RVWMO
and Ztso memory models. The A7 mappings can use the non-fenced versions.

See issue riscv-non-isa#444 for more context and litmus tests.

Resolves riscv-non-isa#444.
@patrick-rivos
Copy link
Contributor Author

I've made the first draft update here: #445

I'm not following what you mean by the TSO table needing a change. I think this line in the Ztso spec means we're OK within the PSABI mappings (as long as it applies to amocas):

All AMOs behave as if they have both acquire-RCsc and release-RCsc annotations.

@ved-rivos
Copy link

Since a atomic_compare_exchange performs a read-modify-write only when it succeeds and is only a load when it fails, should definition for a seq_cst atomic_compare_exchange that fails only needs acquire operation? Does C++ require release operation on the load performed by the failing atomic_compare_exchange?

A load operation with this memory order performs an acquire operation, a store performs a release operation, and read-modify-write performs both an acquire operation and a release operation.

@anparri
Copy link

anparri commented Jul 16, 2024

Thanks for the write-up and for sending these changes, @patrick-rivos .

@ved-rivos , a seq_cst load is (strictly) stronger than an acquire load as the example above can also illustrate; of course, there is no such thing as a "release load" in the C memory model, but that should provide some relevant intuition indeed.

@hboehm
Copy link
Contributor

hboehm commented Jul 17, 2024 via email

@patrick-rivos
Copy link
Contributor Author

I think the discussion has concluded so I've marked #445 as ready to land. Please correct me if I'm wrong.

@hboehm
Copy link
Contributor

hboehm commented Jul 18, 2024 via email

asb added a commit to asb/llvm-project that referenced this issue Jul 22, 2024
As discussed at the last sync-up call, mark Zacas as experimental until
this ABI issue is resolved
<riscv-non-isa/riscv-elf-psabi-doc#444>.
asb added a commit to llvm/llvm-project that referenced this issue Jul 23, 2024
…99898)

As discussed at the last sync-up call, mark Zacas as experimental until
this ABI issue is resolved
<riscv-non-isa/riscv-elf-psabi-doc#444>.

Don't return Zacas in getHostCPUFeatures (leaving a TODO there) as even if requesting detection of "native" features, the user likely doesn't want to automatically opt in to experimental codegen.
yuxuanchen1997 pushed a commit to llvm/llvm-project that referenced this issue Jul 25, 2024
…99898)

Summary:
As discussed at the last sync-up call, mark Zacas as experimental until
this ABI issue is resolved
<riscv-non-isa/riscv-elf-psabi-doc#444>.

Don't return Zacas in getHostCPUFeatures (leaving a TODO there) as even if requesting detection of "native" features, the user likely doesn't want to automatically opt in to experimental codegen.

Test Plan: 

Reviewers: 

Subscribers: 

Tasks: 

Tags: 


Differential Revision: https://phabricator.intern.facebook.com/D60251180
asb added a commit to llvm/llvm-project that referenced this issue Sep 25, 2024
The extension has been ratified for some time, but we kept it
experimental (see #99898) due to
<riscv-non-isa/riscv-elf-psabi-doc#444>. The
ABI issue has been resolved by #101023 so I believe there's no known
barrier to moving Zacas to non-experimental.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants