Mixed-size accesses #840

xeren · 2025-03-27T11:22:29Z

This PR adds basic support for mixed-size and misaligned accesses, i.e. memory instructions whose access ranges overlap only partially. Such an instruction is partitioned into multiple events with distinct access ranges, so that each event addresses a properly slotted byte range. The events generated from one instruction are si-related (see below).

Memory model

The new base relation si ("Same Instruction") is an equivalence relation. Each equivalence class denotes an instruction.
cat/aarch64-mixed.cat: Application-level ARM with mixed-size support. Modified copy fetched from aarch64.cat.
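
To illustrate the equivalence-class reading of si, here is a minimal sketch. It assumes events are numbered 0..n-1 and that a hypothetical array `instructionOf` maps each event to the instruction it was generated from. None of these names come from Dartagnan.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of si; not Dartagnan's implementation.
class SameInstruction {
    // Two events are si-related iff they were generated from the same instruction.
    static boolean si(int[] instructionOf, int a, int b) {
        return instructionOf[a] == instructionOf[b];
    }

    // Groups event indices into the equivalence classes of si, one class per instruction.
    static Map<Integer, List<Integer>> classes(int[] instructionOf) {
        Map<Integer, List<Integer>> result = new LinkedHashMap<>();
        for (int event = 0; event < instructionOf.length; event++) {
            result.computeIfAbsent(instructionOf[event], k -> new ArrayList<>()).add(event);
        }
        return result;
    }
}
```

Because si is defined through equality of instruction ids in this sketch, reflexivity, symmetry and transitivity hold by construction.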

Verification

Static analysis of where a memory access should be split in order to properly alias other accesses.
Program processor Tearing performs the splitting of events according to the above analysis results. This process is sensitive to the byte ordering of the program. For litmus tests, little endian is assumed. In LLVM IR, the byte order is explicit.
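
The effect of byte order on splitting can be sketched as follows. This is an illustration only, assuming accesses of at most 8 bytes and a set of split points given as byte offsets. The class and method names are hypothetical and do not mirror the actual Tearing processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical sketch of splitting one access into parts; not the actual Tearing pass.
class TearingSketch {
    // Returns {offset, lengthInBytes} pairs covering [0, size), cut at every
    // split point that lies strictly inside the access. Requires size >= 1.
    static List<int[]> split(int size, SortedSet<Integer> splitPoints) {
        List<int[]> parts = new ArrayList<>();
        int begin = 0;
        for (int cut : splitPoints.subSet(1, size)) {
            parts.add(new int[]{begin, cut - begin});
            begin = cut;
        }
        parts.add(new int[]{begin, size - begin});
        return parts;
    }

    // Value held by the bytes [offset, offset + length) of a size-byte value.
    // In little endian, offset 0 is the least significant byte; in big endian, the most significant.
    static long partValue(long value, int size, int offset, int length, boolean bigEndian) {
        int lowByte = bigEndian ? size - offset - length : offset;
        long mask = length == 8 ? -1L : (1L << (8 * length)) - 1;
        return (value >>> (8 * lowByte)) & mask;
    }
}
```

For a 4-byte store of 0x11223344 split at offset 2, the part at offset 0 carries 0x3344 in little endian and 0x1122 in big endian.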

Configuration

Option program.analysis.mixedSize: Set to false to disable the feature if you are confident that MSAs do not happen. Disabling is recommended if you select another, field-insensitive alias analysis (via program.analysis.alias) or verify a program with unsupported pointer operations (currently non-linear pointer arithmetic, like bit fiddling).

Tests

Copied from aarch64-tests. The verdicts were generated using DIY 7.58 with the following command:

herd7 -variant mixed -model cat/aarch64-mixed.cat -I cat $LITMUS_TEST


hernanponcedeleon

First pass up to the litmus and assertion parsers

hernanponcedeleon · 2025-04-18T08:10:00Z

benchmarks/mixed/lockref1.c

+// ==========================================
+//                   Lockref
+// ==========================================


I don't see the point of keeping both variants (especially since, according to Thomas, both generated the same dat3m IR). This just makes it more confusing for people trying to navigate the benchmarks in the repo, with no clear benefit.

I added another method to the API, refactored the code and created some more interesting examples. I would simply remove lockref1 and lockref2.

The two versions are very different in terms of LLVM IR. The fact that we create (almost) the same internal IR is not obvious which makes it a valuable benchmark to evaluate our processing pipeline.
I don't think we need the two versions as unit tests, but I would like to keep them around for manual testing, i.e., inspection of logging outputs. We could have a separate directory for such benchmarks.

dartagnan/src/main/antlr4/LitmusAArch64.g4

dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java

dartagnan/src/main/java/com/dat3m/dartagnan/expression/integers/IntExtract.java

dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/visitors/VisitorLitmusAArch64.java

benchmarks/mixed/lockref-par1.c

benchmarks/mixed/lockref-par2.c

benchmarks/mixed/lockref-seq.c


hernanponcedeleon · 2025-05-08T12:57:44Z

dartagnan/src/main/java/com/dat3m/dartagnan/configuration/OptionNames.java

@@ -50,6 +50,7 @@ public class OptionNames {
    public static final String PROPAGATE_COPY_ASSIGNMENTS = "program.processing.propagateCopyAssignments";
    public static final String REMOVE_ASSERTION_OF_TYPE = "program.processing.skipAssertionsOfType";
    public static final String NONTERMINATION_INSTRUMENTATION = "program.processing.nonTermination";
+    public static final String MIXED_SIZE = "program.processing.mixedSize";


This option only makes sense when we have modeling.integers=false, right? If so, can we somehow check that the two options match accordingly?

At the moment, we do no clever encoding in presence of MSAs and encoding.integers=true. The responsible classes are ProcessingManager and EncodingContext, which currently do not directly communicate. We could issue a warning at the first (or k-th) bitwise operation. Otherwise we could add a second declaration of the option, or move one of the options into Program.

Semantically, the option makes sense independent of whether we do integer encoding or not. The only problem with the combination is that there are going to be more BV operations which will cause slowdown. But the same is true if the program naturally contains such operations.
That being said, I don't think I have implemented extract/concat operations for the integer encoding, so the encoder will likely throw an exception.
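
For reference, extract and concat can in principle be expressed with plain integer arithmetic on non-negative values. The sketch below uses `BigInteger` as a stand-in for unbounded integers. The parameter conventions (inclusive bit indices `low` and `high`, explicit width of the lower operand) are assumptions for illustration and are not taken from `ExpressionFactory` or the encoder.

```java
import java.math.BigInteger;

// Hypothetical arithmetic view of extract/concat; not the encoder's implementation.
class IntegerExtractConcat {
    // Bits low..high (inclusive) of a non-negative x:
    // floor(x / 2^low) mod 2^(high - low + 1).
    static BigInteger extract(BigInteger x, int low, int high) {
        BigInteger modulus = BigInteger.ONE.shiftLeft(high - low + 1);
        return x.divide(BigInteger.ONE.shiftLeft(low)).mod(modulus);
    }

    // Concatenation of two non-negative values, where lower occupies lowBits bits:
    // upper * 2^lowBits + lower.
    static BigInteger concat(BigInteger upper, BigInteger lower, int lowBits) {
        return upper.multiply(BigInteger.ONE.shiftLeft(lowBits)).add(lower);
    }
}
```

Both operations introduce division, modulo or multiplication by powers of two, which is the kind of non-linear-looking arithmetic that tends to slow down an integer encoding.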

I also saw a TODO about some missing modulo operations to restrict values (I think in some simplification step) for the case of an integer type. This suggests to me that we could even be unsound if those options are combined.

The TODO you are referring to is in the encoding of BV truncations (e.g. bv64 down to bv32), where we do not reduce the value. I don't think we necessarily need to, because integer encoding is incompatible with overflowing arithmetic anyway. So integer encoding in itself is unsound to begin with.
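
To make the truncation point concrete: in an unbounded-integer encoding, a truncation to n bits only agrees with bit-vector semantics if the value is reduced modulo 2^n. A minimal sketch with a hypothetical helper, not the encoder's code:

```java
import java.math.BigInteger;

// Hypothetical illustration of truncation over unbounded integers.
class TruncationSketch {
    // Bit-vector truncation to `bits` bits corresponds to reduction modulo 2^bits.
    static BigInteger truncate(BigInteger value, int bits) {
        return value.mod(BigInteger.ONE.shiftLeft(bits));
    }
}
```

For example, 2^32 + 5 truncated to 32 bits is 5, whereas an encoding that skips the reduction keeps the value 2^32 + 5.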

hernanponcedeleon · 2025-05-08T13:01:54Z

dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/utils/ProgramBuilder.java

-                final Expression zero = expressions.makeZero(types.getArchType());
-                for (int offset = 0; offset < size; offset++) {
-                    mem.setInitialValue(offset, zero);
-                }
+                mem.setInitialValue(0, expressions.makeZero(types.getIntegerType(8 * size)));
            }


I don't understand why we need this change. Isn't this responsible for all the CI failures that Thomas mentioned were related to init events?

No, I think this change is needed because we never considered the correct type for initial memory before.
A 1-byte sized memory object could hold an 8-byte sized initial value, which is just not correct, but didn't result in problems before.

This change was necessary. The former snippet produced overlapping initializations, which would always result in worst-case byte-wise Tearing. A compromise would be to produce initial values for every eighth byte, as below.

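    final Expression zero = expressions.makeZero(types.getArchType());
    final int archSize = types.getMemorySizeInBytes(types.getArchType());
    // Initialize all full arch-sized words first.
    for (int offset = 0; offset + archSize <= size; offset += archSize) {
        mem.setInitialValue(offset, zero);
    }
    // Initialize the remaining bytes, if any, with one smaller value.
    if (size % archSize != 0) {
        // getIntegerType takes a bit width (cf. getIntegerType(8 * size) in the diff above), hence the factor 8.
        final IntegerType remType = types.getIntegerType(8 * (size % archSize));
        mem.setInitialValue(archSize * (size / archSize), expressions.makeZero(remType));
    }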
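
The resulting layout of this compromise can be checked with a stand-alone sketch. It assumes sizes in bytes and widths in bits, as in `getIntegerType(8 * size)` from the diff. `InitLayout` and `chunks` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the chunked initialization; not ProgramBuilder code.
class InitLayout {
    // Returns one {byteOffset, widthInBits} pair per initial value
    // for a memory object of `size` bytes and words of `archSize` bytes.
    static List<int[]> chunks(int size, int archSize) {
        List<int[]> result = new ArrayList<>();
        int offset = 0;
        // Full words.
        for (; offset + archSize <= size; offset += archSize) {
            result.add(new int[]{offset, 8 * archSize});
        }
        // Trailing bytes that do not fill a whole word.
        if (offset < size) {
            result.add(new int[]{offset, 8 * (size - offset)});
        }
        return result;
    }
}
```

An 11-byte object with 8-byte words thus receives a 64-bit value at offset 0 and a 24-bit value at offset 8, and no two initial values overlap.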

dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/visitors/VisitorLitmusAArch64.java

hernanponcedeleon · 2025-05-08T13:16:39Z

dartagnan/src/main/java/com/dat3m/dartagnan/program/event/core/InstructionBoundary.java

+
+    private final InstructionBoundary begin;
+
+    public InstructionBoundary(Void ignore, InstructionBoundary b) {


Why do we need this first parameter?

To avoid collision with the copy constructor.
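
The collision in question can be reproduced in isolation. In the sketch below, `BoundarySketch` is a hypothetical stand-in for `InstructionBoundary`. The `tag` field and the begin/end roles are assumptions made only to keep the example self-contained.

```java
// Hypothetical stand-in for InstructionBoundary; only the constructor signatures matter here.
class BoundarySketch {
    private final BoundarySketch begin;
    private final int tag;

    // Creates an opening boundary.
    BoundarySketch(int tag) {
        this.begin = null;
        this.tag = tag;
    }

    // Copy constructor: duplicates all fields of `other`.
    BoundarySketch(BoundarySketch other) {
        this.begin = other.begin;
        this.tag = other.tag;
    }

    // Creates a closing boundary that refers to `begin`. Without the unused Void
    // parameter, this signature would be identical to the copy constructor's.
    BoundarySketch(Void ignore, BoundarySketch begin) {
        this.begin = begin;
        this.tag = begin.tag;
    }

    BoundarySketch getBegin() {
        return begin;
    }
}
```

Callers write `new BoundarySketch(null, open)`. A static factory method would be an alternative way to disambiguate the two constructors.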

hernanponcedeleon · 2025-05-08T13:20:15Z

dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/ProcessingManager.java

+                detectMixedSizeAccesses ? ProgramProcessor.fromFunctionProcessor(
+                        FunctionProcessor.chain(
+                                performAssignmentInlining ? AssignmentInlining.newInstance() : null,
+                                sccp,
+                                dce,
+                                removeDeadJumps
+                        ), Target.THREADS, true
+                ) : null,


Why do we run all these passes again?

Because Tearing opens up new possibilities to optimize if you had mixed-size accesses on local variables. At least Mem2Reg and SCCP should run again after Tearing. If the set of optimizations is identical to what we already had, we can refactor this into a single local variable and reuse it twice.

I think MemToReg gets no benefit from Tearing. Trying to make it MSA-aware would probably let it solve the local instances on its own. But the assignments introduced by Tearing can be simplified.

I'm not following what you are saying. You say that MemToReg is not MSA-aware and therefore probably pessimistic with regard to MSAs, no? If so, then why is Tearing not enabling more optimizations?
What happens if you locally allocate a bv64 but use bv32 accesses to it? What will MemToReg do here?

To detect MSAs, MemToReg relies on allocationType, which Tearing cannot reliably update without copying the infrastructure of MemToReg. I referred to the other approach, bringing a bit of Tearing into MemToReg instead. This approach is now implemented. As a result, MemToReg promotes the local accesses on the first try, before Tearing.

dartagnan/src/test/java/com/dat3m/dartagnan/litmus/LitmusAARCH64MixedTest.java

dartagnan/src/test/java/com/dat3m/dartagnan/others/miscellaneous/AnalysisTest.java

…ixed-sized-accesses # Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/CoreCodeVerification.java # dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/Intrinsics.java # dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/ProcessingManager.java # dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/ThreadCreation.java


dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/ProcessingManager.java


dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/MemToReg.java


# Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/encoding/ExpressionEncoder.java # dartagnan/src/main/java/com/dat3m/dartagnan/encoding/WmmEncoder.java # dartagnan/src/main/java/com/dat3m/dartagnan/expression/ExpressionKind.java # dartagnan/src/main/java/com/dat3m/dartagnan/program/Program.java # dartagnan/src/main/java/com/dat3m/dartagnan/verification/model/ExecutionModelManager.java

xeren added 30 commits February 5, 2025 19:14

AliasAnalysis.mayMix(MemoryCoreEvent,MemoryCoreEvent)

3f3c9b6

AliasAnalysis.getMixedSizeAccessSet(AliasAnalysis,List)

Tearing

8ca166c

AnalysisTest.allKindsOfMixedSizeAccesses()

cb4a966

fixup! AliasAnalysis.mayMix(MemoryCoreEvent,MemoryCoreEvent)

bba372c

fixup! Tearing

9e5d5bd

TransactionMarker

b24a306

SameTransaction

d9a8288

teared events inherit tags from their original

22700f1

Track RMW after Tearing

67eae35

Memory.isBigEndian()

343e786

Big Endian Tearing

190dc6b

Missing update of MemoryEvent.getAccessType() in Load and Store

706c464

Remove inter-transaction program order

5867d30

fixup! SameTransaction

6953d44

fixup! Track RMW after Tearing

6dcfaa3

More precise post-processing in InclusionBasedPointerAnalysis

8502499

Remove unused array initializers in litmus tests

8f38963

fixup! Tearing

1538560

Extend litmus grammar

2ab097a

Detailed MSA information

62c2a9d

Add AliasAnalysis.mayMixedSizeAccesses(MemoryCoreEvent) Remove AliasAnalysis.mayMix(MemoryCoreEvent,MemoryCoreEvent) Remove AliasAnalysis.getMayMixedSizeAccessesSet(Collection)

Fix address arithmetics in tests from base i8 to i64, because the tes…

3ff69de

…ts use archType instead of byteType

Fix Mixed-size access detection in InclusionBasedPointerAnalysis

4ecbb32

Fix InclusionBasedPointerAnalysis.mayMixedSizeAccesses(MemoryCoreEven…

74e4964

…t) for unknown events

ExpressionFactory.makeIntConcat(Expression,Expression)

4ae26ab

ExpressionFactory.makeIntExtract(Expression,int,int)

32-bit registers in VisitorLitmusAArch64

9c48301

Fix teared stores always being 8-bit

9b0f1a9

Add support for 8-, 16- and 32-bit operations in AArch64 litmus tests

e08de66

Refactor

44b095e

Program.addInit(MemoryObject,int)

74e3033

Tear initializations

448bfe5

hernan-poncedeleon added 2 commits April 18, 2025 16:41

Better lockref benchmarks

c1a4bd2

Signed-off-by: Hernan Ponce de Leon <[email protected]>

Enable new lockref tests in CI

052a7eb

Signed-off-by: Hernan Ponce de Leon <[email protected]>

hernanponcedeleon reviewed Apr 18, 2025

ThomasHaas reviewed Apr 18, 2025

benchmarks/mixed/lockref-par1.c Outdated

benchmarks/mixed/lockref-par1.c Outdated

benchmarks/mixed/lockref-par2.c Outdated

benchmarks/mixed/lockref-seq.c Outdated

hernan-poncedeleon and others added 8 commits April 18, 2025 17:41

Feedback implemented

e8eabb3

Signed-off-by: Hernan Ponce de Leon <[email protected]>

Arm instructions SWPB, SWPH

121dd3b

Merge remote-tracking branch 'origin/development' into mixed-sized-ac…

e78c244

…cesses

Special visualization of si

7f8fd7e

Remove Tag.Armv8.MO_RX

66fc931

Merge remote-tracking branch 'refs/remotes/origin/development' into m…

1ba3e2e

…ixed-sized-accesses

Merge branch 'refs/heads/development' into mixed-sized-accesses

bbd044e

# Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/Intrinsics.java

Add SameInstruction relation to LazyRelationAnalysis and LazyEncodeSets

e09212d

hernanponcedeleon reviewed May 8, 2025

xeren added 4 commits May 8, 2025 21:44

Implement Feedback

bc92d9c

fixup! Implement Feedback

38f1f97

Merge remote-tracking branch 'refs/remotes/origin/development' into m…

eef0fb9

…ixed-sized-accesses

ThomasHaas reviewed May 9, 2025

dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/ProcessingManager.java Outdated

xeren added 5 commits May 12, 2025 11:09

fixup! Implement Feedback

ed68c9f

mixed-local.c

049a056

MSA-aware MemToReg

8c9b8bb

fixup! MSA-aware MemToReg

7a07ffb

Merge remote-tracking branch 'refs/remotes/origin/development' into m…

a6a68e8

…ixed-sized-accesses

ThomasHaas reviewed May 13, 2025

dartagnan/src/main/java/com/dat3m/dartagnan/program/processing/MemToReg.java Outdated

xeren and others added 6 commits May 13, 2025 15:41

Better naming of MemToReg registers

2567cb4

Merge remote-tracking branch 'refs/remotes/origin/development' into m…

8e2eb77

…ixed-sized-accesses # Conflicts: # dartagnan/src/main/java/com/dat3m/dartagnan/parsers/program/visitors/VisitorLlvm.java

Tune Non-MSA AArch64 litmus tests

06da103

Fixup after merge

19dcc9b

Add back accidentally removed encoding method.

3d3cf5b
