Skip to content

[AArch64,ELF] Allow implicit $d/$x at section beginning #99718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Jul 19, 2024

The start state of a new section is EMS_None, often leading to a
$d/$x at offset 0. Introduce a MCTargetOption/cl::opt
"implicit-mapsyms" to allow an alternative behavior
(ARM-software/abi-aa#274):

  • Set the start state to EMS_Data or EMS_A64.
  • For text sections, add an ending $x only if the final data is not instructions.
  • For non-text sections, add an ending $d only if the final data is not data commands.
.section .text.1,"ax"
nop
// emit $d
.long 42
// emit $x

.section .text.2,"ax"
nop

This new behavior decreases the .symtab size significantly:

% bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE        VM SIZE
 --------------  --------------
  -5.4% -1.13Mi  [ = ]       0    .strtab
 -50.9% -4.09Mi  [ = ]       0    .symtab
  -4.0% -5.22Mi  [ = ]       0    TOTAL

This scheme works as long as the user can rule out some error scenarios:

  • .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior
  • A linker script combining non-text sections and text sections. The
    lack of mapping symbols in the non-text sections could make them
    treated as code, unless the linker inserts extra mapping symbols.

The above mix-and-match scenarios aren't an issue at all for a
significant portion of users.

A text section may start with data commands in rare cases (e.g.
-fsanitize=function) that many users don't care about. When combing
(.text.0; .word 0) and (.text.1; .word 0), the ending $x of .text.0
and the initial $d of .text.1 may have the same address. If both
sections reside in the same file, ensure the ending symbol comes before
the initial $d of .text.1, so that a dumb linker respecting the symbol
order will place the ending $x before the initial $d.

Disassemblers using stable sort will see both symbols at the same
address, and the second will win.

When section ordering mechanisms (e.g. --symbol-ordering-file,
--call-graph-profile-sort, .text : { second.o(.text) first.o(.text) })
are involved, the initial data in a text section following a text
section with trailing data could be misidentified as code, but the issue
is local and the risk could be acceptable.

Created using spr 1.3.5-bogner
@llvmbot llvmbot added backend:AArch64 mc Machine (object) code labels Jul 19, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/pr-subscribers-lld
@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-aarch64

Author: Fangrui Song (MaskRay)

Changes

The start state of a new section is EMS_None, often leading to a
$d/$x at offset 0. Introduce a MCTargetOption/cl::opt
"optimize-mapping-symbols" to allow an alternative behavior
(ARM-software/abi-aa#274):

  • Set the start state to EMS_Data or EMS_A64.
  • For text sections, add an ending $x only if the final data is not instructions.
  • For non-text sections, add an ending $d only if the final data is not data commands.
.section .text1,"ax"
nop
// emit $d
.long 42
// emit $x

.section .text2,"ax"
nop

This new behavior decreases the .symtab size significantly:

% bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE        VM SIZE
 --------------  --------------
  -5.4% -1.13Mi  [ = ]       0    .strtab
 -50.9% -4.09Mi  [ = ]       0    .symtab
  -4.0% -5.22Mi  [ = ]       0    TOTAL

This scheme works as long as the user can rule out some error scenarios:

  • .text1 assembled using the traditional behavior is combined with .text2 using the new behavior
  • A linker script combining non-text sections and text sections. The
    lack of mapping symbols in the non-text sections could make them
    treated as code, unless the linker inserts extra mapping symbols.

The above mix-and-match scenarios aren't an issue at all for a
significant portion of users.


Full diff: https://github.com/llvm/llvm-project/pull/99718.diff

5 Files Affected:

  • (modified) llvm/include/llvm/MC/MCTargetOptions.h (+2)
  • (modified) llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h (+2)
  • (modified) llvm/lib/MC/MCTargetOptionsCommandFlags.cpp (+7)
  • (modified) llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp (+33-5)
  • (modified) llvm/test/MC/AArch64/mapping-across-sections.s (+10)
diff --git a/llvm/include/llvm/MC/MCTargetOptions.h b/llvm/include/llvm/MC/MCTargetOptions.h
index 899299fd15246..d16cb38fa0ae2 100644
--- a/llvm/include/llvm/MC/MCTargetOptions.h
+++ b/llvm/include/llvm/MC/MCTargetOptions.h
@@ -64,6 +64,8 @@ class MCTargetOptions {
   // Use CREL relocation format for ELF.
   bool Crel = false;
 
+  bool OptimizeMappingSymbols = false;
+
   // If true, prefer R_X86_64_[REX_]GOTPCRELX to R_X86_64_GOTPCREL on x86-64
   // ELF.
   bool X86RelaxRelocations = true;
diff --git a/llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h b/llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h
index 9d592446f3ba7..f4b4e2cbcb450 100644
--- a/llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h
+++ b/llvm/include/llvm/MC/MCTargetOptionsCommandFlags.h
@@ -53,6 +53,8 @@ bool getSaveTempLabels();
 
 bool getCrel();
 
+bool getOptimizeMappingSymbols();
+
 bool getX86RelaxRelocations();
 
 bool getX86Sse2Avx();
diff --git a/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp b/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp
index 813b1194b47cb..f3dcf499eb9f9 100644
--- a/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp
+++ b/llvm/lib/MC/MCTargetOptionsCommandFlags.cpp
@@ -48,6 +48,7 @@ MCOPT(bool, NoDeprecatedWarn)
 MCOPT(bool, NoTypeCheck)
 MCOPT(bool, SaveTempLabels)
 MCOPT(bool, Crel)
+MCOPT(bool, OptimizeMappingSymbols)
 MCOPT(bool, X86RelaxRelocations)
 MCOPT(bool, X86Sse2Avx)
 MCOPT(std::string, ABIName)
@@ -134,6 +135,11 @@ llvm::mc::RegisterMCTargetOptionsFlags::RegisterMCTargetOptionsFlags() {
                             cl::desc("Use CREL relocation format for ELF"));
   MCBINDOPT(Crel);
 
+  static cl::opt<bool> OptimizeMappingSymbols(
+      "optimize-mapping-symbols",
+      cl::desc("Allow mapping symbol at section beginning to be implicit"));
+  MCBINDOPT(OptimizeMappingSymbols);
+
   static cl::opt<bool> X86RelaxRelocations(
       "x86-relax-relocations",
       cl::desc(
@@ -174,6 +180,7 @@ MCTargetOptions llvm::mc::InitMCTargetOptionsFromFlags() {
   Options.MCNoTypeCheck = getNoTypeCheck();
   Options.MCSaveTempLabels = getSaveTempLabels();
   Options.Crel = getCrel();
+  Options.OptimizeMappingSymbols = getOptimizeMappingSymbols();
   Options.X86RelaxRelocations = getX86RelaxRelocations();
   Options.X86Sse2Avx = getX86Sse2Avx();
   Options.EmitDwarfUnwind = getEmitDwarfUnwind();
diff --git a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp
index bfeca4bd5a92d..e79a17395a2bc 100644
--- a/llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp
+++ b/llvm/lib/Target/AArch64/MCTargetDesc/AArch64ELFStreamer.cpp
@@ -32,8 +32,10 @@
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSubtargetInfo.h"
 #include "llvm/MC/MCSymbolELF.h"
+#include "llvm/MC/MCTargetOptions.h"
 #include "llvm/MC/MCWinCOFFStreamer.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/CommandLine.h"
 #include "llvm/Support/FormattedStream.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/TargetParser/Triple.h"
@@ -176,19 +178,29 @@ void AArch64TargetAsmStreamer::emitInst(uint32_t Inst) {
 /// by MachO. Beware!
 class AArch64ELFStreamer : public MCELFStreamer {
 public:
+  friend AArch64TargetELFStreamer;
   AArch64ELFStreamer(MCContext &Context, std::unique_ptr<MCAsmBackend> TAB,
                      std::unique_ptr<MCObjectWriter> OW,
                      std::unique_ptr<MCCodeEmitter> Emitter)
       : MCELFStreamer(Context, std::move(TAB), std::move(OW),
                       std::move(Emitter)),
-        MappingSymbolCounter(0), LastEMS(EMS_None) {}
+        MappingSymbolCounter(0), LastEMS(EMS_None) {
+    auto *TO = getContext().getTargetOptions();
+    OptimizeMappingSymbols = TO && TO->OptimizeMappingSymbols;
+  }
 
   void changeSection(MCSection *Section, uint32_t Subsection = 0) override {
-    // We have to keep track of the mapping symbol state of any sections we
-    // use. Each one should start off as EMS_None, which is provided as the
-    // default constructor by DenseMap::lookup.
+    // We have to keep track of the mapping symbol state of any sections we use.
+    // The initial state is EMS_A64 for text sections and EMS_Data for the
+    // others.
     LastMappingSymbols[getCurrentSection().first] = LastEMS;
-    LastEMS = LastMappingSymbols.lookup(Section);
+    auto It = LastMappingSymbols.find(Section);
+    if (It != LastMappingSymbols.end())
+      LastEMS = It->second;
+    else if (OptimizeMappingSymbols)
+      LastEMS = Section->isText() ? EMS_A64 : EMS_Data;
+    else
+      LastEMS = EMS_None;
 
     MCELFStreamer::changeSection(Section, Subsection);
   }
@@ -280,6 +292,7 @@ class AArch64ELFStreamer : public MCELFStreamer {
   }
 
   int64_t MappingSymbolCounter;
+  bool OptimizeMappingSymbols;
 
   DenseMap<const MCSection *, ElfMappingSymbol> LastMappingSymbols;
   ElfMappingSymbol LastEMS;
@@ -304,6 +317,21 @@ void AArch64TargetELFStreamer::finish() {
   AArch64ELFStreamer &S = getStreamer();
   MCContext &Ctx = S.getContext();
   auto &Asm = S.getAssembler();
+
+  // If OptimizeMappingSymbols is specified, ensure that text sections end with
+  // the A64 state while non-text sections end with the data state. When
+  // sections are combined by the linker, the subsequent section will start with
+  // the right state.
+  if (S.OptimizeMappingSymbols) {
+    for (MCSection &Sec : Asm) {
+      S.switchSection(&Sec);
+      if (Sec.isText())
+        S.emitA64MappingSymbol();
+      else
+        S.emitDataMappingSymbol();
+    }
+  }
+
   MCSectionELF *MemtagSec = nullptr;
   for (const MCSymbol &Symbol : Asm.symbols()) {
     const auto &Sym = cast<MCSymbolELF>(Symbol);
diff --git a/llvm/test/MC/AArch64/mapping-across-sections.s b/llvm/test/MC/AArch64/mapping-across-sections.s
index 6bb5a8811b57d..49151326caf6c 100644
--- a/llvm/test/MC/AArch64/mapping-across-sections.s
+++ b/llvm/test/MC/AArch64/mapping-across-sections.s
@@ -1,4 +1,5 @@
 // RUN: llvm-mc -triple=aarch64 -filetype=obj %s | llvm-objdump -t - | FileCheck %s
+// RUN: llvm-mc -triple=aarch64 -filetype=obj -optimize-mapping-symbols %s | llvm-objdump -t - | FileCheck %s --check-prefix=CHECK1
 
 .section .text1,"ax"
 add w0, w0, w0
@@ -38,3 +39,12 @@ add w0, w0, w0
 // CHECK-NEXT: 0000000000000004 l       .rodata        0000000000000000 $x.7
 // CHECK-NEXT: 0000000000000000 l       .comment       0000000000000000 $d.8
 // CHECK-NOT:  {{.}}
+
+// CHECK1:      SYMBOL TABLE:
+// CHECK1-NEXT: 0000000000000004 l       .text  0000000000000000 $d.0
+// CHECK1-NEXT: 0000000000000008 l       .text  0000000000000000 $x.1
+// CHECK1-NEXT: 000000000000000c l       .text  0000000000000000 $d.2
+// CHECK1-NEXT: 0000000000000004 l       .rodata        0000000000000000 $x.3
+// CHECK1-NEXT: 0000000000000010 l       .text  0000000000000000 $x.4
+// CHECK1-NEXT: 0000000000000008 l       .rodata        0000000000000000 $d.5
+// CHECK-NOT:  {{.}}

@MaskRay
Copy link
Member Author

MaskRay commented Jul 19, 2024

The new option "optimize-mapping-symbols" is for llc/llvm-mc/etc. clang -mllvm -optimize-mapping-symbols does not get the option. Another patch will be needed to enable a -Wa, option, similar to #97378 (CREL).

It's desirable to coordinate with the GNU binutils project on the naming of this option.

It's worth noting that the EMS_A64 start state modification for text sections currently deviates from the standard (ARM-software/abi-aa#274), but could become compliant when the ABI is updated.

Created using spr 1.3.5-bogner
MaskRay added 2 commits July 23, 2024 17:53
Created using spr 1.3.5-bogner
Created using spr 1.3.5-bogner
@MaskRay
Copy link
Member Author

MaskRay commented Jul 30, 2024

As suggested by a binutils global maintainer, Jan Beulich, optimize-mapping-symbols has been renamed to optimize-mapsyms.
I will add clangDriver support for -Wa,--optimize-mapsyms once this is approved.

While binutils aarch64 maintainer Richard hasn't yet endorsed the solution, I believe there's significant value in having this option
for users who don't (a) mix data and code or (b) mix data-in-code text sections before or after this patch.

This feature is for LLVM 20, not for 19.x release, so that I will have more time to convince Richard.
(Although there is a desire to have this early to benefit my downstream... We do rolling releases and whether this catches up a release doesn't matter)

@MaskRay
Copy link
Member Author

MaskRay commented Aug 6, 2024

🥳🥳

@MaskRay
Copy link
Member Author

MaskRay commented Aug 8, 2024

Ping:)

@smithp35
Copy link
Collaborator

smithp35 commented Aug 9, 2024

Apologies for the delay in responding.

I'm uncomfortable with symbols defined at the end of the section as these fall through into the following section. I have seen that cause trouble before with 0 sized sections as contradictory mapping symbols at the same address are not well defined by the ABI. While it is true that the most likely implementation of a linker would order mapping symbols such that a trailing mapping symbol would get sorted before a mapping symbol at offset 0 in the next section; I'm not sure we can guarantee it, for example a non-stable sort of the symbols by address could break that. I also know of at least one linker that tries to optimize mapping symbols by removing redundant mapping symbols that will give an error message if it spots two contradictory mapping symbols at the same address.

I think it would be better to just omit the trailing $x mapping symbol as I expect it would cause more problems than it will solve. As -foptimize-mapsyms is opt-in and not the default, a user could be responsible for dealing with any corner-cases that arise with $d as the last mapping symbol in code-sections. For clang it looks like this is very rare. It is more common in GCC as it uses literal pools for -mcmodel=large. Clang uses 4 mov instructions instead. For what it is worth a colleague of mine is looking into adding execute-only support in AArch64, this largely means leaving code-generation alone, but giving an error if data is used in an executable section, this would guarantee no corner cases.

Going back to the ABI perhaps it would be worth looking at symbols rather than sections. Essentially can we make a symbol of type STT_FUNC with bit 0 clear (thinking of Morello, which uses 1 for C64 instructions) an implicit $x and a symbol of type STT_OBJECT an implicit $d, with all other symbols being neutral? This would solve the trailing $d problem for compiled code and well written assembly code (people are lazy about setting the type of symbols).

@MaskRay
Copy link
Member Author

MaskRay commented Aug 9, 2024

Thank you for your feedback. You suggest a scheme where the trailing $x mapping symbol is omitted to prevent potential contradictory mapping symbols at the same address.

.section .text.1,"ax"
nop
// emit $d
.long 42
// do not emit $x

This requires a smart linker that inserts extra mapping symbols.
While I understand the benefits of this approach, it's not suitable for all users.
Those who can tolerate potential mapping symbol conflicts or have tooling to handle them (even if not well defined) might prefer an ending mapping symbol.

I believe it's very difficult for Clang to emit trailing data in a text section.
However, hand-written assembly might do so, and in the absence of a smart linker, large regions of .text could be misidentified as data.

While I'm unfamiliar with Morello, omitting a starting label (often the function name) is possible in hand-written assembly. This disqualifies approaches that rely on start label st_value.
Additionally, I'm less comfortable with ABIs that use st_value&1 for state information (like Thumb and MicroMips).
This can lead to ambiguity when defining label differences (A-B vs. A-B+C) in assembly code. https://reviews.llvm.org/D157655


Is it feasible to have a cl::opt (not -Wa,--foo) to have the ending mapping symbol behavior?

I understand the preference for a limited number of schemes, but I anticipate that we may end up with three distinct options as users have different needs:
the current approach (there are already differences with gas; LLVM AArch32 does this https://reviews.llvm.org/D30724), the proposed patch, and a variation of the patch without the ending mapping symbol.

@smithp35
Copy link
Collaborator

smithp35 commented Aug 9, 2024

Morello definitely couldn't use this option https://github.com/ARM-software/abi-aa/blob/main/aaelf64-morello/aaelf64-morello.rst#mapping-symbols . Having said that Morello is not part of upstream LLVM and is an architecture experiment.

Yes I don't think there will be a single option that is going to work for everyone.

option impact
default covers all cases but too many mapping symbols
omit leading $x, trailing $x if required could cause $x and $d at same address, error in some tools. or be incorrect when a data section follows a code section with a trailing $x
omit leading $x, no trailing $x incorrect when data is last in the section and code follows in next section. Will have double $x when combined with the default option.
imply $x from STT_FUNC, $d from STT_OBJECT no guarantee that symbol is first, and user has typed symbol properly.

On reflection I think putting the trailing $x in is probably the least worst optimised solution as I expect it is the least likely to cause problems.

Is there a strong objection to optimising this at link time? There would be some run-time overhead to scan for redundant $x and removing them, but this would always give the same answer as the current default.

@MaskRay
Copy link
Member Author

MaskRay commented Aug 9, 2024

Thanks for the table:)

default | covers all cases but too many mapping symbols

True. (As aforementioned, GNU assembler's AArch32/AArch64 implementations omit mapping symbols in some cases, therefore not covering all cases
https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency#peculiar-alignment-behavior-in-gnu-assembler)

On reflection I think putting the trailing $x in is probably the least worst optimised solution as I expect it is the least likely to cause problems.

Thank you for acknowledging the benefits of the ending mapping symbol:)

Is there a strong objection to optimising this at link time? There would be some run-time overhead to scan for redundant $x and removing them, but this would always give the same answer as the current default.

Yes, to maitain relocatable file size and preserve the current linker performance.

My preliminary analysis revealed that making lld smarter will cause significant performance overhead
https://sourceware.org/pipermail/binutils/2024-July/135992.html

I have maded a local change to lld that stable sorts local symbols by address.
This results in a 5% increase in link time, making it impractical for
default on.

Tangentially, Chrome is now exploring CREL. Perhaps Android will follow up as well.

Reduced relocatable file size is particularly important for large projects.

  • For me, .o file size impacts the number of LLVM project builds I can store in a tmpfs filesystem. Large projects like Chrome often require 100GB+, creating significant disk space requirement for users. Even a single-digit percentage reduction in size is highly beneficial.
  • File transfer overhead is larger when the network is involved, which is an essential part of distributed build systems. There are costs like computing a digest (for deduplication/Bazel style build system) and others.

@smithp35
Copy link
Collaborator

Some thoughts after thinking about this on the weekend and talking to some colleagues.

Apologies for the length of the response, a brief summary:

  • I think this could be a useful option if projects build all their objects with it enabled. Would be good to know from potential users that they would enable this option in this review? So far I think it is only me that has commented.
  • My main concern is binary distribution of objects with it enabled, and what future choices this might rule out.
  • I'm still nervous about trailing mapping symbols, particularly with -fsanitize=undefined or -fsanitize=kcfi.
  • Can we alleviate these concerns by marking object files that use this mapping symbol convention. That way we could fix them up later if needed.
  • Is there an alternative way to do link-time culling of mapping symbols that runs faster?

In more detail.

The pragmatist in me says that this is a useful (possibly llvm only [1]) option for people with large builds to enable an alternative mapping symbol definition than the ABI and accept the consequences of that as well as the benefits. The idealist in me says that if this propagates out into binary distributions across multiple projects then we may have incompatibilities with GNU and while there are no plans to add one, it may rule out something like a second instruction set (like Morello or Thumb [2]).
[1] the comments on the binutils mailing list don't look enthusiastic.
[2] We can't know which instruction set is implicit.

A colleague working on Bolt mentioned that there could be a case for more mapping symbols to help binary analysis, this implies that there could be more mapping symbol density options than just optimize and default. He also mentioned that recording the mapping symbol level choice in the ELF file (mechanism TBD) might be a way around the future compatibility problem, and could extend to the additional mapping symbol cases. Does recording the use of the alternative mapping symbol mechanism appeal at all?

I'm still nervous about adding trailing $x. I had thought that AArch64 executable sections with a leading $d would be very rare, but I had forgotten about -fsanitize=kcfi and -fsanitize=undefined. These all lead with a $d. This makes collisions between trailing $x and $d more likely than I'd first thought.

I also don't think we can rely on the dumb linker collation order putting the trailing $x before the leading $d of the following section. For example, assuming we have lld first.o second.o with the following linker script fragment:

  .text : { second.o(.text) first.o(.text) }

If .text in first.o leads with a $d; like when -fsanitize=kcfi is used, and .text in second.o has a trailing $x then I think when we copy local symbols into the symbol table first.o will get copied first, which will be maintained by stable_partition so we'll get $d $x at the same address with the $x . This is all theoretical, I haven't tried testing it yet.

As an aside, I'm surprised the linker removal of redundant mapping symbols took so long. It looks like we already stable_partition the static symbol table in void SymbolTableBaseSection::finalizeContents(). Did you modify that to stable_sort rather than partition, or just stable_sort the local symbols after the partition?

Another possibility that I think would work in the majority of AArch64 cases without costing too much extra link time, which I think could be done before the stable_partition (to speed that up) is to see if each OutputSection always has the same mapping symbol type. For example if the .text section only has $x mapping symbols, which would be the most common case, then we can replace these with a single $x mapping symbol. If there any $d in there bail out. I think this could be done much more cheaply than 5% link time.

@statham-arm
Copy link
Collaborator

Can we alleviate these concerns by marking object files that use this mapping symbol convention. That way we could fix them up later if needed.

That was my first thought, too.

As far as I can see, this option is essentially inventing a nonstandard variation on the Arm ELF spec, which is not fully compatible with the standard one, as the PR description mentions:

This scheme works as long as the user can rule out some error scenarios:

  • .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior

Currently, as far as I can see, the only way that a user can rule out this error scenario is to use their own memory and self-discipline to keep track of which objects (or which libraries, or which whole projects) they're building using this nonstandard flag.

Surely it would be better to mark object files to indicate explicitly that they follow this nonstandard ABI. Then at the very least a linker can detect a mixture of the two and complain, even if it can't automatically compensate. That way the user doesn't have to do it all by self-discipline.

@MaskRay
Copy link
Member Author

MaskRay commented Aug 13, 2024

Some thoughts after thinking about this on the weekend and talking to some colleagues.

Apologies for the length of the response, a brief summary:

  • I think this could be a useful option if projects build all their objects with it enabled. Would be good to know from potential users that they would enable this option in this review? So far I think it is only me that has commented.
  • My main concern is binary distribution of objects with it enabled, and what future choices this might rule out.
  • I'm still nervous about trailing mapping symbols, particularly with -fsanitize=undefined or -fsanitize=kcfi.
  • Can we alleviate these concerns by marking object files that use this mapping symbol convention. That way we could fix them up later if needed.
  • Is there an alternative way to do link-time culling of mapping symbols that runs faster?

In more detail.

The pragmatist in me says that this is a useful (possibly llvm only [1]) option for people with large builds to enable an alternative mapping symbol definition than the ABI and accept the consequences of that as well as the benefits. The idealist in me says that if this propagates out into binary distributions across multiple projects then we may have incompatibilities with GNU and while there are no plans to add one, it may rule out something like a second instruction set (like Morello or Thumb [2]). [1] the comments on the binutils mailing list don't look enthusiastic. [2] We can't know which instruction set is implicit.

Thanks for the comprehensive summary. I concur with your analysis, including the subsequent points. I believe some users, like mine, are willing to accept the consequences.

A colleague working on Bolt mentioned that there could be a case for more mapping symbols to help binary analysis, this implies that there could be more mapping symbol density options than just optimize and default. He also mentioned that recording the mapping symbol level choice in the ELF file (mechanism TBD) might be a way around the future compatibility problem, and could extend to the additional mapping symbol cases. Does recording the use of the alternative mapping symbol mechanism appeal at all?

I'm still nervous about adding trailing $x. I had thought that AArch64 executable sections with a leading $d would be very rare, but I had forgotten about -fsanitize=kcfi and -fsanitize=undefined. These all lead with a $d. This makes collisions between trailing $x and $d more likely than I'd first thought.

Yes. -fsanitize=function and -fsanitize=kcfi emit executable sections with a leading $d.

I also don't think we can rely on the dumb linker collation order putting the trailing $x before the leading $d of the following section. For example, assuming we have lld first.o second.o with the following linker script fragment:

  .text : { second.o(.text) first.o(.text) }

If .text in first.o leads with a $d; like when -fsanitize=kcfi is used, and .text in second.o has a trailing $x then I think when we copy local symbols into the symbol table first.o will get copied first, which will be maintained by stable_partition so we'll get $d $x at the same address with the $x . This is all theoretical, I haven't tried testing it yet.

Agreed, this is a potential downside similar to --symbol-ordering-file and --call-graph-profile-sort.
lld will place first.o symbols before second.o symbols, potentially causing the undesired $d $x order.

Fortunately, this risk seems manageable for some users. First, text sections with trailing data are extremely rare.
Even if such a section is followed by a -fsanitize=function section (.long 0xc105cafe; .long 2772461324; code),
and the two words are misidentified as code, the following code won't, making this an acceptable risk for some users.

As an aside, I'm surprised the linker removal of redundant mapping symbols took so long. It looks like we already stable_partition the static symbol table in void SymbolTableBaseSection::finalizeContents(). Did you modify that to stable_sort rather than partition, or just stable_sort the local symbols after the partition?

I have developed this hack https://github.com/MaskRay/llvm-project/tree/lld-mapsym , 5% slowdown.

diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index c27ab2b67dc2..d8e4bc976016 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2172,4 +2172,9 @@ void SymbolTableBaseSection::sortSymTabSymbols() {
   getParent()->info = numLocals + 1;

+  for (auto &s : symbols)
+    s.va = s.sym->getVA();
+  std::stable_sort(symbols.begin(), e,
+                   [](auto &l, auto &r) { return l.va < r.va; });
+
   // We want to group the local symbols by file. For that we rebuild the local
   // part of the symbols vector. We do not need to care about the STT_FILE
@@ -2190,5 +2195,5 @@ void SymbolTableBaseSection::addSymbol(Symbol *b) {
   // Adding a local symbol to a .dynsym is a bug.
   assert(this->type != SHT_DYNSYM || !b->isLocal());
-  symbols.push_back({b, strTabSec.addString(b->getName(), false)});
+  symbols.push_back({b, strTabSec.addString(b->getName(), false), 0});
 }

@@ -2266,5 +2271,5 @@ template <class ELFT> void SymbolTableSection<ELFT>::writeTo(uint8_t *buf) {
       if (isDefinedHere) {
         eSym->st_shndx = shndx;
-        eSym->st_value = sym->getVA();
+        eSym->st_value = ent.va ? ent.va : sym->getVA();
         // Copy symbol size if it is a defined symbol. st_size is not
         // significant for undefined symbols, so whether copying it or not is up
@@ -2492,5 +2497,5 @@ void GnuHashTableSection::addSymbols(SmallVectorImpl<SymbolTableEntry> &v) {
   v.erase(mid, v.end());
   for (const Entry &ent : symbols)
-    v.push_back({ent.sym, ent.strTabOffset});
+    v.push_back({ent.sym, ent.strTabOffset, 0});
 }

diff --git a/lld/ELF/SyntheticSections.h b/lld/ELF/SyntheticSections.h
index d4169e1e1aca..c7e385119c03 100644
--- a/lld/ELF/SyntheticSections.h
+++ b/lld/ELF/SyntheticSections.h
@@ -642,4 +642,5 @@ struct SymbolTableEntry {
   Symbol *sym;
   size_t strTabOffset;
+  uint64_t va;
 };

Another possibility that I think would work in the majority of AArch64 cases without costing too much extra link time, which I think could be done before the stable_partition (to speed that up) is to see if each OutputSection always has the same mapping symbol type. For example if the .text section only has $x mapping symbols, which would be the most common case, then we can replace these with a single $x mapping symbol. If there any $d in there bail out. I think this could be done much more cheaply than 5% link time.

This approach (deleting extra mapping symbols) is for the default case. Bailing out could hinder a significant size optimization for executables with minor data-in-code elements.

default	| covers all cases but too many mapping symbols

Alternatively, removing leading $x can reduce relocatable file size, which will be preferred by my users.
We have two choices:

omit leading $x, trailing $x if required	| could cause $x and $d at same address, error in some tools. or be incorrect when a data section follows a code section with a trailing $x
omit leading $x, no trailing $x	| incorrect when data is last in the section and code follows in next section. Will have double $x when combined with the default option.

The first choice has been discussed above. Technically a linker could delete $x in a text section if it shares the address with a $d.
This feature could be implemented with minimal code, but I believe my users won't require this level of precision.

The second choice "omit leading $x, no trailing $x" requires a smart linker that builds a output-section-to-symbol map (.text => $x $d $x $d $x), inspects input section boundaries, and inserts extra $x.
I believe there is significant performance overhead (sorting), increased memory consumption (tracking input section boundaries without stealing one bit from lld::ELF::Symbol), and potential code complexity.

@MaskRay
Copy link
Member Author

MaskRay commented Aug 13, 2024

Can we alleviate these concerns by marking object files that use this mapping symbol convention. That way we could fix them up later if needed.

That was my first thought, too.

As far as I can see, this option is essentially inventing a nonstandard variation on the Arm ELF spec, which is not fully compatible with the standard one, as the PR description mentions:

This scheme works as long as the user can rule out some error scenarios:

  • .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior

Currently, as far as I can see, the only way that a user can rule out this error scenario is to use their own memory and self-discipline to keep track of which objects (or which libraries, or which whole projects) they're building using this nonstandard flag.

You're absolutely right. The nonstandard option is invaluable for those with full control over their toolchain, no reliance on weird relocatable files, and a strong focus on minimizing both relocatable and executable sizes.
This aligns with the needs of (a) large-scale build environments, (b) huge open-source projects, and (c) distributions like Gentoo.
(I haven't used Gentoo for a while, but I would certainly use this option to make my builds faster and smaller.)

Surely it would be better to mark object files to indicate explicitly that they follow this nonstandard ABI. Then at the very least a linker can detect a mixture of the two and complain, even if it can't automatically compensate. That way the user doesn't have to do it all by self-discipline.

While marking object files could be beneficial, it's essential to avoid overly restrictive compatibility checks. False positives could hinder development.
(I actually wanted to write a blog post about compatibility check and inconvenience, but I kept postponing the writing...)

For example, the following code assembled with a traditional assembler could get marked to be incompatible with .text; .nop using the nonstandard behavior.

.text
nop
.long 0

(This pattern is extremely rare, and can be ruled out with certain control over the toolchain, rendering a marker unnecessary.)

But pure code should not trigger incompatiblity warning.

.text
nop

@smithp35
Copy link
Collaborator

I don't think we need to use an object marking to do a compatibility check, while that is possible I don't think it is necessary in the majority of the cases.

What it does do is give us the chance to identify objects with the alternative marking scheme and unambiguously know that an executable section with no mapping symbols is either data or AArch64 code. This could help tools if the difference became important.

While I don't want to speculate too much over how to mark an object, I'm wondering if it is something you're willing to entertain?

As an aside; Arm's original ELF spec for its proprietary toolchain prior to the ABI in 2004 used to define a couple of ELF header flags:

  • EF_ARM_MAPSYMSFIRST all mapping symbols form a subsection which is first in the static symbol table, all remaining symbols form a subsection. The first symbol in the symbol table is $m the value of which was the number of mapping symbols, permitting a consumer to skip to the first non-mapping symbol.
  • EF_ARM_SYMSARESORTED symbols in a subsection are sorted in ascending address order.

These were rejected when the AArch32 ABI was published as specific to Arm's proprietary toolchain.

@smithp35
Copy link
Collaborator

To give you a summary of my thoughts so far:

  • I don't think we'll be able to change the AArch64 ABI as I don't think there's enough consensus on the GNU side for a relocatable object based solution.
  • An alternative, likely LLVM only, opt-in option for users with large projects, in control of their own toolchain to reduce mapping symbols could be valuable to those projects.
  • I'd like to discourage use of the opt-in option where portability of the relocatable objects is a concern. For example, while there are no plans to do so, I couldn't guarantee the AArch64 ABI might make changes that would retain backwards compatibility.
  • If the option becomes LLVM only then I'd prefer we gave the option a more neutral name. For example -mimplicit-mapsyms or -malternative-mapsyms as -moptimize-mapsyms doesn't imply a, small, trade-off in portability.
  • I'd like to see the help for the option mention compatibility. For example "Allow mapping symbol at section beginning to be implicit, lowers number of mapping symbols at the expense of some portability. Recommended for large projects that can build all their objects using this option."
  • I'd like to find a way of recording that implicit mapping symbols have been used, I don't think we need a compatibility check.

Created using spr 1.3.5-bogner
@MaskRay
Copy link
Member Author

MaskRay commented Aug 13, 2024

Renamed to "implicit-mapsyms"


To give you a summary of my thoughts so far:

  • I don't think we'll be able to change the AArch64 ABI as I don't think there's enough consensus on the GNU side for a relocatable object based solution.

Ack.

  • An alternative, likely LLVM only, opt-in option for users with large projects, in control of their own toolchain to reduce mapping symbols could be valuable to those projects.

Looks good.

  • I'd like to discourage use of the opt-in option where portability of the relocatable objects is a concern. For example, while there are no plans to do so, I couldn't guarantee the AArch64 ABI might make changes that would retain backwards compatibility.

I agree with your point about discouraging its use where portability is crucial.

Does the last sentence read as: the AArch64 ABI might make changes to break backwards compatibility. But with what?

As is, the proposed implicit-mapsyms option is non-conforming per
"A section that contains instructions must have a mapping symbol defined at the beginning of the section. If a section cont ains only data no mapping symbol is required.

I agree that users of the opt-in option should well understand the risks.

  • If the option becomes LLVM only then I'd prefer we gave the option a more neutral name. For example -mimplicit-mapsyms or -malternative-mapsyms as -moptimize-mapsyms doesn't imply a, small, trade-off in portability.

Looks good.
The MCTargetOption/cl::opt option (for llc/llvm-mc) has been renamed to implicit-mapsyms.
Another patch will be needed to add a driver option, perhaps -Wa,-mmapsyms=implicit.

  • I'd like to see the help for the option mention compatibility. For example "Allow mapping symbol at section beginning to be implicit, lowers number of mapping symbols at the expense of some portability. Recommended for large projects that can build all their objects using this option."

Thanks for the suggestion. Improved the message at llvm/lib/MC/MCTargetOptionsCommandFlags.cpp.

  • I'd like to find a way of recording that implicit mapping symbols have been used, I don't think we need a compatibility check.

Looks good. Recording the mapping symbol scheme might depend on SHT_AARCH64_ATTRIBUTES, which has no GNU/LLVM toolchain implementation yet.

#99718 (comment) While I don't want to speculate too much over how to mark an object, I'm wondering if it is something you're willing to entertain?

Yes, I'm open to exploring that once the base mechanism is available.

#99718 (comment) As an aside; Arm's original ELF spec for its proprietary toolchain prior to the ABI in 2004 used to define a couple of ELF header flags:

Thanks for the context.

@smithp35
Copy link
Collaborator

Just a quick comment to say thanks for the updates. I ran out of time to look today, will try and take a look tomorrow.

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are comments on the implementation/test only.

DenseMap<MCSection *, MCSymbol *> EndMappingSym;
for (MCSection &Sec : Asm) {
S.switchSection(&Sec);
if (S.LastEMS == (Sec.isText() ? AArch64ELFStreamer::EMS_Data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can only happen if someone creates a named section and forgets to put "x" in the flags. It is a bit disappointing that addition of an instruction doesn't implicitly set the "x" flag on a section. It looks like both GNU as and LLVM don't do this though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GNU assembler doesn't warn about instructions in data sections and LLVM follow suits: .section foo,"a"; nop. The code is written to respect the section flags (instead of potential user intention) and add ending $d if needed.

I'd hope that instructions in data sections lead to a warning.

DenseMap<MCSection *, size_t> Cnt;
Syms.truncate(NumSyms);
for (const MCSymbol *Sym : Syms)
if (Sym->isInSection())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we have a count of symbols for every section. For a large object with lots of sections (coming from say LTO with -ffunction-sections) then if we expect trailing symbols to be rare, then it could be better to test if the section has a trailing mapping symbol early, and only retain counts for the sections that we are adding to.

May not be worth it for the extra complexity though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having 2 dense maps DenseMap<MCSection *, MCSymbol *> EndMappingSym; DenseMap<MCSection *, size_t> Cnt; was wasteful. I have changed the code to use one single map.

The performance benefit should be negligible as we cannot avoid the Syms iteration.

@@ -1,5 +1,10 @@
// RUN: llvm-mc -triple=aarch64 -filetype=obj %s | llvm-objdump -t - | FileCheck %s --match-full-lines
// RUN: llvm-mc -triple=aarch64 -filetype=obj -implicit-mapsyms %s | llvm-objdump -t - | FileCheck %s --check-prefix=CHECK1 --match-full-lines
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a comment on this test. I think it would be worth a test, probably in lld, or possibly via yaml2obj to test what the disassembler does when it gets two mapping symbols at the same address.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added lld/test/ELF/aarch64-mapsyms-implicit.s

@smithp35
Copy link
Collaborator

Renamed to "implicit-mapsyms"

To give you a summary of my thoughts so far:

  • I don't think we'll be able to change the AArch64 ABI as I don't think there's enough consensus on the GNU side for a relocatable object based solution.

Ack.

  • An alternative, likely LLVM only, opt-in option for users with large projects, in control of their own toolchain to reduce mapping symbols could be valuable to those projects.

Looks good.

  • I'd like to discourage use of the opt-in option where portability of the relocatable objects is a concern. For example, while there are no plans to do so, I couldn't guarantee the AArch64 ABI might make changes that would retain backwards compatibility.

I agree with your point about discouraging its use where portability is crucial.

Does the last sentence read as: the AArch64 ABI might make changes to break backwards compatibility. But with what?

If another ISA like Thumb is to Arm is added to AArch64 [*] then we wouldn't be able to tell which instruction set to start with for an executable section. There are likely some implicit assumptions we can make if we're sensible, such as defaulting to AArch64 if there's no mapping symbol, and if the other ISA is used it needs a mapping symbol. I suspect that llvm tools could be made to work, however it could get messy with GNU and other tools.

[*] Morello's hybrid mode does this. A subset of the CHERI instructions can be used in EM_AARCH64 after some transitioning like Arm to Thumb. The majority of experimentation with Morello has been in pure capability mode which is a separate machine from AARCH64.

Looks good. Recording the mapping symbol scheme might depend on SHT_AARCH64_ATTRIBUTES, which has no GNU/LLVM toolchain implementation yet.

#99718 (comment) While I don't want to speculate too much over how to mark an object, I'm wondering if it is something you're willing to entertain?

Yes, I'm open to exploring that once the base mechanism is available.

I'm expecting a build attributes implementation in GNU fairly shortly for GCS assuming it can hit early enough for the GCC release cycle. It may have to slip to the next GCC release if it doesn't make it in time though. I'm hoping that we can get a LLVM implementation started for LLVM 20.

Created using spr 1.3.5-bogner
@MaskRay
Copy link
Member Author

MaskRay commented Aug 20, 2024

Created clangDriver patch: #104542

(I think this patch is very meaningful and we should not make it blocked by the build attribute addition.)

@smithp35
Copy link
Collaborator

Apologies for the delay in responding, I'm aiming to take a look tomorrow, as well as the other ones in the series.

Created using spr 1.3.5-bogner
Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates.

Although possibly not for this patch. It looks llvm-objdump (and gnu objdump) look to reorder $x after $d when the symbols are at the same address. This looks like worse case behaviour.

I think we could address this by looking at llvm-objdump, or we could have lld filter out the clashes.

The other tool that I thought could be affected by this is Bolt. It uses mapping symbols to determine the location of constant islands. It looks to do this with CodeOffsets and DataOffsets (some references below). Are your large programs likely to be post-processed by Bolt?

Some references:
https://github.com/llvm/llvm-project/blob/main/bolt/include/bolt/Core/BinaryFunction.h#L1972
https://github.com/llvm/llvm-project/blob/main/bolt/lib/Core/BinaryEmitter.cpp#L539
https://github.com/llvm/llvm-project/blob/main/bolt/lib/Core/BinaryFunction.cpp#L1049

Apologies to keep bringing these up. What I want to avoid is people finding other llvm tools subtly broken. I don't think that these are a blocker for implicit mapsyms, but it should influence the level of warning we give for the option. We may want to call out Bolt specifically as a tool that could be affected for example.

I think the code here is good for llvm-mc. I don't have any more comments on this patch.

I've got a last couple of questions:

  • Do you want to update llvm-objdump as whatever collation order the linker uses, it is going to order $d before $x? I think that could be done in different patch.
  • If Bolt could get confused by two symbols at the same address I think we'll need to call that out in the clang driver help text. Do you think it would be worth an lld patch to remove (or make both symbols the same) when there is a clash? That would make Bolt work for a LLVM based toolchain.

# CHECK-EMPTY:
# CHECK-NEXT: <$d>:
# CHECK-NEXT: <$x>:
# CHECK-NEXT: udf #42
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points at an issue in llvm-objdump. It could be sorting symbols at the same address by name.

Looking at llvm-readelf --symbols for 0x201134 I can see $x before $d as we would expect, however llvm-objdump has put the $d before $x.

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND
     1: 0000000000210124     0 NOTYPE  LOCAL  DEFAULT     1 $d
     2: 0000000000210128     0 NOTYPE  LOCAL  DEFAULT     1 $x
     3: 0000000000210130     0 NOTYPE  LOCAL  DEFAULT     2 $d
     4: 0000000000210134     0 NOTYPE  LOCAL  DEFAULT     2 $x
     5: 0000000000210134     0 NOTYPE  LOCAL  DEFAULT     2 $d
     6: 0000000000210138     0 NOTYPE  LOCAL  DEFAULT     2 $x
     7: 0000000000210120     0 NOTYPE  GLOBAL DEFAULT     1 _start

This is unfortunate as it means every trailing $x is going to result in an incorrect disassembly when there's a leading $d.

@smithp35
Copy link
Collaborator

Just a quick note that I'll be out of office from Friday 23rd. Will be back on Tuesday 27th so I may be a bit slow to respond.

@MaskRay
Copy link
Member Author

MaskRay commented Aug 21, 2024

Although possibly not for this patch. It looks llvm-objdump (and gnu objdump) look to reorder $x after $d when the symbols are at the same address. This looks like worse case behaviour.

Yes, the -fsanitize=function and -fsanitize=kcfi cases that many people don't use.
In particular, as a contributor of -fsanitize=kcfi, I think the feature should not be used for userspace programs.
They are specific to the Linux kernel, which expects a specific conditional branch instruction and a specific trap instruction.
(FreeBSD kernel could adopt it if they decide to do something similar to the Linux kernel.)

I think we could address this by looking at llvm-objdump, or we could have lld filter out the clashes.

Yes. The test intends to demonstrate the weakness.

The other tool that I thought could be affected by this is Bolt. It uses mapping symbols to determine the location of constant islands. It looks to do this with CodeOffsets and DataOffsets (some references below). Are your large programs likely to be post-processed by Bolt?

No, not post-processed by BOLT.

Do you want to update llvm-objdump as whatever collation order the linker uses, it is going to order $d before $x? I think that could be done in different patch.

Yes, I will take a stab.

  • If Bolt could get confused by two symbols at the same address I think we'll need to call that out in the clang driver help text. Do you think it would be worth an lld patch to remove (or make both symbols the same) when there is a clash? That would make Bolt work for a LLVM based toolchain.

I want to change clang -Wa, options to use TableGen so that we can place additional help messages.
Right now -Wa, options don't have a way to provide messages.
clang/docs/UsersManual.rst might also be a choice, if we don't add messages to --help.

Currently, we could only rely on the cl::opt message. I feel that BOLT is a very specific detail that cannot be well described a brief message...


Enjoy your vacation! I'll take a flight on Friday and will be slow to respond for about 3 weeks.

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback. I don't have any more feedback on this specific patch.

Hopefully we'll be able to improve llvm-objdump and possibly lld to avoid the corner cases.

Thanks for letting us know you'll be slow to respond over the next few weeks.

@MaskRay MaskRay merged commit 46707b0 into main Aug 22, 2024
6 of 8 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/aarch64elf-allow-implicit-dx-at-section-beginning-1 branch August 22, 2024 16:12
MaskRay added a commit that referenced this pull request Aug 22, 2024
-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme
discussed at #99718.

While not conforming to the current aaelf64 ABI, the option is
invaluable for those with full control over their toolchain, no reliance
on weird relocatable files, and a strong focus on minimizing both
relocatable and executable sizes.

The option is discouraged when portability of the relocatable objects is
a concern.
https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency
elaborates the risk.

Pull Request: #104542
cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
The start state of a new section is `EMS_None`, often leading to a
$d/$x at offset 0. Introduce a MCTargetOption/cl::opt
"implicit-mapsyms" to allow an alternative behavior
(ARM-software/abi-aa#274):

* Set the start state to `EMS_Data` or `EMS_A64`.
* For text sections, add an ending $x only if the final data is not instructions.
* For non-text sections, add an ending $d only if the final data is not data commands.

```
.section .text.1,"ax"
nop
// emit $d
.long 42
// emit $x

.section .text.2,"ax"
nop
```

This new behavior decreases the .symtab size significantly:

```
% bloaty a64-2/bin/clang -- a64-0/bin/clang
    FILE SIZE        VM SIZE
 --------------  --------------
  -5.4% -1.13Mi  [ = ]       0    .strtab
 -50.9% -4.09Mi  [ = ]       0    .symtab
  -4.0% -5.22Mi  [ = ]       0    TOTAL
```

---

This scheme works as long as the user can rule out some error scenarios:

* .text.1 assembled using the traditional behavior is combined with .text.2 using the new behavior
* A linker script combining non-text sections and text sections. The
  lack of mapping symbols in the non-text sections could make them
  treated as code, unless the linker inserts extra mapping symbols.

The above mix-and-match scenarios aren't an issue at all for a
significant portion of users.

A text section may start with data commands in rare cases (e.g.
-fsanitize=function) that many users don't care about. When combing
`(.text.0; .word 0)` and `(.text.1; .word 0)`, the ending $x of .text.0
and the initial $d of .text.1 may have the same address. If both
sections reside in the same file, ensure the ending symbol comes before
the initial $d of .text.1, so that a dumb linker respecting the symbol
order will place the ending $x before the initial $d.

Disassemblers using stable sort will see both symbols at the same
address, and the second will win.

When section ordering mechanisms (e.g. --symbol-ordering-file,
--call-graph-profile-sort, `.text : { second.o(.text) first.o(.text) }`)
are involved, the initial data in a text section following a text
section with trailing data could be misidentified as code, but the issue
is local and the risk could be acceptable.

Pull Request: llvm#99718
cjdb pushed a commit to cjdb/llvm-project that referenced this pull request Aug 23, 2024
-Wa,-mmapsyms=implicit enables the alternative mapping symbol scheme
discussed at llvm#99718.

While not conforming to the current aaelf64 ABI, the option is
invaluable for those with full control over their toolchain, no reliance
on weird relocatable files, and a strong focus on minimizing both
relocatable and executable sizes.

The option is discouraged when portability of the relocatable objects is
a concern.
https://maskray.me/blog/2024-07-21-mapping-symbols-rethinking-for-efficiency
elaborates the risk.

Pull Request: llvm#104542
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants