[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

statham-arm · 2025-04-07T15:17:39Z

This is a conditional revert of cca40aa, which made LLVM's branch-target-enforcement mode generate BTI at the start of every function, even in the case where the function has internal linkage and its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct call to the function spanned a larger distance than the range of a BL instruction (say, if the translation unit generated multiple code sections and the linker put them a very long way apart). Then the linker might insert a long-branch thunk using an indirect call instruction.

SYSVABI64 has now clarified that in this situation the static linker may not assume that the target function is safe to call directly. If it needs to use this strategy, it's responsible for also generating a 'landing pad' near the target function, with a BTI followed by a direct branch, and using that as the target of the long-distance indirect call.
ARM-software/abi-aa@606ce44

LLD complies with this spec as of commit 098b0d1.

So if we're compiling in a mode that respects SYSVABI64, such as targeting Linux, it's safe to leave out the BTI at the start of a function with internal linkage if we can prove that its address isn't either used in an indirect call in this translation unit or passed out of the object.

Therefore, this patch goes back to the behavior before cca40aa, leaving out BTIs in functions that can't be called indirectly, but only if the target triple is Linux. (I wasn't able to find a more precise query for "is this a SYSVABI64-compliant platform?", but Linux certainly is, and this check at least fails in the safe direction - if in doubt, we put in all the BTIs that might be necessary.)

This is a conditional revert of cca40aa, which made LLVM's branch-target-enforcement mode generate BTI at the start of _every_ function, even in the case where the function has internal linkage and its address is never taken for use in an indirect call. The rationale was that it might turn out at link time that a direct call to the function spanned a larger distance than the range of a BL instruction (say, if the translation unit generated multiple code sections and the linker put them a very long way apart). Then the linker might insert a long-branch thunk using an indirect call instruction. SYSVABI64 has now clarified that in this situation the static linker may not assume that the target function is safe to call directly. If it needs to use this strategy, it's responsible for also generating a 'landing pad' near the target function, with a BTI followed by a direct branch, and using that as the target of the long-distance indirect call. ARM-software/abi-aa@606ce44 LLD complies with this spec as of commit 098b0d1. So if we're compiling in a mode that respects SYSVABI64, such as targeting Linux, it's safe to leave out the BTI at the start of a function with internal linkage if we can prove that its address isn't either used in an indirect call in _this_ translation unit or passed out of the object. Therefore, this patch goes back to the behavior before cca40aa, leaving out BTIs in functions that can't be called indirectly, but only if the target triple is Linux. (I wasn't able to find a more precise query for "is this a SYSVABI64-compliant platform?", but Linux certainly is, and this check at least fails in the safe direction - if in doubt, we put in all the BTIs that might be necessary.)

llvmbot · 2025-04-07T15:18:13Z

@llvm/pr-subscribers-backend-aarch64

Author: Simon Tatham (statham-arm)

Changes

This is a conditional revert of cca40aa, which made LLVM's branch-target-enforcement mode generate BTI at the start of every function, even in the case where the function has internal linkage and its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct call to the function spanned a larger distance than the range of a BL instruction (say, if the translation unit generated multiple code sections and the linker put them a very long way apart). Then the linker might insert a long-branch thunk using an indirect call instruction.

SYSVABI64 has now clarified that in this situation the static linker may not assume that the target function is safe to call directly. If it needs to use this strategy, it's responsible for also generating a 'landing pad' near the target function, with a BTI followed by a direct branch, and using that as the target of the long-distance indirect call.
ARM-software/abi-aa@606ce44

LLD complies with this spec as of commit 098b0d1.

So if we're compiling in a mode that respects SYSVABI64, such as targeting Linux, it's safe to leave out the BTI at the start of a function with internal linkage if we can prove that its address isn't either used in an indirect call in this translation unit or passed out of the object.

Therefore, this patch goes back to the behavior before cca40aa, leaving out BTIs in functions that can't be called indirectly, but only if the target triple is Linux. (I wasn't able to find a more precise query for "is this a SYSVABI64-compliant platform?", but Linux certainly is, and this check at least fails in the safe direction - if in doubt, we put in all the BTIs that might be necessary.)

Full diff: https://github.com/llvm/llvm-project/pull/134669.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64BranchTargets.cpp (+14-6)
(modified) llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll (+11-5)

diff --git a/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp b/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
index b9feb83339d8d..c60fbb63c73ab 100644
--- a/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
+++ b/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
@@ -65,6 +65,7 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {
   LLVM_DEBUG(
       dbgs() << "********** AArch64 Branch Targets  **********\n"
              << "********** Function: " << MF.getName() << '\n');
+  const Function &F = MF.getFunction();
 
   // LLVM does not consider basic blocks which are the targets of jump tables
   // to be address-taken (the address can't escape anywhere else), but they are
@@ -78,16 +79,23 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {
   bool HasWinCFI = MF.hasWinCFI();
   for (MachineBasicBlock &MBB : MF) {
     bool CouldCall = false, CouldJump = false;
-    // Even in cases where a function has internal linkage and is only called
-    // directly in its translation unit, it can still be called indirectly if
-    // the linker decides to add a thunk to it for whatever reason (say, for
-    // example, if it is finally placed far from its call site and a BL is not
-    // long-range enough). PLT entries and tail-calls use BR, but when they are
+    // If the function is address-taken or externally-visible, it could be
+    // indirectly called. PLT entries and tail-calls use BR, but when they are
     // are in guarded pages should all use x16 or x17 to hold the called
     // address, so we don't need to set CouldJump here. BR instructions in
     // non-guarded pages (which might be non-BTI-aware code) are allowed to
     // branch to a "BTI c" using any register.
-    if (&MBB == &*MF.begin())
+    //
+    // For SysV targets, this is enough, because SYSVABI64 says that if the
+    // static linker later wants to use an indirect branch instruction in a
+    // long-branch thunk, it's also responsible for adding a 'landing pad' with
+    // a BTI, and pointing the indirect branch at that. However, at present
+    // this guarantee only holds for targets complying with SYSVABI64, so for
+    // other targets we must assume that `CouldCall` is _always_ true due to
+    // the risk of long-branch thunks at link time.
+    if (&MBB == &*MF.begin() &&
+        (!MF.getSubtarget<AArch64Subtarget>().isTargetLinux() ||
+         (F.hasAddressTaken() || !F.hasLocalLinkage())))
       CouldCall = true;
 
     // If the block itself is address-taken, it could be indirectly branched
diff --git a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
index 85f5f6fa4674a..6d5dfc9d8fae4 100644
--- a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
+++ b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
@@ -1,4 +1,5 @@
-; RUN: llc -mtriple=aarch64 -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s --check-prefixes=CHECK,SYSV
+; RUN: llc -mtriple=aarch64-none-elf -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s --check-prefixes=CHECK,NONSYSV
 
 define void @f0() "patchable-function-entry"="0" "branch-target-enforcement" {
 ; CHECK-LABEL: f0:
@@ -48,20 +49,25 @@ define void @f2_1() "patchable-function-entry"="1" "patchable-function-prefix"="
 }
 
 ;; -fpatchable-function-entry=1 -mbranch-protection=bti
-;; We add BTI c even when the function has internal linkage
+;; For SysV compliant targets, we don't add BTI (or create the .Lpatch0 symbol)
+;; because the function has internal linkage and isn't address-taken. For
+;; non-SysV targets, we do add the BTI, because outside SYSVABI64 there's no
+;; spec preventing the static linker from using an indirect call instruction in
+;; a long-branch thunk inserted at link time.
 define internal void @f1i(i64 %v) "patchable-function-entry"="1" "branch-target-enforcement" {
 ; CHECK-LABEL: f1i:
 ; CHECK-NEXT: .Lfunc_begin3:
 ; CHECK:      // %bb.0:
-; CHECK-NEXT:  hint #34
-; CHECK-NEXT:  .Lpatch1:
+; NONSYSV-NEXT:  hint #34
+; NONSYSV-NEXT:  .Lpatch1:
 ; CHECK-NEXT:  nop
 ;; Other basic blocks have BTI, but they don't affect our decision to not create .Lpatch0
 ; CHECK:      .LBB{{.+}} // %sw.bb1
 ; CHECK-NEXT:  hint #36
 ; CHECK:      .section __patchable_function_entries,"awo",@progbits,f1i{{$}}
 ; CHECK-NEXT: .p2align 3
-; CHECK-NEXT: .xword .Lpatch1
+; NONSYSV-NEXT: .xword .Lpatch1
+; SYSV-NEXT: .xword .Lfunc_begin3
 entry:
   switch i64 %v, label %sw.bb0 [
     i64 1, label %sw.bb1

smithp35

I think this is the right thing to do as both GNU ld [1] and lld will insert a BTI followed by a B when inserting a long branch thunk/veneer. Reducing the number of BTI instructions in the binary reduces the number of indirect branch targets, which improves security, and reduces code-size.

I've raised an ABI issue to widen the sysvabi64 text to bare-metal targets too ARM-software/abi-aa#321 as the gnu ld and lld changes would work there too.

I'm not sure what the state of BTI use is on Windows and Darwin. It would be safe to assume that any long-branch insertion didn't insert BTI instructions though.

[1] GNU ld https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

MaskRay · 2025-04-07T16:25:31Z

llvm/lib/Target/AArch64/AArch64BranchTargets.cpp

+    // other targets we must assume that `CouldCall` is _always_ true due to
+    // the risk of long-branch thunks at link time.
+    if (&MBB == &*MF.begin() &&
+        (!MF.getSubtarget<AArch64Subtarget>().isTargetLinux() ||


Use isOSBinFormatELF(). We could omit the landing pad for all ELF platforms.

I'm concerned about that because there's no ABI guarantee that it will work for all ELF platforms -- just because lld behaves in the right way, doesn't mean it's specified to work that way for all linkers.

So I've landed the patch as it is for the moment, which at least improves things in some situations. I'm happy to come back and make that generalisation if ARM-software/abi-aa#321 is fixed, though.

I think this applies to at least all ELF platforms minus bare-metal... For example, OpenBSD would like this as well. If Fuchsia deploys BTI, they could benefit from this as they only use very new lld.

But now that you have landed this, we could wait for bare-metal ABI to be changed before generalizing this...

Yes, as I mentioned in the commit message, I didn't find any existing LLVM query function that would return true for Linux, BSD, and other things expected to use SysV style shared libraries.

I could write one, and let other people extend it to include more OSes as necessary. But if we're going to require this in AAELF64, then it's probably a waste of effort to do anything that complicated -- simpler to wait for that ABI PR to land and then go with your original suggestion of just testing for ELF.

statham-arm requested review from MaskRay, pbarrio, zmodem, DanielKristofKiss and smithp35 April 7, 2025 15:17

llvmbot added the backend:AArch64 label Apr 7, 2025

smithp35 reviewed Apr 7, 2025

View reviewed changes

MaskRay approved these changes Apr 7, 2025

View reviewed changes

statham-arm merged commit 7af2b51 into llvm:main Apr 8, 2025
13 checks passed

statham-arm deleted the remove-unnecessary-bti branch April 9, 2025 08:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

statham-arm commented Apr 7, 2025

llvmbot commented Apr 7, 2025

smithp35 left a comment •

edited

Loading

MaskRay Apr 7, 2025

statham-arm Apr 8, 2025

MaskRay Apr 8, 2025

statham-arm Apr 9, 2025

[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

Conversation

statham-arm commented Apr 7, 2025

llvmbot commented Apr 7, 2025

smithp35 left a comment • edited Loading

Choose a reason for hiding this comment

MaskRay Apr 7, 2025

Choose a reason for hiding this comment

statham-arm Apr 8, 2025

Choose a reason for hiding this comment

MaskRay Apr 8, 2025

Choose a reason for hiding this comment

statham-arm Apr 9, 2025

Choose a reason for hiding this comment

smithp35 left a comment •

edited

Loading