Skip to content

[AArch64][v8.5A] Omit BTI for non-addr-taken static fns on Linux #134669

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 8, 2025

Conversation

statham-arm
Copy link
Collaborator

This is a conditional revert of cca40aa, which made LLVM's branch-target-enforcement mode generate BTI at the start of every function, even in the case where the function has internal linkage and its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct call to the function spanned a larger distance than the range of a BL instruction (say, if the translation unit generated multiple code sections and the linker put them a very long way apart). Then the linker might insert a long-branch thunk using an indirect call instruction.

SYSVABI64 has now clarified that in this situation the static linker may not assume that the target function is safe to call directly. If it needs to use this strategy, it's responsible for also generating a 'landing pad' near the target function, with a BTI followed by a direct branch, and using that as the target of the long-distance indirect call.
ARM-software/abi-aa@606ce44

LLD complies with this spec as of commit 098b0d1.

So if we're compiling in a mode that respects SYSVABI64, such as targeting Linux, it's safe to leave out the BTI at the start of a function with internal linkage if we can prove that its address isn't either used in an indirect call in this translation unit or passed out of the object.

Therefore, this patch goes back to the behavior before cca40aa, leaving out BTIs in functions that can't be called indirectly, but only if the target triple is Linux. (I wasn't able to find a more precise query for "is this a SYSVABI64-compliant platform?", but Linux certainly is, and this check at least fails in the safe direction - if in doubt, we put in all the BTIs that might be necessary.)

This is a conditional revert of cca40aa, which made LLVM's
branch-target-enforcement mode generate BTI at the start of _every_
function, even in the case where the function has internal linkage and
its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct
call to the function spanned a larger distance than the range of a BL
instruction (say, if the translation unit generated multiple code
sections and the linker put them a very long way apart). Then the
linker might insert a long-branch thunk using an indirect call
instruction.

SYSVABI64 has now clarified that in this situation the static linker
may not assume that the target function is safe to call directly. If
it needs to use this strategy, it's responsible for also generating a
'landing pad' near the target function, with a BTI followed by a
direct branch, and using that as the target of the long-distance
indirect call.
ARM-software/abi-aa@606ce44

LLD complies with this spec as of commit 098b0d1.

So if we're compiling in a mode that respects SYSVABI64, such as
targeting Linux, it's safe to leave out the BTI at the start of a
function with internal linkage if we can prove that its address isn't
either used in an indirect call in _this_ translation unit or passed
out of the object.

Therefore, this patch goes back to the behavior before
cca40aa, leaving out BTIs in functions that can't be called
indirectly, but only if the target triple is Linux. (I wasn't able to
find a more precise query for "is this a SYSVABI64-compliant
platform?", but Linux certainly is, and this check at least fails in
the safe direction - if in doubt, we put in all the BTIs that might be
necessary.)
@llvmbot
Copy link
Member

llvmbot commented Apr 7, 2025

@llvm/pr-subscribers-backend-aarch64

Author: Simon Tatham (statham-arm)

Changes

This is a conditional revert of cca40aa, which made LLVM's branch-target-enforcement mode generate BTI at the start of every function, even in the case where the function has internal linkage and its address is never taken for use in an indirect call.

The rationale was that it might turn out at link time that a direct call to the function spanned a larger distance than the range of a BL instruction (say, if the translation unit generated multiple code sections and the linker put them a very long way apart). Then the linker might insert a long-branch thunk using an indirect call instruction.

SYSVABI64 has now clarified that in this situation the static linker may not assume that the target function is safe to call directly. If it needs to use this strategy, it's responsible for also generating a 'landing pad' near the target function, with a BTI followed by a direct branch, and using that as the target of the long-distance indirect call.
ARM-software/abi-aa@606ce44

LLD complies with this spec as of commit 098b0d1.

So if we're compiling in a mode that respects SYSVABI64, such as targeting Linux, it's safe to leave out the BTI at the start of a function with internal linkage if we can prove that its address isn't either used in an indirect call in this translation unit or passed out of the object.

Therefore, this patch goes back to the behavior before cca40aa, leaving out BTIs in functions that can't be called indirectly, but only if the target triple is Linux. (I wasn't able to find a more precise query for "is this a SYSVABI64-compliant platform?", but Linux certainly is, and this check at least fails in the safe direction - if in doubt, we put in all the BTIs that might be necessary.)


Full diff: https://github.com/llvm/llvm-project/pull/134669.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64BranchTargets.cpp (+14-6)
  • (modified) llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll (+11-5)
diff --git a/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp b/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
index b9feb83339d8d..c60fbb63c73ab 100644
--- a/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
+++ b/llvm/lib/Target/AArch64/AArch64BranchTargets.cpp
@@ -65,6 +65,7 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {
   LLVM_DEBUG(
       dbgs() << "********** AArch64 Branch Targets  **********\n"
              << "********** Function: " << MF.getName() << '\n');
+  const Function &F = MF.getFunction();
 
   // LLVM does not consider basic blocks which are the targets of jump tables
   // to be address-taken (the address can't escape anywhere else), but they are
@@ -78,16 +79,23 @@ bool AArch64BranchTargets::runOnMachineFunction(MachineFunction &MF) {
   bool HasWinCFI = MF.hasWinCFI();
   for (MachineBasicBlock &MBB : MF) {
     bool CouldCall = false, CouldJump = false;
-    // Even in cases where a function has internal linkage and is only called
-    // directly in its translation unit, it can still be called indirectly if
-    // the linker decides to add a thunk to it for whatever reason (say, for
-    // example, if it is finally placed far from its call site and a BL is not
-    // long-range enough). PLT entries and tail-calls use BR, but when they are
+    // If the function is address-taken or externally-visible, it could be
+    // indirectly called. PLT entries and tail-calls use BR, but when they are
     // are in guarded pages should all use x16 or x17 to hold the called
     // address, so we don't need to set CouldJump here. BR instructions in
     // non-guarded pages (which might be non-BTI-aware code) are allowed to
     // branch to a "BTI c" using any register.
-    if (&MBB == &*MF.begin())
+    //
+    // For SysV targets, this is enough, because SYSVABI64 says that if the
+    // static linker later wants to use an indirect branch instruction in a
+    // long-branch thunk, it's also responsible for adding a 'landing pad' with
+    // a BTI, and pointing the indirect branch at that. However, at present
+    // this guarantee only holds for targets complying with SYSVABI64, so for
+    // other targets we must assume that `CouldCall` is _always_ true due to
+    // the risk of long-branch thunks at link time.
+    if (&MBB == &*MF.begin() &&
+        (!MF.getSubtarget<AArch64Subtarget>().isTargetLinux() ||
+         (F.hasAddressTaken() || !F.hasLocalLinkage())))
       CouldCall = true;
 
     // If the block itself is address-taken, it could be indirectly branched
diff --git a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
index 85f5f6fa4674a..6d5dfc9d8fae4 100644
--- a/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
+++ b/llvm/test/CodeGen/AArch64/patchable-function-entry-bti.ll
@@ -1,4 +1,5 @@
-; RUN: llc -mtriple=aarch64 -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s
+; RUN: llc -mtriple=aarch64-linux-gnu -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s --check-prefixes=CHECK,SYSV
+; RUN: llc -mtriple=aarch64-none-elf -aarch64-min-jump-table-entries=4 %s -o - | FileCheck %s --check-prefixes=CHECK,NONSYSV
 
 define void @f0() "patchable-function-entry"="0" "branch-target-enforcement" {
 ; CHECK-LABEL: f0:
@@ -48,20 +49,25 @@ define void @f2_1() "patchable-function-entry"="1" "patchable-function-prefix"="
 }
 
 ;; -fpatchable-function-entry=1 -mbranch-protection=bti
-;; We add BTI c even when the function has internal linkage
+;; For SysV compliant targets, we don't add BTI (or create the .Lpatch0 symbol)
+;; because the function has internal linkage and isn't address-taken. For
+;; non-SysV targets, we do add the BTI, because outside SYSVABI64 there's no
+;; spec preventing the static linker from using an indirect call instruction in
+;; a long-branch thunk inserted at link time.
 define internal void @f1i(i64 %v) "patchable-function-entry"="1" "branch-target-enforcement" {
 ; CHECK-LABEL: f1i:
 ; CHECK-NEXT: .Lfunc_begin3:
 ; CHECK:      // %bb.0:
-; CHECK-NEXT:  hint #34
-; CHECK-NEXT:  .Lpatch1:
+; NONSYSV-NEXT:  hint #34
+; NONSYSV-NEXT:  .Lpatch1:
 ; CHECK-NEXT:  nop
 ;; Other basic blocks have BTI, but they don't affect our decision to not create .Lpatch0
 ; CHECK:      .LBB{{.+}} // %sw.bb1
 ; CHECK-NEXT:  hint #36
 ; CHECK:      .section __patchable_function_entries,"awo",@progbits,f1i{{$}}
 ; CHECK-NEXT: .p2align 3
-; CHECK-NEXT: .xword .Lpatch1
+; NONSYSV-NEXT: .xword .Lpatch1
+; SYSV-NEXT: .xword .Lfunc_begin3
 entry:
   switch i64 %v, label %sw.bb0 [
     i64 1, label %sw.bb1

Copy link
Collaborator

@smithp35 smithp35 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right thing to do as both GNU ld [1] and lld will insert a BTI followed by a B when inserting a long branch thunk/veneer. Reducing the number of BTI instructions in the binary reduces the number of indirect branch targets, which improves security, and reduces code-size.

I've raised an ABI issue to widen the sysvabi64 text to bare-metal targets too ARM-software/abi-aa#321 as the gnu ld and lld changes would work there too.

I'm not sure what the state of BTI use is on Windows and Darwin. It would be safe to assume that any long-branch insertion didn't insert BTI instructions though.

[1] GNU ld https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671

// other targets we must assume that `CouldCall` is _always_ true due to
// the risk of long-branch thunks at link time.
if (&MBB == &*MF.begin() &&
(!MF.getSubtarget<AArch64Subtarget>().isTargetLinux() ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use isOSBinFormatELF(). We could omit the landing pad for all ELF platforms.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about that because there's no ABI guarantee that it will work for all ELF platforms -- just because lld behaves in the right way, doesn't mean it's specified to work that way for all linkers.

So I've landed the patch as it is for the moment, which at least improves things in some situations. I'm happy to come back and make that generalisation if ARM-software/abi-aa#321 is fixed, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this applies to at least all ELF platforms minus bare-metal... For example, OpenBSD would like this as well. If Fuchsia deploys BTI, they could benefit from this as they only use very new lld.

But now that you have landed this, we could wait for bare-metal ABI to be changed before generalizing this...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I mentioned in the commit message, I didn't find any existing LLVM query function that would return true for Linux, BSD, and other things expected to use SysV style shared libraries.

I could write one, and let other people extend it to include more OSes as necessary. But if we're going to require this in AAELF64, then it's probably a waste of effort to do anything that complicated -- simpler to wait for that ABI PR to land and then go with your original suggestion of just testing for ELF.

@statham-arm statham-arm merged commit 7af2b51 into llvm:main Apr 8, 2025
13 checks passed
@statham-arm statham-arm deleted the remove-unnecessary-bti branch April 9, 2025 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants