[lld-macho] Symbols in `__mod_init_func` are handled hackily with `-init_offsets` #97155
Comments
@llvm/issue-subscribers-lld-macho Author: Daniel Bertalan (BertalanD)
Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in `__mod_init_func` (an array of pointers *rebased* through the usual means at runtime) to `__init_offsets` (an array of 32-bit offsets to initializers).
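To make the transformation concrete, here is a minimal standalone sketch of turning an array of absolute initializer addresses into 32-bit offsets. It is not LLD code: `toInitOffsets` and `imageBase` are hypothetical names, and it assumes the offsets are measured from the image's base address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of LLD): converts the absolute initializer
// addresses that __mod_init_func would hold into the 32-bit offsets that
// __init_offsets would hold. Assumption: offsets are relative to imageBase.
std::vector<uint32_t> toInitOffsets(const std::vector<uint64_t> &initAddrs,
                                    uint64_t imageBase) {
  std::vector<uint32_t> offsets;
  offsets.reserve(initAddrs.size());
  for (uint64_t addr : initAddrs) {
    assert(addr >= imageBase && "initializer lies below the image base");
    uint64_t off = addr - imageBase;
    assert(off <= UINT32_MAX && "offset does not fit in 32 bits");
    offsets.push_back(static_cast<uint32_t>(off));
  }
  return offsets;
}
```

Note that each entry shrinks from 8 bytes to 4, so a symbol that pointed at a slot of the original array has no meaningful counterpart in the new one.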
Normally, we only need to care about the relocations in the input
Fix ideas
These mentioned workarounds only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.

```
; test.s
.globl _main
.text
_main:
  leaq _init_slot(%rip), %rax
.section __DATA,__mod_init_func,mod_init_funcs
_init_slot:
  .quad _main
```

Additional questions
There have been other similar transformations added recently: ObjC relative method lists, (etc?). Could we theoretically encounter a similar scenario there?
When `-fixup_chains`/`-init_offsets` is used, a different section, `__init_offsets`, is synthesized from `__mod_init_func`. If there are any symbols defined inside `__mod_init_func`, they are added to the symbol table unconditionally while processing the input files. Later, when querying these symbols' addresses (when constructing the symtab or exports trie), we crash with a null deref, as there is no output section assigned to them.

Just making the symbols point to `__init_offsets` is a bad idea, as the new section stores 32-bit integers instead of 64-bit pointers; accessing the symbols would not do what the programmer intended. We should entirely omit them from the output. This is what ld64 and ld-prime do.

This patch uses the same mechanism as dead-stripping to mark these symbols as not needed in the output. There might be nicer fixes than this workaround; this is discussed in llvm#97155.

Fixes llvm#79894 (comment)
Fixes llvm#94716
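The approach in this commit message can be modeled with a small standalone sketch. The types below (`InputSection`, `Symbol`) are simplified stand-ins rather than LLD's real classes, and `markOrphanedSymbolsDead` and `buildSymtab` are hypothetical names; the point is only that symbols marked dead never reach the code that needs an output section.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins; these are NOT LLD's real classes.
struct InputSection {
  std::string name;
  bool hasOutputSection; // false if the contents were re-synthesized elsewhere
};

struct Symbol {
  std::string name;
  const InputSection *isec; // defining section, or nullptr
  bool live;
};

// Mark every symbol defined in a section without an output counterpart as
// dead, reusing the liveness flag that dead-stripping would normally set.
void markOrphanedSymbolsDead(std::vector<Symbol> &symbols) {
  for (Symbol &sym : symbols)
    if (sym.isec && !sym.isec->hasOutputSection)
      sym.live = false;
}

// Only live symbols are emitted, so no one asks a dead symbol for an address.
std::vector<std::string> buildSymtab(const std::vector<Symbol> &symbols) {
  std::vector<std::string> names;
  for (const Symbol &sym : symbols) {
    if (!sym.live)
      continue;
    // The invariant whose violation corresponds to the crash described above.
    assert((!sym.isec || sym.isec->hasOutputSection) &&
           "live symbol without an output section");
    names.push_back(sym.name);
  }
  return names;
}
```

In this model, skipping the marking step trips the assertion in `buildSymtab`, which is the analogue of the null dereference.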
Nico mentioned to me privately that maybe we could patch upstream projects to not put symbols inside `__mod_init_func`.
Background: when linking macOS binaries with chained fixups, we need to transform initializers stored in `__mod_init_func` (an array of pointers *rebased* through the usual means at runtime) to `__init_offsets` (an array of 32-bit offsets to initializers).

Normally, we only need to care about the relocations in the input `__mod_init_func` sections. A problem arises when there are also symbols defined inside it. We currently ignore them in LLD -- that is, we don't add them to the symbol table (since the location they point to doesn't exist anymore). This doesn't happen in regular binaries created by Clang, `swiftc`, `rustc`, etc., but there have been a few instances where this led to crashes:

In #94716, we see a Go-generated binary (the repro file is broken and doesn't include a bunch of Swift stuff from the SDK -- TODO!). Here, the symbol (`__rt0_arm64_ios_lib.ptr`) is defined inside `__mod_init_func` as a non-exported symbol; we crash when trying to add it to the symbol table (it has no corresponding output section, so we can't set `n_sect`).

Backtrace excerpt:

This Chromium bug is related to the `curl` Rust crate, which deliberately defines a symbol among the initializers, apparently to sidestep an old linker/compiler dead-stripping issue. Here, the symbol (`__RNvCsiLjxBhyzEAX_4curl9INIT_CTOR`; `curl::INIT_CTOR`) is externally visible; we crash when we try to query its address when adding it to the exports trie.

Backtrace excerpt:
Fix ideas
Completely remove `__mod_init_func` from the list of input sections

I thought my original patch would have this effect: we do not create an OutputSection for it and don't even include it in the global `inputSections` list.

In reality, this is not enough; the symbols are still added to the symbol table (as `__mod_init_func` is present in the `symbols` array, `ObjFile::parseSymbols` will reach it).

(+) If we encounter a `Defined` symbol during the program's execution, we know for sure that it has an address; no need to check for a poison flag.
(-) Sounds a bit hackish; currently there is a one-to-one correspondence between `ObjFile::sections` and the input file's contents.

Create a "poisoned" state for `Defined` symbols

(+) Least amount of modification to the existing code.
(+) There will still be an entry (though poisoned) in the symbol table, so we'll be able to emit useful warnings if someone actually refers to the symbol.

NOTE: this is basically the current workaround I ended up going for, except that I use the `isLive()` mechanism from dead-stripping.

Some other idea?
These mentioned workarounds only work if the symbols are not actually referenced in relocations. If they are, we get different (but equally undesirable) behaviors.
ld64 crash
ld_prime broken (?) binary
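As a rough illustration of how the "poisoned" idea could cover the relocation case, here is a standalone sketch that reports every relocation whose target is a poisoned symbol instead of crashing or silently emitting a broken binary. `Symbol`, `Reloc` and `diagnosePoisonedRefs` are hypothetical stand-ins, not LLD's data structures, and the diagnostic wording is made up.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins; not LLD's real data structures.
struct Symbol {
  std::string name;
  bool poisoned; // defined in a section that was replaced, e.g. __mod_init_func
};

struct Reloc {
  std::string fromSection; // section containing the reference
  const Symbol *target;    // symbol being referenced
};

// Collect one diagnostic per relocation that targets a poisoned symbol.
std::vector<std::string> diagnosePoisonedRefs(const std::vector<Reloc> &relocs) {
  std::vector<std::string> diags;
  for (const Reloc &r : relocs) {
    assert(r.target && "relocation without a target symbol");
    if (r.target->poisoned)
      diags.push_back("reference from " + r.fromSection + " to " +
                      r.target->name + ", which has no address in the output");
  }
  return diags;
}
```

A reference from `__text` to a symbol defined inside `__mod_init_func` would yield one diagnostic here, while the initializer entry pointing at `_main` would not.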
Additional questions
There have been other similar transformations added recently: ObjC relative method lists, (etc?). Could we theoretically encounter a similar scenario there?