-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[BOLT] GOT array pointer incorrectly rewritten #100096
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-bolt Author: Peter Waller (peterwaller-arm)
# Problem
Static binary glibc startup crash messageThis manifests in glibc static binaries on aarch64-linux with the binary crashing on startup with the error message It is currently known to manifest on aarch64-linux with glibc, but underlying issue may not be unique to that target or scenario, it may be at risk of happening elsewhere. Test case (copy paste whole block, produces ‘./testcase’)
|
StringRef GOTContents = cantFail(GOTSection.getContents()); | |
for (const uint64_t *GOTEntry = | |
reinterpret_cast<const uint64_t *>(GOTContents.data()); | |
GOTEntry < reinterpret_cast<const uint64_t *>(GOTContents.data() + | |
GOTContents.size()); | |
++GOTEntry) { | |
if (uint64_t NewAddress = getNewFunctionAddress(*GOTEntry)) { |
Thinking about solutions
- The case of data and code sharing an address is an awkward edge case. It happens here because it’s valid to have a pointer to ‘one past the end of an array’ (per C standard “Additive operators” regarding integer addition to a pointer), and that pointer can end up being referenced through the GOT.
- This edge case is rare but is at last known to create an issue with glibc static binaries. There could be other binaries in the wild with problems that may be revealed at runtime.
- There is an immediate problem of making glibc static binaries work.
- @paschalis-mpeis attempted a fix in [WORKAROUND][BOLT][AArch64] Static binary patching for ELF. #97710, but the fix is incomplete in that it does not handle the case in general. I’m not sure about the rationale of the condition used or whether it is valid.
- We may want to special case somehow this if it leads to a clean immediate solution, with an approach similar to that presented by @paschalis-mpeis above, though the exact conditions are subject for discussion.
- If the type referred to by a GOT entry was known,
patchELFGOT
could query the type and use conditionally use the correct getNewFunctionAddress / getBinaryDataAtAddress in each case. - What does distinguish
array_end
and_start
is that the symbols belong to different sections (even though they share an address), and the sections are of different types. If it were possible to determine the symbol referenced in the GOT then patchELFGOT could straightforwardly avoid patching entries which reference data symbols (or patch them if it's moving them). - I am not currently aware of any way to determine the type of the data referred to by a GOT entry, other than to look for relocations which point to the entry [discussion below].
Can the symbol referenced by a GOT entry be reconstructed?
So far as I’m currently aware we don’t have a straightforward way to determine which symbol a GOT entry points to. If only the address contained within the GOT entry is considered, this does not distinguish _start
and array_end
as in the reproducer provided above.
Determining which relocations point to a given GOT entry could require determining register values for some programs, since we don’t get a fully-qualified pointer to the GOT from a GOT relocation for a specific symbol alone. Consider that you could have a register value holding a base pointer to a GOT page, and that register value gets reused for multiple different GOT references within the page.
The base pointer could be hoisted out of a loop and would not itself contain information connecting it to the symbol of interest, so determining the GOT entry being referenced by a relocation would require some kind of symbolic execution to determine the base register value in the worst case.
cc @aaupov @maksfb for BOLT
cc @MaskRay because I think you might be interested and knowledgable, particularly if there is some information BOLT can use to differentiate symbols which share the same address in the GOT straightforwardly, or if there are alternative approaches to consider in light of additional linker knowledge.
What's worse is that during we add new objects in discoverFileObjects() - registerName() we're connecting them to section using their address and thus connecting them to a wrong section in such cases. @maksfb @rafaelauler I think we need to use |
This patch aborts BOLT exeuction if it finds out-of-section (section end) symbol in GOT table. In order to handle such situation properly in future we would need to have arch-dependent way to analise relocations or its sequences, e.g. for ARM it would probably be ADRP + LDR analyzation in order to get GOT entry address. Currently it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway it seems to be quite rare case which seems to be only? related to static binaries. For the most part it seems that it should be handled on linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols which elliminates the problem. Anyway in order to achive detection of such cases this patch fixes few things in BOLT: 1. For the end symbols we're now using section provided by ELF binary, previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration, we would only would add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top level symbols if they are located in zero-sized section. For the most part such things could only be found in tests, but I don't see a reason not to handle such cases. The test was provided by peterwaller-arm (thank you) in llvm#100096 and slightly modified by me.
Hello @peterwaller-arm ! I've implemented not fix, but check and abort in such situations in BOLT in #100801. Please take a look to a commit message that describes my thoughts about handling such situations in BOLT levels. TL'DR it would be good, but currently too expensive and also I think we need to blame linker that left GOT table in fully static binary, it seems that if you would use LLD the problem would be auto eliminated :) |
This patch aborts BOLT exeuction if it finds out-of-section (section end) symbol in GOT table. In order to handle such situation properly in future we would need to have arch-dependent way to analise relocations or its sequences, e.g. for ARM it would probably be ADRP + LDR analyzation in order to get GOT entry address. Currently it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway it seems to be quite rare case which seems to be only? related to static binaries. For the most part it seems that it should be handled on linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols which elliminates the problem. Anyway in order to achive detection of such cases this patch fixes few things in BOLT: 1. For the end symbols we're now using section provided by ELF binary, previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration, we would only would add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top level symbols if they are located in zero-sized section. For the most part such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated section-end-sym test and removed x86_64 requirment since there is no reason for this (tested on aarch64 linux) The test was provided by peterwaller-arm (thank you) in llvm#100096 and slightly modified by me.
This patch aborts BOLT execution if it finds out-of-section (section end) symbol in GOT table. In order to handle such situations properly in future, we would need to have an arch-dependent way to analyze relocations or its sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be only? related to static binaries. For the most part, it seems that it should be handled on the linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by ELF binary. Previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration we would only add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated section-end-sym test and removed x86_64 requirement since there is no reason for this (tested on aarch64 linux) The test was provided by peterwaller-arm (thank you) in llvm#100096 and slightly modified by me.
This patch aborts BOLT execution if it finds out-of-section (section end) symbol in GOT table. In order to handle such situations properly in future, we would need to have an arch-dependent way to analyze relocations or its sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be only? related to static binaries. For the most part, it seems that it should be handled on the linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by ELF binary. Previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration we would only add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated section-end-sym test and removed x86_64 requirement since there is no reason for this (tested on aarch64 linux) The test was provided by peterwaller-arm (thank you) in llvm#100096 and slightly modified by me.
This patch aborts BOLT execution if it finds out-of-section (section end) symbol in GOT table. In order to handle such situations properly in future, we would need to have an arch-dependent way to analyze relocations or its sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be only? related to static binaries. For the most part, it seems that it should be handled on the linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by ELF binary. Previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration we would only add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated section-end-sym test and removed x86_64 requirement since there is no reason for this (tested on aarch64 linux) The test was provided by peterwaller-arm (thank you) in llvm#100096 and slightly modified by me.
This patch aborts BOLT execution if it finds out-of-section (section end) symbol in GOT table. In order to handle such situations properly in future, we would need to have an arch-dependent way to analyze relocations or its sequences, e.g., for ARM it would probably be ADRP + LDR analysis in order to get GOT entry address. Currently, it is also challenging because GOT-related relocation symbols are replaced to __BOLT_got_zero. Anyway, it seems to be quite a rare case, which seems to be only? related to static binaries. For the most part, it seems that it should be handled on the linker stage, since static binary should not have GOT table at all. LLD linker with relaxations enabled would replace instruction addresses from GOT directly to target symbols, which eliminates the problem. Anyway, in order to achieve detection of such cases, this patch fixes a few things in BOLT: 1. For the end symbols, we're now using the section provided by ELF binary. Previously it would be tied with a wrong section found by symbol address. 2. The end symbols would have limited registration we would only add them in name->data GlobalSymbols map, since using address->data BinaryDataMap map would likely be impossible due to address duality of such symbols. 3. The outdated BD->getSection (currently returning refence, not pointer) check in postProcessSymbolTable is replaced by getSize check in order to allow zero-sized top-level symbols if they are located in zero-sized sections. For the most part, such things could only be found in tests, but I don't see a reason not to handle such cases. 4. Updated section-end-sym test and removed x86_64 requirement since there is no reason for this (tested on aarch64 linux) The test was provided by peterwaller-arm (thank you) in #100096 and slightly modified by me.
When patching statically linked binaries, avoid patching GOT entries that did not belong to the original text section and had an alias. One such special case is the '_init' function that belongs to the '.init' section. It has '.init' as an alias, which points to the same address of '_init' in the original binary. This was observed with GNU linker. BOLT normally rejects these cases. See issue: #100096
The previous commit #100801 (comment) got reverted, which would abort in this case. In the meantime I've proposed a simple fix for the specific case of glibc static binaries up in #117751; and users can use The general case would remain unsolved but given that it requires a data symbol to share an address with a function, I think unlikely to hit in most standard use cases. |
_init is used during startup of dynamic binaires. Unfortunately, its address can be shared (at least on AArch64 glibc binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. We could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates llvm#100096.
_init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates llvm#100096.
_init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates llvm#100096.
_init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates llvm#100096.
_init is used during startup of binaires. Unfortunately, its address can be shared (at least on AArch64 glibc static binaries) with a data reference that lives in the GOT. The GOT rewriting is currently unable to distinguish between data addresses and function addresses. This leads to the data address being incorrectly rewritten, causing a crash on startup of the binary: Unexpected reloc type in static binary. To avoid this, don't consider _init for being moved, by skipping it. ~We could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this.~ (Edit: this broke the test bolt/test/runtime/X86/retpoline-synthetic.test, because _init was skipped from the retpoline pass and it has an indirect call in it, so I include a check for static binaries now, which avoids the test failure, but perhaps this could/should be narrowed further?) For now, skip _init for static binaries on any architecture; we could add further conditions to narrow the skipped case for known crashes, but as a straw man I thought it'd be best to keep the condition as simple as possible and see if there any objections to this. Updates #100096.
For now I think this can be considered closed by #117751, so long as there are not other examples of this found in the wild. |
Problem
patchELFGOT
function currently assumes values in the GOT are pointers-to-functions, and if the function moves, the address is updated.The problem is present on recent (at time of writing) main commit 5f05d5e.
Static binary glibc startup crash message
This manifests in glibc static binaries on aarch64-linux with the binary crashing on startup with the error message
Unexpected reloc type in static binary
. What’s happening is that the glibc startup code iterates over an array of reloc, but it runs off the end of the array because the array end pointer which lives in the GOT is no longer valid.It is currently known to manifest on aarch64-linux with glibc, but underlying issue may not be unique to that target or scenario, it may be at risk of happening elsewhere.
Test case (copy paste whole block, produces ‘./testcase’)
objdump --disassemble-all --reloc testcase
output:What breaks
Note above there are GOT entries
1ffe8
(array_start
),1fff0
(array_end
) and1fff8
(_start
), and that the address ofarray_end
shares its address with_start
(equal to00001004
).Running
llvm-bolt -o testcase.bolt testcase
, BOLT erroneously rewrites both these got entries, the new GOT contains:This is incorrect because
1fff0
has been rewritten to point to the new start, but the array data has not moved.Code which is doing
for (void *i = array_start; i != array_end; i++)
will now run off the end of the array.Why does it break
The culprit is the code in
patchELFGOT
which currently assumes that all pointers in the GOT may be interpreted as function pointers:llvm-project/bolt/lib/Rewrite/RewriteInstance.cpp
Lines 5271 to 5277 in 5f05d5e
Thinking about solutions
patchELFGOT
could query the type and use conditionally use the correct getNewFunctionAddress / getBinaryDataAtAddress in each case.array_end
and_start
is that the symbols belong to different sections (even though they share an address), and the sections are of different types. If it were possible to determine the symbol referenced in the GOT then patchELFGOT could straightforwardly avoid patching entries which reference data symbols (or patch them if it's moving them).Can the symbol referenced by a GOT entry be reconstructed?
So far as I’m currently aware we don’t have a straightforward way to determine which symbol a GOT entry points to. If only the address contained within the GOT entry is considered, this does not distinguish
_start
andarray_end
as in the reproducer provided above.Determining which relocations point to a given GOT entry could require determining register values for some programs, since we don’t get a fully-qualified pointer to the GOT from a GOT relocation for a specific symbol alone. Consider that you could have a register value holding a base pointer to a GOT page, and that register value gets reused for multiple different GOT references within the page.
The base pointer could be hoisted out of a loop and
would not itselfrelocations initializing it would not contain information connecting it to the symbol of interest, so determining the GOT entry being referenced by a relocation would require some kind of symbolic execution to determine the base register value in the worst case.cc @aaupov @maksfb for BOLT
cc @MaskRay because I think you might be interested and knowledgable, particularly if there is some information BOLT can use to differentiate GOT entries containing the same address but referencing different symbols straightforwardly, or if there are alternative approaches to consider in light of additional linker knowledge.
The text was updated successfully, but these errors were encountered: