-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[LLD][AARCH64] Bug with ADRP + LDR transformation when there is a branch to the LDR #138254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-backend-aarch64 Author: None (jeanPerier)
**Summary:**
The issue is how LLD replaces two ADRP + LDR pairs to the same symbol inconsistently while there is a branch somewhere in the program to one of the LDR that assumes both ADRP will end-up with the same offset in the program. That following assembly:
Is rewritten in the final executable by LLD into the following:
But now the path that jumps directly to L9 with the value from the first adrp will execute Note that the hardcoded addresses in this illustration will of course be different in different environments, what matters is that the two adrp now have different offsets ( 0x220000 vs 0x419000). The actual assembly were I discovered the issue is much more complex and was actually obtained when compiling LLVM source with gcc 9.3.1. More details below in the "Where I saw the bug" section below. I think LLD should not optimize ADRP + LDR if there is a branch in the program to the LDR. I am not expert enough in aarch64 assembly to asses the validity of GCC generating a branch to the second LDR. This was observed with LLD as old as LLD 17 and is still present in the LLVM head I tested (099a0fa) . Small full repoducer: Reproduce with the attached repro.tar.gz:
This will print the following output (the exact 4296785 number does not matter, this is an address, what matters is that it should be printed twice, while here 2228305 is printed):
You can see the bogus assembly from the executable for foo. For the record,
Where I saw the bug: This bug causes some LLVM release with assert builds with gcc (at least 9.3.0) + lld to produce a flang that will hit a weird compiler error on some Fortran programs when generation IR for WHERE statements. I am providing some details below for the record, but I do not think it matters here that I provide a full Fortran reproducer that will trigger this bug to cause an ICE in flang. The bug at hand is breaking the generated code for llvm::DenseMap insertion in flang: In repro.tar.gz attached above, you will find the assembly code before and after linking the excutable for the instantiation of the template function Both ADRP + LDR are generated to compute the DenseMap seed using the The bug is then triggered when this exact template instantiation is entered and the map needs to be grown during the insertion (this causes the code path to make the critical jump for this bug). In that case, the value is wrongly inserted into the wrong bucket because the hash used the bogus address. Later searches fail to retrieve the value map to that key because they are using the right hash seed. The flang bug is very hard to reproduce because one needs to get gcc to do the right inlining and block merging + CSE + register assignment to create that branch on the second LDR. The distance in the text between the template code and |
I think @smithp35 might have been looking at a similar issue recently with llvm outlining the ldr away from the adrp. |
Yes, the issue is #129122 that came from the outliner producing GOT generating ADRP and ADD relocations in two separate sections separated by a branch. The specific "optimisation" came from identical code folding, and was dependent on one of the sections being folded, but the other one not. I'll have to take a look at the reproducer, as this should have at least been partially mitigated by 099a0fa There were some follow up PRs: Will be worth giving a recent build a try as it looks like the same issue a GOT relative ADRP/ADD there is one outstanding issue, which is GOT relative relocations to local symbols (which are a lot rarer) #136641 There's a related issue #131660 on the outliner to tell it not to split ADRP (ADD/LDR) as while it is technically legal it is in general not a good thing to do for performance. |
Thanks for the feedback, note that I do not think the outliner is involved (although I am not familiar with it). The code from this bug that has a branch to an LDR was generated by GCC. It does not looks like LLD is doing any outlining on this. The issue seems to be purely at the ADRP + LDR replacement pattern level in LLD that likely do not detect the branch on the LDR when optimizing it to a slightly different ADRP + ADD. This ADRP+LDR to ADRP + ADD pattern seems to be tested in this LLD test, I am not sure which part of LLD is responsible for this relaxation). |
Yes I think this is a complete separate problem. This is confined to one section and doesn't involve ICF. I'll have a look to see if I can see what the problem is. |
OK. My apologies for taking so long to respond, I got too focussed on whether lld was applying the transformation correctly and completely missed that the code was branching into the middle of an ADRP, LDR pair which would obviously be at risk of breaking if it had assumed it could reuse the result of the ADRP from the previous calculation. I think this will most likely be made a compiler code-generation problem as LLD is meeting the conditions set out in the ABI for performing into the translation A static linker can't realistically detect if code has arbitrarily jumped (branches within a function don't have a relocation) into the middle of a linker relaxable sequence. We'll most likely have to write requirements for code-generators into the ABI to not permit this, or linker optimisation becomes impossible, which is most likely the wrong trade-off. IIUC this is a problem found in a flang binary which has been generated by gcc 9.3.0 , but flang itself. I'll have a word with our GNU team to make sure they are aware. It is possible to turn off lld optimisation of ADRP/LDR with --no-relax if you need a workaround. |
Thanks for the feedback! So is it safe to assume that clang/llvm aarch64 backend will never generate this kind of branch in the middle of an ADRP + LDR pattern?
That is right. I checked with several versions of gcc, and 9.3.0 seems to be the only one that produce this weird jump in the It is hard to tell from my limited case in flang that involves a lot of factors if GCC>9.3.0 are safe or not (that is, if it now explicitly avoids jumping in the middle of ADRP+LDR, or if it just gets lucky on my example because it makes different inlining/register allocation decisions now). So it is definitely worth double checking with some AARCH GCC expert here, thanks! |
It's unlikely, but I don't think we have any explicit protection against it. |
I had a quick check with the GCC team. More recent releases of GCC keep GOT loads of ADRP and LDR consecutive, so this kind of case will be unlikely. I didn't get a guarantee that there won't be branches into the middle of an ADRP/LDR sequence though, and we have an existence proof that it does happen in existing code. Thinking about this a bit more and looking at LLD's code. While we can't realistically ask linker's to check if a branch destination is in the middle of a optimisable ADRP/LDR, we should be able to check whether all ADRP/LDR to a given symbol in the section are optimisable (adjacent ADRP/LDR, and in range). If all sequences are optimised or none of them are optimised we'll be protected from this problem. A more conservative implementation could just abort the optimisation if any ADRP/LDR is either out of range or non-adjacent. LLD does have some code to check that all ADRP/LDR relocations are consecutive in the relocation table. To summarise:
ABI issue raised ARM-software/abi-aa#328 |
Summary:
The issue is how LLD replaces two ADRP + LDR pairs to the same symbol inconsistently while there is a branch somewhere in the program to one of the LDR that assumes both ADRP will end-up with the same offset in the program.
That following assembly:
Is rewritten in the final executable by LLD into the following:
But now the path that jumps directly to L9 with the value from the first adrp will execute
adrp x2, 0x220000
followed byadd x2, x2, #0x50
which ends computing the wrong address.Note that the hardcoded addresses in this illustration will of course be different in different environments, what matters is that the two adrp now have different offsets ( 0x220000 vs 0x419000).
The actual assembly were I discovered the issue is much more complex and was actually obtained when compiling LLVM source with gcc 9.3.1. More details below in the "Where I saw the bug" section below.
I think LLD should not optimize ADRP + LDR if there is a branch in the program to the LDR. I am not expert enough in aarch64 assembly to asses the validity of GCC generating a branch to the second LDR.
This was observed with LLD as old as LLD 17 and is still present in the LLVM head I tested (099a0fa) .
Small full repoducer:
Reproduce with the attached repro.tar.gz:
This will print the following output (the exact 4296785 number does not matter, this is an address, what matters is that it should be printed twice, while here 2228305 is printed):
You can see the bogus assembly from the executable for foo. For the record,
foo
from foo.s is manually modified assembly to get the branch on the second LDR for the following C code:Where I saw the bug:
This bug causes some LLVM release with assert builds with gcc (at least 9.3.0) + lld to produce a flang that will hit a weird compiler error on some Fortran programs when generation IR for WHERE statements. I am providing some details below for the record, but I do not think it matters here that I provide a full Fortran reproducer that will trigger this bug to cause an ICE in flang.
The bug at hand is breaking the generated code for llvm::DenseMap insertion in flang:
In repro.tar.gz attached above, you will find the assembly code before and after linking the excutable for the instantiation of the template function
void mlir::IRMapping::map<llvm::MutableArrayRef<mlir::BlockArgument>, mlir::ValueRange&, (void*)0>(llvm::MutableArrayRef<mlir::BlockArgument>&&, mlir::ValueRange&)
(a.k.a_ZN4mlir9IRMapping3mapIN4llvm15MutableArrayRefINS_13BlockArgumentEEERNS_10ValueRangeELPv0EEEvOT_OT0_
) that is declared here in MLIR source and instantiated here in flang source.Both ADRP + LDR are generated to compute the DenseMap seed using the
llvm::install_fatal_error_handler
function address here.The bug is then triggered when this exact template instantiation is entered and the map needs to be grown during the insertion (this causes the code path to make the critical jump for this bug). In that case, the value is wrongly inserted into the wrong bucket because the hash used the bogus address. Later searches fail to retrieve the value map to that key because they are using the right hash seed.
The flang bug is very hard to reproduce because one needs to get gcc to do the right inlining and block merging + CSE + register assignment to create that branch on the second LDR. The distance in the text between the template code and
llvm::install_fatal_error_handler
also needs to be big enough so that LLD keeps ADRP.The text was updated successfully, but these errors were encountered: