-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime: canonical 48-bit address bug in lfstack #49405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Only a partial fix for FEX-Emu#1330, still needs preemption disabled to work. On x86-64 hosts the Linux kernel resides in the top bit of VA which isn't mapped in to userspace. This means that userspace will never receive pointers living with that top bit set unless you're running a 57bit VA host. This results in userspace pointers never needing to do the sign extending pointer canonicalization. But additionally some applications actually don't understand the pointer canonicalization. This results in bugs like: golang/go#49405 Now if you're running on a 57bit VA host, this will end up behaving like FEX but it seems like no one in golang land has really messed with 57bit VA yet. In AArch64, when configured with a 48bit VA, the userspace gets the full 48bit VA space and on EL mode switch has the full address range change to the kernel's 48bit VA. This means that we will /very/ likely allocate pointers in the high 48bit space since Linux currently allocates top-down. So behave more like x86-64, hide the top 128TB of memory space from the guest before boot. Testing: Took the M1Max 15ms to 21ms allocate the top 128TB.
Thanks. Note that this code is deeply system dependent. Can the problem you mention actually happen on any existing system? If not, it's not worth worrying about. |
Currently the only real concern is if anyone tries running on Intel's latest server platforms with LA57 enabled. |
This bug hit FreeBSD, where it caused all Go packages to fail. We have spent a long time without any Go packages being available. We found this issue yesterday and by working around it we have packages again. So, it's not theoretical. |
I should mention: our workaround required changing a sysctl, so it's not something we can account for in the ports tree. This will affect end-users too. |
Can you explain what your addresses look like that are triggering this assert? |
@randall77 Yeah, this occurred on amd64 builders (jailed on an amd64 host). You asked a good question, but I don't know how to answer that. How can I get you that information? |
Hi @randall77
Full build log is here: https://people.freebsd.org/~dbaio/go/go121-gohan06-logs.txt I can run some tests. Just guide me, please. Details about the CPU:
When building on another box (same FreeBSD version) that doesn't have LA57, it works just fine. |
The obvious experiment is to change this line go/src/runtime/tagptr_64bit.go Line 29 in 47ab9cb
It is reducing the tag space from 19 to 10 bits. I don't think that's a showstopper (some of the other archs have that number) but it is worrying, as those tags are there to defend against the ABA ambiguity in our lock-free linked list. A tag collision that results in not detecting an ABA situation would be really bad. |
Is there any way to mark a binary to tell the OS that we don't want LA57? |
On FreeBSD it is possible with |
Just to clarify: on Linux, I believe mmap will only return an address above |
Correct, if LA57 is enabled, then mmap(NULL, ...) can return addresses above |
Does Can we just do that? Probably just add that constant to the writes on both sides of this go/src/cmd/link/internal/ld/elf.go Line 798 in 80bff42
|
Yes, you could just set the bit there. Note that this bit is amd64-only. We have not implemented a control for RISC-V. Are you doing anything comparable to disable LA57 on other OS's such as Linux currently? |
That's fine.
We are currently assuming 56 bit (userspace) pointers for riscv. See #54104.
I don't believe so. Linux is much better about never breaking existing binaries, so any la57 support would require explicit opt-in by the binary. |
The one thing I'm not sure about is how to handle external linking. Is there a command-line flag to the linker to set this bit? |
Just to make sure I understand correctly, this is only the case for x86 (i.e., #54104 is the same issue on Linux on RISC-V)?
There is no cmdline flag. These flags are just bits in an |
Yes, this bug is only about x86. Go supports 56-bit pointers on riscv, since the fix for #54104 went in a few years ago.
I believe so. I'm not sure the exact search path. I'm sure you can override it with some environment variables. |
Or if go developers insist on requiring LA48 on amd64, something like this might work:
|
Yes, that is what I was suggesting here. |
Does go link the system C startup objects when using the system linker? The system linker doesn't have any knowledge of the note, it just passes through from the csu bits. |
But then the build already rely on the execution of the external program (ld), and can rely on the elfctl presence and usability as well. |
Change https://go.dev/cl/665475 mentions this issue: |
confirmed: before patch:
after patch:
|
Change https://go.dev/cl/665815 mentions this issue: |
I found a few extra bits in the cushions of my couch, so we may be ok here. (See CL above.) If that CL works out, then we can just bump to 57 address bits on freebsd. The only hesitation I have is that that CL is nontrivial enough that backporting is dicey. If we want to backport something here (and I suspect we do not, but I'm willing to hear arguments), the la48 bit CL might be safer. |
I am still curious why it is fine to use 56/57 bits on RISC-V and PPC AIX, while the tags are considered too short on LA57 amd64. |
Now that Go downloads and builds toolchains as needed, FreeBSD is strongly considering providing only the most recent release, rather than multiple versions. Not backporting the CL is really all the justification I'd need to drop the old versions, so I wouldn't do any difficult backporting work if it were just for FreeBSD's benefit. |
Different levels of risk tolerance? We also give second-class ports a fair amount of leeway in deciding what is best for them. That said, I think I would have pushed back on those changes had I seen them. Water under the bridge. More generally, though, I don't think anyone has an idea about how big a tag space would be "safe". I have the feeling that we probably want more bits for large (in the # of processors sense) machines. Those machines tend to be amd64/arm64. But even there I'm not relying on any concrete arguments, just intuition. |
A backport would be available earlier than the 1.25 release. (In a few weeks, say, versus a few months for 1.25.) But point taken, thanks. |
Ahh ok, well I was pretty optimistic there. It would of course be nice to get this into the ports tree sooner rather than later so we can stop universally limiting pointer width on the builders (with all of its downstream effects). So, if it's possible to backport to 1.24, it'd make a difference for us. But we also understand that sometimes dicey is dicey. |
Currently we assume alignment to 8 bytes, so we can steal the low 3 bits. This CL assumes alignment to 512 bytes, so we can steal the low 9 bits. That's 6 extra bits! Aligning to 512 bytes wastes a bit of space but it is not egregious. Most of the objects that we make tagged pointers to are pretty big. Update #49405 Change-Id: I66fc7784ac1be5f12f285de1d7851d5a6871fb75 Reviewed-on: https://go-review.googlesource.com/c/go/+/665815 Reviewed-by: Keith Randall <[email protected]> Reviewed-by: Michael Knyszek <[email protected]> Auto-Submit: Keith Randall <[email protected]> LUCI-TryBot-Result: Go LUCI <[email protected]>
What version of Go are you using (
go version
)?go version go1.16.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
If lfstack gets any pointer on amd64 platforms where the top bit is set then an assert is thrown.
In most cases this won't occur since the kernel will consume the top bit of VA, leaving the userspace with 47bits of VA.
This is a logic bug around address canonicalization.
Check is here: https://github.com/golang/go/blob/master/src/runtime/lfstack.go#L28
Pointer sign extension is here: https://github.com/golang/go/blob/master/src/runtime/lfstack_64bit.go#L52
I don't have a basic application or environment for reproducing this. Easy to see the logic bug.
What did you expect to see?
While it is correct that you must canonicalize the pointer by sign extending it on 48bit systems (I'm not even going to bring up LA57). The check inside of lfstack is incorrect since it assumes the sign extension will never occur. Thus assuming it will never receive a true 48-bit pointer.
This assumption is broken in environments where you get 48-bit pointers with the high bit set.
This is purely a simple logic bug, it shouldn't assert in this case.
What did you see instead?
An assert on 48-bit address with high bit set.
The text was updated successfully, but these errors were encountered: