choco install fails with Exit code was '-2147023829' on a Windows 11 host with process isolation #216
That's an access violation (0xc0000005), which is a super-weird thing to happen, and certainly shouldn't be caused by any of the things done to create the container... If you run the container image as it stood before that step, does the same failure reproduce? If so, then something done earlier in the image build has corrupted the system weirdly, but since all that has been done is copy some DLLs, enable long paths, and install Chocolatey, I can't see what it would be. If not, then something in how ue4-docker is starting the container must be different from when it's run by hand. Was your stand-alone test also using process isolation, by the way? It's not the default on Windows 11 hosts.
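For reference, the intermediate-image check suggested above could look something like this (a sketch only; `<image-id>` stands for the layer ID that `docker build` printed for the step before the failing one, and the `choco` parameters are the ones from the original report):

```powershell
# Run the partially-built image interactively with process isolation,
# then retry the failing step by hand at the container's prompt.
docker run --rm -it --isolation=process <image-id> powershell
choco install -y git --params "'/GitOnlyOnPath /NoAutoCrlf /WindowsTerminal /NoShellIntegration /NoCredentialManager'"
```
|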
Hi, thanks for looking into this. You are right, I must not have specified isolation during my initial test. I have now repeated some testing, and it appears that the issue occurs on both Server Core and prerequisites containers when running in process isolation, but not with the default isolation behaviour.
Test 1: Existing prerequisites container
I create a container from the image as it stood before the last step using the following command:
Test 3: Server Core with default isolation
Creating the same image, but without specifying process isolation:
Test 4: reinstalling Git on the process-isolation Windows Server Core container
Connecting back to the process-isolation Server Core container, I then started to investigate how the installer fails. I ran the following command to pull a fresh copy of the installer:
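(The exact download command isn't preserved in this extract; a plausible equivalent, assuming the usual git-for-windows release URL for the Git-2.34.1-64-bit.exe installer named in the original error, would be:)

```powershell
# Hypothetical reconstruction of the download step; adjust the version as needed.
Invoke-WebRequest -UseBasicParsing `
    -Uri "https://github.com/git-for-windows/git/releases/download/v2.34.1.windows.1/Git-2.34.1-64-bit.exe" `
    -OutFile "C:\Git-2.34.1-64-bit.exe"
```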
The log appears to show a successful install. I will repeat this process on a fresh image and see if I can get it to fall over. |
Ok, manually downloading and installing Git on a fresh Server Core container with process isolation appears to work fine. The install completes without any errors. Looking at the containers where installs fail, they still end up with working copies of Git installed, so something must be causing the installer to fail late in the install process. As this happens on both Server Core and prerequisites containers when installing with Chocolatey, but succeeds when installed manually on Server Core, I would assume the cause of the issue is likely to do with the Chocolatey installer package.
Test 5: install Git on a new Server Core image with process isolation
Log file: |
I'm suspicious of that "success" even in the Hyper-V isolation mode, due to
which comes from the second-last line of this snippet from the install script, where the first line in the snippet is the failing line in process isolation:

```powershell
Install-ChocolateyInstallPackage @packageArgs

Get-ChildItem $toolsPath\$fileName32, $toolsPath\$fileName64 | ForEach-Object { Remove-Item $_ -ea 0; if (Test-Path $_) { Set-Content "$_.ignore" '' } }

$packageName = $packageArgs.packageName
$installLocation = Get-AppInstallLocation $packageArgs.SoftwareName
if (!$installLocation) { Write-Warning "Can't find $packageName install location"; return }
Write-Host "$packageName installed to '$installLocation'"
```

I also noted on the Chocolatey package discussion that a few people have had issues over the years where the installer fails if it is doing a silent install and wants to close some running programs. However, I super-doubt that's the case here (different error code, and what would be running??)
Anyway, since this repros on Windows Server Core without ue4-docker, I'd suggest raising it with Chocolatey (not sure what their support channels are), and/or perhaps the Windows Containers issue tracker, since this seems to be affected by the isolation mode, suggesting it's a container issue at its core.
To split the difference in the possible problems, try repeating Test 5, but run
before downloading and running the Git installer manually in the same container instance. That will help identify if it's something about Chocolatey itself, or something about the git package specifically. Another possible test you might want to try in the meantime is using a
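For the "repeat Test 5 with Chocolatey present first" part of that suggestion, the step being referred to is presumably the standard Chocolatey bootstrap one-liner (this is Chocolatey's documented install command, not anything ue4-docker-specific):

```powershell
# Install Chocolatey itself inside the fresh container before running the Git
# installer by hand, to see whether Chocolatey's mere presence changes the result.
Set-ExecutionPolicy Bypass -Scope Process -Force
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072
iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
```
|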
Thanks for the help. I have opened a ticket with Chocolatey relating to this issue here: On that issue I have built a Dockerfile that installs Chocolatey, then installs git via choco as a separate RUN line. This errors out with process isolation, and installs otherwise. Weirdly, the Git installer seems to produce some error text when run in a RUN statement from a Dockerfile: it complains about the extra commas in /COMPONENTS, and returns 1. I see none of this when running the commands interactively. The error text is displayed in either isolation mode. |
Hmm... I wonder which part (Windows, Chocolatey, Git, Docker?) has changed. I successfully built ltsc2022 back in September when I was working on #199. One major difference I see is that I was using Windows Server 2022 as a host instead of Windows 11, but this might be irrelevant. Maybe you could try older Git versions? |
I can probably rule out Git: I have tried every version since 2.31.1 and get the same result on each. This was done with the latest choco and git releases. Interestingly, the same git.install install-location warning is showing, but Git does get installed correctly.
docker build output
|
I guess that leaves host OS or Docker version changes as potential culprits for the behaviour (as the Server 2022 tests were using Microsoft's Docker distribution). |
So, we're trying to blame Windows 11 process isolation as the root cause of this issue? |
You can use Stevedore so you have a 100% identical Docker engine on both client and server Windows. |
Ok, I have repeated the test on Server 2022 using Stevedore v0.0.2: the test passes, and Git installs correctly. I'm having some problems getting Stevedore to install on Win11; it's erroring out creating a user, both through Chocolatey and directly with the MSI. I just logged a bug report here: slonopotamus/stevedore#12. |
@TomSmithGR As a quick note, I would find it easier to read if you put the output text inside code-blocks (```-delimited blocks). |
@TBBle: thanks for the tip; sorry, I'm a bit new to markdown, I will correct the posts. Ok, I managed to get Stevedore working and repeated the tests. Git still fails on Win11 with process isolation, even with the same Docker version. It must be something different between Windows 11 and Server 2022 hosts when using process isolation.
Results of docker build on Win11 using Stevedore v0.0.2 and process isolation. |
Oh dear, I hope this isn't another ABI breakage issue like we saw back in February 2020, this time caused by some update to Windows 11. Microsoft made a point of promising full ABI compatibility between existing Windows Server 2022 container images and all future versions of Windows Server 2022 and Windows 11, so that process isolation mode would keep working. |
I spun up a Win11 VM in Azure to verify this isn't machine-specific, and Git installs fine in process isolation there. Both machines have identical Windows builds, both use the same Stevedore version, and both build the same Dockerfile; one succeeds, one fails. We are now really in the weeds: either it's something specific to this box, like a hardware failure or corruption of a file somewhere, or it's something really weird like a CPU-specific issue (I'm on AMD, the VM is a Xeon). |
I just updated to Win11 (22000) (after not being able to use process isolation on 19044) and am getting the exact same error when trying to force ltsc2022 in process isolation:
I tried a Hyper-V build for ue4-build-prerequisites, and a rerun with process isolation once that was cached locally, but that did not work:
|
So we have a success on a Win11 VM on an Intel Xeon host, and two failures on Win11 bare-metal AMD-based hosts. I was suspicious that it might be CPU-related earlier, but apparently didn't say so here. So if someone is able to take the time to reproduce the failure with an AMD-based cloud VM to contrast with @TomSmithGR's success on an Intel-based cloud VM, then we probably have a very clear repro we can kick over to Microsoft to chase up with their own channel partners. If an AMD-based cloud host running Win11 repros the issue, I'd be interested if it also repros with the AMD host running LTSC2022, as then:
|
Azure VM D8as v4 (AMD EPYC 7763)
Annoyingly enough, that Win11 VM is the exact same version as my local machine (which fails):
It seems I don't have any local Intel machines compatible with Windows 11. |
Just so I'm clear, the Dockerfile.txt attached to #216 (comment) is the repro-case being tested now? So we get the failures independent of ue4-docker? |
I have something to add. Although Git failing stops the process, it's not the only problem. I created exported Dockerfiles from ue4-docker, then modified them to continue past the Git error above (see the sketch below). Several other dependencies also errored out during install. The log below is of a trial run. For clarity, this is running on a Windows 11 host, Docker version 20.10.11, build dea9396, using my local Threadripper machine.
Trial run with exported Dockerfile - process isolation
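The "continue past the error" modification isn't shown in the log; conceptually it is just an exit-code override on the failing RUN line. A hypothetical sketch, not the exact line from the exported Dockerfile:

```dockerfile
# "|| exit 0" (cmd syntax, Docker's default Windows shell) swallows the
# installer's non-zero exit code so the remaining build steps still run.
RUN choco install -y git --params="'/GitOnlyOnPath /NoAutoCrlf /WindowsTerminal /NoShellIntegration /NoCredentialManager'" || exit 0
```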
|
I'm currently testing this on a VirtualBox VM running on top of Ryzen 3700X, stay tuned. |
Okay, I've reproed this issue on VirtualBox 6.1.28 running on top of Ryzen 3700X.
We can now conclude that this issue is 100% independent of ue4-docker. ATTENTION: I am NOT reproducing the issue if I use
So, did anyone report this to Microsoft? |
I didn't, I only put a ticket in with chocolatey, and that didn't go anywhere. |
Okay, I reported it as microsoft/Windows-Containers#197 |
Looking back at this ticket, there are actually two error codes going on. The original report from the Git installer was the access violation (-1073741819, i.e. 0xC0000005). This one doesn't seem to repro on VMs:
but repro'd for two users on bare-metal:
The vcredist installers are failing with
That suggests we're seeing two different issues here, or the same issue manifesting in multiple ways, and the VM-ness of the host affects only one of those ways.

The fact that this affects the process-isolation container but not the out-of-container environment points me at a kernel/userspace ABI mismatch between LTSC2022 and Windows 11 that is only triggered by AMD CPUs. This suggests that an LTSC2022 host would not show this problem, as the userspace would be the same between the host and the container. Looking back, I don't see a complete result for an ltsc2022 host, either in a VM or bare-metal. Sadly, the one test I can see on ltsc2022 in a VM only noted that the access violation failed to repro, which is consistent with all the other (later) VM-based tests, but didn't record whether the aborted-process failure occurred in that setup.

To wildly speculate a little, I wonder if there's actually a bug in the LTSC2022 userspace code calling some kernel API, and it generates an out-of-bounds memory access that was fine on the LTSC2022 kernel, but that the Windows 11 kernel catches, killing the process in response. Given it's a hardware difference, it would have to be caught by a hardware-implemented protection like the NX bit, or a microcode Spectre mitigation or similar. That might also explain why one of the issues is VM-sensitive, if the hypervisor has prevented the CPU-level catch in one case but not the other.
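As a side note for anyone following along, the signed decimal exit codes in these logs map directly to the more familiar NTSTATUS/HRESULT hex values, e.g.:

```powershell
# Format the signed 32-bit exit codes from the logs as hex.
'0x{0:X8}' -f -1073741819   # 0xC0000005 - access violation (the Git installer failure)
'0x{0:X8}' -f -2147023829   # 0x8007042B - HRESULT for ERROR_PROCESS_ABORTED (the code in the issue title)
```
|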
So, Microsoft has somewhat confirmed that Win11 process isolation is broken. There's nothing we can do here, except maybe using Hyper-V isolation by default when we see a Win11 host. |
Oh, wait, we already use Hyper-V isolation on Win11 by default. |
If and when a fix is issued, we can reject versions we know are affected. I'm assuming it'll turn out to be a host-system issue; if it turns out to be a container-image issue, that'll be harder to detect without more image introspection than we currently do. |
Aaand... Let's reopen. I'm observing this on Windows 10 LTSC 2019, but only on an AMD CPU. This means two things:
|
The error code observed in #230 is 0xC0000409. I'm not sure how to extract such a 'more interesting' error code though. Also, for "exactly the same VM", have security updates been applied since it last worked? |
Yes. And docker images (windowsservercore and windows) are also newer. |
Okay, so it's possible Windows patches have introduced the problem in either the container host or container image. >_<

Since we're talking process isolation, that means that if this is the same issue in both Windows 11 and Server 2019, a change was made in either both Server 2019 and Windows 11, or both the Server 2019 and Server 2022 container images. In my opinion, the latter seems more likely. And it should be easier to test, since old base images aren't removed from MCR and can be named by full-version tag. So if you happen to have a record of a base tag that worked previously, you could try that precise same container image against the newer host in your VM.

If it fails, then the remaining validation of a host-side change requires rolling back the host version (or building a new VM, I guess) and only updating to a known-to-work-previously update.
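For reference, pulling a pinned base image by its full-version tag looks like this (the build number shown is a placeholder for illustration, not a known-good or known-bad version):

```powershell
# Pull a specific Server Core build from MCR rather than the rolling ltsc2022 tag,
# then run it with process isolation against the current host.
docker pull mcr.microsoft.com/windows/servercore:10.0.20348.350
docker run --rm --isolation=process mcr.microsoft.com/windows/servercore:10.0.20348.350 cmd /c ver
```
|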
Copying from #230:
Current hypothesis: this issue is caused by
These tests are invalid, see next comment. |
Although this symptom is different, I'd suggest testing with both host and container on the same side of the February 11, 2020 security update, which introduced some kind of ABI breakage between host and container. I know the ABI breakage definitely affected ue4-docker, because some of the stuff Chocolatey installs uses 32-bit installers, and I happened to get caught by this issue trying to build newer container images on an older host. But that was on an Intel machine, so perhaps the AMD machines will report that same breakage with the error we're seeing here. So for Server 2019, that's both host and container either ≤973, or host ≥1039 and container ≥1040.
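The revision number being compared there (the 973/1039/1040 part) is the UBR of the Windows build; a quick way to read it is the standard CurrentVersion registry key, and running the same command inside a container reports the image's revision:

```powershell
# Read the Update Build Revision (the fourth part of e.g. 10.0.17763.1039).
(Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion').UBR
```
|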
Oh maaaan, I forgot about that thing. |
You might be able to use https://www.powershellgallery.com/packages/PSWindowsUpdate/2.2.0.2 to update to a specific KB from the update list. I've not tried this, though. You could also try just grabbing the cumulative update package from the Microsoft Update Catalog (the link's under "How to get this update") and work out how to install that .msu file. I think you can right-click and hit install like a .msi file, but I might be wrong about that.
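A rough sketch of both approaches, with KB0000000 standing in as a placeholder for whichever update is needed:

```powershell
# Option 1: install a specific KB via the PSWindowsUpdate module.
Install-Module PSWindowsUpdate -Force
Get-WindowsUpdate -KBArticleID 'KB0000000' -Install -AcceptAll

# Option 2: install a downloaded cumulative update .msu directly.
wusa.exe C:\updates\windows-cumulative.msu /quiet /norestart
```
|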
New plan: I'm fully updating my 2019 host system and re-testing everything :D That Feb 2020 thing could have invalidated all of the tests I did on an AMD CPU in #230. |
10.0.17763.2686 Windows 10 LTSC 2019 host (fully updated as of today) + 10.0.17763.2686 Docker image (this is the current ltsc2019 image).
@alexgeek, WRT your 20H2 issue in #230: 20H2 is going EOL on 2022-05-10, in just a month and a half. I don't think it's worth trying to fix anything on it. Just let it go and switch to a more long-term option.
Currently supported and functional systems with process isolation:
Technically, Windows Server 2016 and Windows 10 LTSB 2016 would also work with the ltsc2016 image, but ue4-docker dropped support for them in #187 (though you might have success with ue4-docker versions older than
So is my only option for Windows 10 Pro to use LTSC 2019 without process isolation? So either I move to Win 11 or Windows Server? |
Yup. I followed up on #232 about this part specifically. |
I'm going to reclose this. Edit: Ooops, #230 wasn't an LTSC2019 issue, it was a Windows 10 20H2 issue. Per #216 (comment), still nothing we can really do about it at this date. |
😩 I upgraded the AMD box we have to Win 11 (didn't see the Intel note), obviously didn't work.
Which is actually from the second run, the first run failed with:
This is with this command on master (5e9fb63):
|
I faced this problem earlier today. The issue disappeared after I performed a Windows update via
@chaychoong You hit this issue: https://support.microsoft.com/en-us/topic/you-might-encounter-issues-when-using-windows-server-containers-with-the-february-11-2020-security-update-release-b9a8fcae-950d-7a0b-ac7c-cb6b294cb809 (and yes, installing system updates on Windows Server 2019 is a proper way to fix it) |
So, am I right that the only viable option for Windows 10 is LTSC 2019 (aka build 1809) + the ltsc2019 image? Windows 10 Versions |
Currently working (and supported by MS) process isolation combos are:
|
Output of the `ue4-docker info` command:

```
PS D:\ue4docker> ue4-docker info
ue4-docker version: 0.0.94 (latest available version is 0.0.94)
Operating system: Windows 10 Pro (Build 22000.376)
Docker daemon version: 20.10.11
NVIDIA Docker supported: No
Maximum image size: 400GB
Available disk space: 6.7 TiB
Total system memory: 255.83 GiB physical, 293.83 GiB virtual
Number of processors: 64 physical, 128 logical
```
Additional details:
I am running Windows 11 as the host OS.
I am building using the following command:

```
PS D:\ue4docker> ue4-docker build -username redacted -password redacted --exclude ddc -basetag ltsc2022 -isolation=process --visual-studio=2019 4.27.2
```
The build fails while creating the prerequisites image, when installing Git from Chocolatey.
```
ERROR: Running ["C:\ProgramData\chocolatey\lib\git.install\tools\Git-2.34.1-64-bit.exe" /VERYSILENT /SUPPRESSMSGBOXES /NORESTART /NOCANCEL /SP- /LOG /COMPONENTS="icons,assoc,assoc_sh,,,,gitlfs,icons\quicklaunch" /o:PathOption=Cmd /o:BashTerminalOption=ConHost /o:CRLFOption=CRLFCommitAsIs ] was not successful. Exit code was '-1073741819'. See log for possible error messages.
```
I tried creating an ltsc2022 Windows Server container and repeated the following command:

```
choco install -y git --params "'/GitOnlyOnPath /NoAutoCrlf /WindowsTerminal /NoShellIntegration /NoCredentialManager'"
```

Git installed successfully during that attempt, so this does not appear to be a Chocolatey issue.
I have repeated the ue4-docker build several times, and get the same result.
ue4-docker build log.txt
chocolatey.log