-
Notifications
You must be signed in to change notification settings - Fork 5k
Dotnet build hangs while building libs.tests natively using cross built dotnet sdk #101121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
To add some additional context, just before the hang we see this error:
|
The issue isn't specific to ppc64le / s390 / cross-compilation. It reproduces also with building runtime tests with an x64 mono runtime on x64. cc @lambdageek |
We've done a bit more investigation on where this error is hit. We're running
which ends up calling the This creates a new "BuildHost" process running
and uses the This ends up in the Mono JIT attempting to load and initialize that type, which fails with:
causing
The Mono JIT previously had attempted to search for
Now, there's still a number of questions: First of all, why is Second, if Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case). Any suggestion on how to proceed here would be appreciated! |
@YuliiaKovalova -- would you be able to take a look at this issue and advise? |
cc: @rainersigwald |
MSBuild assemblies should basically always come from the .NET SDK, not be repackaged by tools--MSBuild logic tends to make a bunch of assumptions about where .props/targets/etc files are relative to the loaded MSBuild DLLs, and it's also best to load the exact SDK that would be involved in
So yes. That's what MSBuildLocator does: it adds some
The latter behavior sounds very interesting. On .NET Framework and coreclr, the load is triggered by JIT compilation of the first method containing a type from If Mono is more aggressive about trying to load the type (say when the class containing such a method is first seen, rather than the method itself), that might explain the problem |
Thanks for the details! This does indeed look very similar to problems we've had in the past. Here's a PR from a while back describing one of these problems, where some assembly was also supposed to the found via an event hook, but when using Mono the JIT already was trying (and failing) to load a type defined in that assembly before that hook was even installed: #60550 (comment) In this case, the JIT also aborted when building a routine that merely refers to the class that contains an element of that type. |
Do you think there is a way to rewrite the msbuild code so it avoids accessing even the containing class type before the handler is set up? Or can you suggest any other way to resolve this issue? |
No, I don't think there's a way to rewrite MSBuild or MSBuildLocator here, unfortunately. What we need is:
It might be possible to change |
Hmm, looks like there is another twist - we're dealing with two distinct dotnet processes here. The main
where the first line calls
and the second line calls
and that last line starts a new process (as described in my comment above). So we're calling the |
ah, that is very interesting-- @jasonmalinowski does this ring any bells for you? |
With dotnet/roslyn#70469, MSBuildWorkspace does all its MSBuild operations in the separate process; it'll no longer load (or even use) MSBuild in-process. So based upon @uweigand's comment, unless something else is using MSBuild in the main process, the RegisterDefaults() line could just be removed (and potentially packages removed too.) The other process itself uses MSBuildLocator to locate MSBuild and then use it. As far as ringing a bell, I had a similar gotcha at one point regarding mono with a scary-sounding comment here: Fundamentally Mono's behavior for when the JIT needs a type to be present is different than our other runtimes, and we've been bitten before specifically around in this code since we need to ensure its registered early enough before we use types, but for Mono that's even a bit earlier than otherwise. So if there's a change that needs to be made to Roslyn I wouldn't be surprised (although I'm surprised we haven't heard another complaint about this.) There is also likely a second bug here too, somewhere: even if that process is crashing catastrophically, MSBuildWorkspace should still be throwing exceptions to let the caller know something went bad. If that's not happening, that's also bad. If that is happening but the caller isn't itself ready for something like that, then the caller should also be fixed. |
Ah, right! That does make the problem look indeed like just another instance of #60550 (comment) ... looking at
we see that the routine calls With Mono, that type seems to be referenced already when the Maybe one workaround might be to move the switch statement setting up the appropriate loader per language into some subroutine? Or else move the |
Yep, that's what we need to do. That same pattern is already done other places in the code:
The caller in this case is actually on the other side of the RPC channel, so that's not so easy. @uweigand This change isn't something I'll be able to get to as I'm currently looking at a few other things -- is this a change you're comfortable making (and testing) or do you need somebody else on our side to do it? |
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make sure the Microsoft.Build.dll is accessible, but requires types from that assembly within the routine itself. This may not always work with Mono, as the JIT may look up the types during compilation. Fixed by splitting out a LoadProjectFileAsyncCore routine, similarly to what is already done for GetProjectsInSolution. Fixes dotnet/runtime#101121.
Ah, right! This is exactly what we need to do here as well.
I've now opened dotnet/roslyn#74392, which fixes the problem for me. However, in addition to getting this upstream I guess we'll need to make sure this actually gets used by the runtime tests. As far as I can see, this would mean creating a new Microsoft.CodeAnalysis.Workspaces.MSBuild package version (maybe backporting the fix to the 4.11 branch?), pulling that new package into microsoft.dotnet.hotreload.utils.generator.buildtool (which seems to be an internal package?), creating a new version of that package and pulling that into the runtime. Am I missing anything here? |
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make sure the Microsoft.Build.dll is accessible, but requires types from that assembly within the routine itself. This may not always work with Mono, as the JIT may look up the types during compilation. Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly to what is already done for GetProjectsInSolution. Fixes dotnet/runtime#101121.
I am not. @lambdageek might be. |
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make sure the Microsoft.Build.dll is accessible, but requires types from that assembly within the routine itself. This may not always work with Mono, as the JIT may look up the types during compilation. Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly to what is already done for GetProjectsInSolution. Fixes dotnet/runtime#101121.
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make sure the Microsoft.Build.dll is accessible, but requires types from that assembly within the routine itself. This may not always work with Mono, as the JIT may look up the types during compilation. Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly to what is already done for GetProjectsInSolution. Fixes dotnet/runtime#101121.
The LoadProjectFileAsync routine calls EnsureMSBuildLoaded to make sure the Microsoft.Build.dll is accessible, but requires types from that assembly within the routine itself. This may not always work with Mono, as the JIT may look up the types during compilation. Fixed by splitting out a LoadProjectFileCoreAsync routine, similarly to what is already done for GetProjectsInSolution. Fixes dotnet/runtime#101121.
While the roslyn patch has now been merged (thanks!), this issue should not be closed quite yet. The new assembly still needs to be picked up by microsoft.dotnet.hotreload.utils.generator.buildtool and then the runtime, see above. Can someone with the necessary authority please re-open this issue? |
@tmat Do you know if microsoft.dotnet.hotreload.utils.generator.buildtool will pick up the changes automatically or does something further need to be done? @uweigand Practically if there's no further work to do in this repo, it'd be best to keep this bug closed and have another bug tracking picking up the change if needed. But do we have explicit work to do somewhere or will this flow with the usual darc flows? |
I have no idea what this tool is/does. |
So here's what I can see:
So it looks like the main problem is that the fix was merged to roslyn mainline, but not the 17.11 branch (and it doesn't look like there's any automatic merging here). This means in order to fix this issue for the runtime tests, we'd have to either backport the roslyn fix to 17.11, or else update hotreload-utils to pull in a more recent roslyn version (not sure what other effects this might have). |
@akoeplinger or @lambdageek can you help us with the flow question here? We've made a fix in Roslyn but not sure what branching is all happening here. |
This comment was marked as outdated.
This comment was marked as outdated.
Actually @jasonmalinowski does the roslyn->hotreload-utils subscription look right? do we need hotreload-utils |
@lambdageek I have no idea how any of this works. @dotnet/roslyn-infrastructure or @jaredpar maybe you can help? |
Roslyn |
We have just updated the hotreload-utils subscriptions:
and
|
Looks like we already got rid of it earlier dotnet/hotreload-utils#379 |
It's used for running hot reload unit tests in the dotnet/runtime repo against mono and coreclr via the ApplyUpdate API. The testsuite in dotnet/runtime is primarily a smoketest to notice regression faster (more extensive testing goes through the debugger). |
Thanks! I now see that hotreload-utils (both |
The dotnet-maestro bot will make the PRs. There should be no more manual interventions (other than waiting for green CI):
|
Description
Dotnet build hangs while building libs.tests natively using cross built dotnet sdk.
Brief Analysis and probable root cause:
Complete mono trace log for Microsoft.DotNet.HotReload.Utils.Generator namespace:
Reproduction Steps
Expected behavior
The dotnet native build should not hang
Actual behavior
The dotnet native build hangs.
Regression?
No response
Known Workarounds
No response
Configuration
No response
Other information
No response
The text was updated successfully, but these errors were encountered: