🐞 Fix: deploying opsman to vSphere 15% boot fail #643
Merged
When opsman is deployed to vSphere, it fails to boot 15% of the time. The failure happens very early in the boot process, apparently even before the kernel is loaded. On the opsman VM's console, the symptom is a flashing cursor in the upper-left corner of the screen.
This commit fixes that failure by waiting up to 80 seconds for the opsman VM to report its IP address to vCenter; if the VM hasn't reported an IP address by then, the commit sends it a hardware reset. An opsman VM typically reports its IP address to vCenter 43 seconds after being powered on.
We verified this fix by successfully deploying & booting opsman 146 times in a row.
More about the boot failure:
This fix should have negligible impact on the length of time to deploy opsman.
Typical output when resetting a failed initial boot:
The added tests are admittedly lackluster, but I couldn't find a way to implement them without making `vsphere.go` overly complicated.