templates/openshift: grab bootstrap log on failure #2581
This uses the Terraform state to discover the IP address of the bootstrap node (ideally, the installer will provide this information in a form that is easier to consume in the future). It then connects to the gatewayd instance on that machine and pulls the logs for various services. Hopefully, these logs will be useful when diagnosing installation failures.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: crawford
This requires that |
@@ -373,8 +373,27 @@ objects:
           exit 1
         fi

-        /bin/openshift-install --dir=/tmp/artifacts/installer create cluster &
-        wait "$!"
+        if ! /bin/openshift-install --dir=/tmp/artifacts/installer create cluster
What about keeping this as one of the steps in teardown where we gather artifacts...?
https://github.com/openshift/release/pull/2581/files#diff-6a0349f9e2bc8f6920d3661f2b81a8e4L388
> What about keeping this as one of the steps in teardown where we gather artifacts...
And since we're hoping to have the installer gather these automatically at some point, we probably don't need to bother with "did we succeed?" checks there. We can just loop through the desired services and attempt to pull them. If the bootstrap node is already gone, those attempts will produce empty files, but that's not a big problem.
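The "loop through the desired services and attempt to pull them" idea could be sketched roughly as below. This is only an illustration, not the code from this PR: the function names `fetch_unit_log` and `gather_bootstrap_logs` are hypothetical, and the fetch step assumes the "gatewayd" in question is systemd-journal-gatewayd listening on its default port 19531 over plain HTTP.

```shell
# Hypothetical helper: fetch one unit's journal from the bootstrap node.
# Assumption: systemd-journal-gatewayd on its default port 19531 (plain HTTP).
fetch_unit_log() {
  # $1 = bootstrap IP, $2 = systemd unit name
  curl --silent --max-time 30 \
    --header 'Accept: text/plain' \
    "http://$1:19531/entries?_SYSTEMD_UNIT=$2"
}

# Hypothetical helper: try every requested unit, without checking whether
# any single fetch succeeded.
gather_bootstrap_logs() {
  # $1 = bootstrap IP, $2 = output directory, remaining args = unit names
  gbl_ip=$1
  gbl_outdir=$2
  shift 2
  mkdir -p "$gbl_outdir"
  for gbl_unit in "$@"; do
    # A failed fetch (for example, the bootstrap node is already gone)
    # just leaves an empty file behind; the loop carries on regardless.
    fetch_unit_log "$gbl_ip" "$gbl_unit" >"$gbl_outdir/$gbl_unit.log" 2>/dev/null || true
  done
}
```

A teardown step might then call something like `gather_bootstrap_logs "${bootstrap_ip}" /tmp/artifacts/bootstrap some-unit.service`, where the unit name and the output directory are placeholders rather than values taken from this PR.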
bootstrap_ip=$(jq \
  '.modules[].resources."aws_instance.bootstrap".primary.attributes."public_ip" | select(.)' \
  /tmp/artifacts/installer/terraform.tfstate)
You should be able to extract this with terraform, using something like:
bootstrap_ip=$(terraform state show -state=path/to/terraform.tfstate bootstrap.aws_instance.bootstrap | grep public_ip)
I'm guessing on the strings; I'll see if I can get a state file to figure them out...
From this run:
$ curl -s 'https://00e9e64bacd20f79e7cb46eb3ffb1bc4273aa1103ece928403-apidata.googleusercontent.com/download/storage/v1/b/origin-ci-test/o/logs%2Frelease-openshift-origin-installer-e2e-aws-4.0%2F3127%2Fartifacts%2Frelease-e2e-aws%2Finstaller%2Fterraform.tfstate?qk=AD5uMEsd-13YETPiOGZkf4uln_LbljtGL7JG3VZKk8XT0wU97zS18U8pRPrknNPI_ofY0W6eiunG9V6-y8Gag5SzVr9gNpmY8LRFcGMQnBVTnnzS43ZLObSZ6iNPFiDSwJTkCVqCiwAn4m5_5nmQHuQzeXXVYakUmsCm84A3iuZkiKTHmb7RyZ5oWKot26EPQPx-2vOq-sBktnZGKxzJG4R9xzrOXvcd8uOA3W7i6o3gznQ8hrzkG4KGqgZlDIWDN3dKceGrEc9pb6_ZTkd6Fp7wx6AIfooFwWOUQiKdnLbGBUfesox_x_-KeZ0Z_0J4knNggjB3OP1sbAyOKj6rsWI5iRVnJ2eC4dOxnwNF8CwKe3PQcgW7GTYB63iuRlkajbRmYDXRVi9NorSjqqup2LZlx-6IOtH8kVlsog09V3ErXVqXnNFLiFa6bxRxY26YQ0OlXRdn-MsTzFirOnK3DBvGmRssUvAwV2NYCnCfstujluh_8Wac_n3iFKOR2wlJlKqoSgr2ss3YTzeQ76gH6Bny3tIloMlK2TI6WOlmsaS0ASJ0ER_JzrafCzS6nxyf_S6Syl3hdcqfrWp85iEnm1IeeZprJ1yYOtB9NoxV4ccJOCqNg0ibtb_m8spWWwDtBvx-iY9zABUexPjpxztbiLu4_ISJl2EMTaV4b88QFgqwFX0hPo6WTLmvgFL551DPTbRWqChjBtcnleeX0a5uYMfJvboBwfCGBB18zRyNhDcSAhP0LgtPQzhtvJ-m5iy34JKRyww6J_ySbPT5A_6A248eC1plijHfOSJvTo08Dde0TD8-mm_N8sme30bNQ3QzNXxH6zqWg667g9AtXeQwZJf3Mmc0SVBoZm9iopqvk6XfzL0jCbfBRbG5JL2Gg9Z8NMG3pQCoM4wH8CMtTBBH0H6Lb3J6-bW3Rg' >terraform.tfstate
$ terraform state show -state terraform.tfstate aws_instance.bootstrap | sed -n 's/public_ip .*= //p'
54.146.241.34
When the cluster succeeded, it would have removed the bootstrap node and given an empty string there. And somewhat surprisingly, it will also exit 0:
$ terraform state show -state terraform.tfstate aws_instance.not_in_the_state && echo success
success
So you could use test -n "${bootstrap_ip}" to switch on "only attempt to pull these if we failed to see the bootstrap-complete event and tear down the bootstrap resources".
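As a rough sketch of that switch (the function name `bootstrap_logs_wanted` is hypothetical, and the value passed in is assumed to be whatever the state lookup printed):

```shell
# Decide whether to collect bootstrap logs, based on the extracted IP.
# $1 is the result of the state lookup; it is expected to be empty once
# the bootstrap resources have been torn down.
bootstrap_logs_wanted() {
  if test -n "$1"; then
    echo "bootstrap node still present at $1; collecting logs"
    return 0
  fi
  echo "no bootstrap IP in state; skipping log collection"
  return 1
}
```

The exit status can drive the caller directly, for example `if bootstrap_logs_wanted "${bootstrap_ip}"; then ...; fi`.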
But I guess we don't have a terraform binary anymore... Oh well.
We don't have |