templates/openshift: grab bootstrap log on failure #2581

Closed
crawford
Contributor

This uses the Terraform state to discover the IP address of the
bootstrap node (ideally, the installer will provide this information in
a form which is easier to consume in the future). It then connects to the
gatewayd instance on that machine and pulls the logs for various
services. Hopefully, these logs will be useful when diagnosing
installation failures.
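
As a rough illustration of the flow described above, the sketch below builds the URL for one unit's journal and fetches it with curl. It assumes that "gatewayd" means systemd-journal-gatewayd listening on its default port 19531 and serving `/entries`. The function names, the output path, and the unit name `bootkube.service` are hypothetical and are not taken from this PR.

```shell
#!/bin/sh
# Assumption: the bootstrap node runs systemd-journal-gatewayd on its
# default port (19531). All names below are illustrative only.
GATEWAYD_PORT=19531

# Print the URL that returns journal entries for a single systemd unit.
gatewayd_url() {
    ip=$1
    unit=$2
    printf 'http://%s:%s/entries?_SYSTEMD_UNIT=%s\n' "$ip" "$GATEWAYD_PORT" "$unit"
}

# Fetch one unit's log as plain text into the given file.
fetch_unit_log() {
    ip=$1
    unit=$2
    out=$3
    curl --silent --show-error --max-time 60 \
        --header 'Accept: text/plain' \
        --output "$out" \
        "$(gatewayd_url "$ip" "$unit")"
}
```

A call might look like `fetch_unit_log "$bootstrap_ip" bootkube.service /tmp/artifacts/bootkube.log`, where the log file name is again only a placeholder.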

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford


@openshift-ci-robot added the approved and size/M labels on Jan 14, 2019
@crawford
Contributor Author

This requires that jq is available.
/hold

cc @abhinavdahiya @wking

@openshift-ci-robot added the do-not-merge/hold label on Jan 14, 2019
@@ -373,8 +373,27 @@ objects:
exit 1
fi

-/bin/openshift-install --dir=/tmp/artifacts/installer create cluster &
-wait "$!"
+if ! /bin/openshift-install --dir=/tmp/artifacts/installer create cluster
Contributor

What about keeping this as one of the steps in teardown where we gather artifacts... ?
https://github.com/openshift/release/pull/2581/files#diff-6a0349f9e2bc8f6920d3661f2b81a8e4L388

Member

@wking wking Jan 14, 2019

> What about keeping this as one of the steps in teardown where we gather artifacts...

And since we're hoping to have the installer gather these automatically at some point, we probably don't need to bother with "did we succeed?" checks there. We can just loop through the desired services and attempt to pull them. If the bootstrap node is already gone, those attempts will produce empty files, but that's not a big problem.
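
The "just loop and attempt" idea could be sketched roughly as follows. This is only an illustration: `fetch_unit_log`, `gather_bootstrap_logs`, and the `.log` file naming are hypothetical, and the fetcher assumes systemd-journal-gatewayd on its default port 19531.

```shell
#!/bin/sh
# Stand-in fetcher that writes one unit's journal to stdout.
# Assumption: gatewayd is systemd-journal-gatewayd on port 19531.
fetch_unit_log() {
    curl --silent --max-time 60 --header 'Accept: text/plain' \
        "http://$1:19531/entries?_SYSTEMD_UNIT=$2"
}

# Try every unit in turn and never fail the surrounding teardown step.
gather_bootstrap_logs() {
    ip=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir"
    for unit in "$@"; do
        # If the bootstrap node is gone, this just leaves an empty file.
        fetch_unit_log "$ip" "$unit" >"$outdir/$unit.log" 2>/dev/null || true
    done
}
```

Because every attempt ends in `|| true`, the loop needs no "did the install succeed?" check; an unreachable node costs at most one timeout per unit.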


bootstrap_ip=$(jq \
'.modules[].resources."aws_instance.bootstrap".primary.attributes."public_ip" | select(.)' \
/tmp/artifacts/installer/terraform.tfstate)
Member

You should be able to extract this with terraform with something like:

bootstrap_ip=$(terraform state show -state=path/to/terraform.tfstate bootstrap.aws_instance.bootstrap | grep public_ip)

I'm guessing on the strings, I'll see if I can get a state file to figure them out...

Member

@wking wking Jan 14, 2019

From this run:

$ curl -s 'https://00e9e64bacd20f79e7cb46eb3ffb1bc4273aa1103ece928403-apidata.googleusercontent.com/download/storage/v1/b/origin-ci-test/o/logs%2Frelease-openshift-origin-installer-e2e-aws-4.0%2F3127%2Fartifacts%2Frelease-e2e-aws%2Finstaller%2Fterraform.tfstate?qk=AD5uMEsd-13YETPiOGZkf4uln_LbljtGL7JG3VZKk8XT0wU97zS18U8pRPrknNPI_ofY0W6eiunG9V6-y8Gag5SzVr9gNpmY8LRFcGMQnBVTnnzS43ZLObSZ6iNPFiDSwJTkCVqCiwAn4m5_5nmQHuQzeXXVYakUmsCm84A3iuZkiKTHmb7RyZ5oWKot26EPQPx-2vOq-sBktnZGKxzJG4R9xzrOXvcd8uOA3W7i6o3gznQ8hrzkG4KGqgZlDIWDN3dKceGrEc9pb6_ZTkd6Fp7wx6AIfooFwWOUQiKdnLbGBUfesox_x_-KeZ0Z_0J4knNggjB3OP1sbAyOKj6rsWI5iRVnJ2eC4dOxnwNF8CwKe3PQcgW7GTYB63iuRlkajbRmYDXRVi9NorSjqqup2LZlx-6IOtH8kVlsog09V3ErXVqXnNFLiFa6bxRxY26YQ0OlXRdn-MsTzFirOnK3DBvGmRssUvAwV2NYCnCfstujluh_8Wac_n3iFKOR2wlJlKqoSgr2ss3YTzeQ76gH6Bny3tIloMlK2TI6WOlmsaS0ASJ0ER_JzrafCzS6nxyf_S6Syl3hdcqfrWp85iEnm1IeeZprJ1yYOtB9NoxV4ccJOCqNg0ibtb_m8spWWwDtBvx-iY9zABUexPjpxztbiLu4_ISJl2EMTaV4b88QFgqwFX0hPo6WTLmvgFL551DPTbRWqChjBtcnleeX0a5uYMfJvboBwfCGBB18zRyNhDcSAhP0LgtPQzhtvJ-m5iy34JKRyww6J_ySbPT5A_6A248eC1plijHfOSJvTo08Dde0TD8-mm_N8sme30bNQ3QzNXxH6zqWg667g9AtXeQwZJf3Mmc0SVBoZm9iopqvk6XfzL0jCbfBRbG5JL2Gg9Z8NMG3pQCoM4wH8CMtTBBH0H6Lb3J6-bW3Rg' >terraform.tfstate
$ terraform state show -state terraform.tfstate aws_instance.bootstrap | sed -n 's/public_ip .*= //p'
54.146.241.34

Member

When the cluster succeeded, it would have removed the bootstrap node and given an empty string there. And somewhat surprisingly, it will also exit 0:

$ terraform state show -state terraform.tfstate aws_instance.not_in_the_state && echo success
success

So you could use test -n "${bootstrap_ip}" to switch on "only attempt to pull these if we failed to see the bootstrap-complete event and tear down the bootstrap resources".
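
A minimal sketch of that guard, reusing the sed expression from the earlier comment. The sample text is a hypothetical stand-in for `terraform state show` output (the exact layout is assumed), and the function names are illustrative.

```shell
#!/bin/sh
# Read `terraform state show` output on stdin and print the public_ip
# value, or nothing when the attribute is absent.
extract_public_ip() {
    sed -n 's/public_ip .*= //p'
}

# Succeed only when an IP was found, i.e. the bootstrap node still exists.
bootstrap_still_present() {
    test -n "$1"
}

# Hypothetical sample shaped like the output for a failed install.
sample='id                          = i-0123456789abcdef0
associate_public_ip_address = true
public_ip                   = 54.146.241.34'

bootstrap_ip=$(printf '%s\n' "$sample" | extract_public_ip)
if bootstrap_still_present "$bootstrap_ip"; then
    echo "would pull logs from $bootstrap_ip"
else
    echo "bootstrap node already removed; nothing to pull"
fi
```

Testing the extracted string rather than the exit status matters here because, as noted above, `terraform state show` exits 0 even when the resource is missing.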

Member

But I guess we don't have a terraform binary anymore... Oh well.

@crawford
Contributor Author

We don't have jq or terraform in the CI containers. I'll need to rethink the approach a little.
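
One possible direction, offered purely as an illustration and not necessarily the approach that was eventually taken, is to read the address straight out of the state file with sed. The sketch assumes the state file is pretty-printed JSON with one key per line and a resource keyed `"aws_instance.bootstrap"`, as the jq filter in the diff implies. The function name is hypothetical, and text-matching JSON like this is fragile.

```shell
#!/bin/sh
# Print the bootstrap instance's public_ip from a pretty-printed tfstate.
# Assumption: one JSON key per line, and the resource entry starts with a
# line containing "aws_instance.bootstrap": {
bootstrap_ip_from_state() {
    # Only look between the resource's opening line and its public_ip line.
    sed -n '/"aws_instance\.bootstrap": {/,/"public_ip"/ s/.*"public_ip": "\(.*\)".*/\1/p' "$1"
}
```

If the bootstrap resource has already been removed from the state, the function prints nothing, so the result can feed the same `test -n` check discussed in the review thread.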
