-
Notifications
You must be signed in to change notification settings - Fork 185
OCS Functional Test Suite #114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCS Functional Test Suite #114
Conversation
i see the golint/gotest failures. I'll address that. I'm want to get e2e-aws results first before I push another update though |
Thanks @davidvossel ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to have the commit message of the "Introduction of functional test suite" commit mention that you are adding a single test "PVC Creation".
Other than that - this looks nice!
Of course requesting to fix the linting errors.
That was not a thorough review yet, just a first superficial look. I'll let @jarrpa and others do more thorough review too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An overall comment: Anything called utils
getting too large is often a bad thing. Could you chop that up into more descriptive files like create
or config
?
functests/utils.go
Outdated
return t.k8sClient | ||
} | ||
|
||
func (t *TestClient) CreateStorageClusterInitialization() (*ocsv1.StorageClusterInitialization, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't seem to be used anywhere... and really, the StorageCluster reconcile creates the StorageClusterInit already. Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. i noticed that as well when I went to add documentation to fix golint, I'll remove it.
functests/test.go
Outdated
@@ -0,0 +1 @@ | |||
package tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this to something like func_tests
? Or is this a ginkgo thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i named it functests btw instead of func_tests
6ea3489
to
16a824c
Compare
pr updated
I added something like that to the commit message description now.
yep, makes sense. I broke it up into a few files.
so, for the For now, I have the functional tests skip if it detects that the test wasn't invoked by Also, for the e2e test itself, I saw the PVC creation timeout. I'm working through what's going on there now. Since the test passes locally, I suspect there's something with the aws environment in CI that is preventing us from being able to provision PVCs. Hopefully it's something we can work around. I added debug dumping to the functest script to try and gather more information. |
16a824c
to
db9b7c3
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I think @jarrpa's request to split up utils.go still remains. Otherwise looks good to me.
Yeah, i broke up the file a bit, but looking back at it I think I can do better. I'll make another update |
e54bba0
to
48600d3
Compare
update pushed. Here's how functests/utils.go has been broken up.
if some more obvious patterns emerge for helper function grouping after more tests are written we might re-approach common.go, but for now I don't have a better suggestion. |
48600d3
to
08fea7e
Compare
@@ -1,5 +1,7 @@ | |||
#!/bin/bash | |||
|
|||
set -e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still want this in the final commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, yes. We do want it, but I probably didn't put it in a commit that makes sense for the change.
The issue I hit was 'make ocs-registry' would "pass" even if an error occurred during the docker build process. Without 'set -e', the exit code of make ocs-registry
doesn't indicate anything useful.
I'll break it into a separate commit at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed. i broke this into a separate commit.
I think the functests uncovered a possible issue. It looks like the StorageCluster is reporting that it is available before the osd's have started. This causes the PVC test to fail occasionally. Whether the test fails or not appears to be related to timing. below's a snapshot of what the e2e cluster looked like at the time of a failure.
To make the test pass consistently, I can wait in the setup function for the osds to come online. There might need to be some more discussion about that though. |
08fea7e
to
00212d0
Compare
I think I traced the issue back to how the functional tests wait for the StorageCluster startup condition. If not all expected conditions existed in the list, it looked like it was possible for the "BeforeTestSuite" function to exit earlier than it should have. I added logic to address this. We'll see if it improves the situation. I was able to verify that the StorageCluster's conditions were set correctly to indicate that it was still progressing when the OSDs were not online. So, it seems likely that what I saw was test related. |
The latest CSV appears to be broken. The functional test failure looks real.
and
rook-ceph can't start the osd pods. It seems likely this is some regression in the new rook-ceph image we introduced a few hours ago in this pr, #121 |
How did the test pass on that PR then? |
It passed yesterday before #121 was merged note that when the e2e test runs, it's actually merging with master. So even if I don't rebase this PR, it's this PR+Master that's being reflected in the results. |
I might have misunderstood the question. If you meant how did the e2e test pass on pr #121 it's because there are no functional tests being executed until this PR is merged. The e2e logic as it is without the functional test suite just verifies that the ocs-operator installs and that the operator is online, not that a storage cluster comes online. That's the part this PR adds, along with verifying PVC creation. |
/test ocs-operator-e2e-aws |
I'll up the timeout for now so we can get this PR in. Independently of this PR I'm going to circle back around and try to gain an understanding for why this is occurring though. It appears like the StorageCluster CR may be reporting that it's in a stable state (available, upgradable, not progressing, and not degrading) at some point before the OSDs actually come online, but the CR is clearly reflecting the correct status of the StorageCluster when the timeout occurs. |
00212d0
to
9d55b0f
Compare
9d55b0f
to
1ed12f9
Compare
I bumped the setup timeout to 30 minutes, we'll see if this improves the situation. |
1ed12f9
to
8a32b7d
Compare
I changed the logic to wait an entire hour for the OSDs to come online, and the e2e test is still timing out. I'm trying to reproduce this locally using a similar cluster to what the e2e tests use. One difference I noticed is that they're only using two AZs, but after talking with @jarrpa it appears this theoretically this should be fine. |
I was able to reproduce what's happening in Openshift CI by limiting my OCP deployment to only 2 availability zones in AWS. Only 1 or 2 OSDs ever come online. The OSDs that don't come online complain about not being able to attach a PVC to the OSDs pod. Below is an example.
In order to stabilize the functional tests so we can get at least something merged that verifies basic funcitonality, I'm experimenting with reducing the OSD replica count to 1 just for the functional test environment. |
1c30366
to
9f4df25
Compare
@jarrpa @obnoxxx Functional tests pass again. I had to deploy a work around for issue #131 The osd issue occurs when we have multiple worker nodes within the same AZ. Openshift CI gives us only 2 AZs and 3 worker nodes, which means we'll always have at least 1 extra node in an AZ. For the time being, in order to work around #131 I only label a single node from each AZ and limit the functional test to Replica 2. Using Replica 2 for the e2e test suite should only be considered a stop gap. I'm tracking the root cause of the issue in #131 and I've also opened up a discussion with Openshift CI about increasing the AZ count to 3. openshift/release#5047 Here's my suggestion for moving forward.
|
I'm fine with moving forward like this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Can you squash the non-vendor commits?
Maybe leave the commit that is taking replica down to 2 separate so that we can revert it later? |
Signed-off-by: David Vossel <[email protected]>
Includes initial example test in functests/pvc_create_test.go Signed-off-by: David Vossel <[email protected]>
Signed-off-by: David Vossel <[email protected]>
Signed-off-by: David Vossel <[email protected]>
Signed-off-by: David Vossel <[email protected]>
9f4df25
to
1c30e6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: davidvossel, jarrpa, obnoxxx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold cancel |
Functional Tests
Our functional test suite uses the ginkgo testing framework.
Prerequisites for running Functional Tests
Running functional test
make functest
Below is some sample output of what to expect.
Developing Functional Tests
All the functional test code lives in the
functests/
directory. For anexample of how a functional test is structured, look at the
functests/pvc_creation_test.go
file.
The tests themselves should invoke simple to understand steps. Put any complex
logic in the functests/utils.go package so test flows are easy to follow.
Running a single test
When developing a test, it's common to just want to run a single functional test
rather than the whole suite. This can be done using ginkgo's "focus" feature.
All you have to do is put a
F
in front of the tests declaration to forceonly that test to run. So, if you have an iteration like
It("some test")
defined, you just need to set that to
FIt("some test")
to force the testsuite to only execute that single test.
Make sure to remove the focus from your test before creating the pull request.
Otherwise the test suite will fail in CI.