Skip to content

Clean up flaky tests #247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 28 additions & 0 deletions src/test/java/org/opensearch/knn/KNNRestTestCase.java
Original file line number Diff line number Diff line change
Expand Up @@ -877,6 +877,34 @@ public void assertTrainingSucceeds(String modelId, int attempts, int delayInMill
fail("Training did not succeed after " + attempts + " attempts with a delay of " + delayInMillis + " ms.");
}

public void assertTrainingFails(String modelId, int attempts, int delayInMillis) throws InterruptedException,
IOException {
int attemptNum = 0;
Response response;
Map<String, Object> responseMap;
ModelState modelState;
while (attemptNum < attempts) {
Thread.sleep(delayInMillis);
attemptNum++;

response = getModel(modelId, null);

responseMap = createParser(
XContentType.JSON.xContent(),
EntityUtils.toString(response.getEntity())
).map();

modelState = ModelState.getModelState((String) responseMap.get(MODEL_STATE));
if (modelState == ModelState.FAILED) {
return;
}

assertNotEquals(ModelState.CREATED, modelState);
}

fail("Training did not succeed after " + attempts + " attempts with a delay of " + delayInMillis + " ms.");
}

/**
* We need to be able to dump the jacoco coverage before cluster is shut down.
* The new internal testing framework removed some of the gradle tasks we were listening to
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -141,56 +141,6 @@ public void testInvalidMetricsStats() {
Collections.singletonList("invalid_metric")));
}

//TODO: Fix broken test case
// This test case intended to check whether the "graph_query_error" stat gets incremented when a query fails.
// It sets the circuit breaker limit to 1 kb and then indexes documents into the index and force merges so that
// the sole segment's graph will not fit in the cache. Then, it runs a query and expects an exception. Then it
// checks that the query errors get incremented. This test is flaky:
// https://github.com/opensearch-project/k-NN/issues/43. During query, when a segment to be
// searched is not present in the cache, it will first be loaded. Then it will be searched.
//
// The problem is that the cache built from CacheBuilder will not throw an exception if the entry exceeds the
// size of the cache - tested this via log statements. However, the entry gets marked as expired immediately.
// So, after loading the entry, sometimes the expired entry will get evicted before the search logic. This causes
// the search to fail. However, it appears sometimes that the entry doesnt get immediately evicted, causing the
// search to succeed.
// public void testGraphQueryErrorsGetIncremented() throws Exception {
// // Get initial query errors because it may not always be 0
// String graphQueryErrors = StatNames.GRAPH_QUERY_ERRORS.getName();
// Response response = getKnnStats(Collections.emptyList(), Collections.singletonList(graphQueryErrors));
// String responseBody = EntityUtils.toString(response.getEntity());
// Map<String, Object> nodeStats = parseNodeStatsResponse(responseBody).get(0);
// int beforeErrors = (int) nodeStats.get(graphQueryErrors);
//
// // Set the circuit breaker very low so that loading an index will definitely fail
// updateClusterSettings("knn.memory.circuit_breaker.limit", "1kb");
//
// Settings settings = Settings.builder()
// .put("number_of_shards", 1)
// .put("index.knn", true)
// .build();
// createKnnIndex(INDEX_NAME, settings, createKnnIndexMapping(FIELD_NAME, 2));
//
// // Add enough docs to trip the circuit breaker
// Float[] vector = {1.3f, 2.2f};
// int docsInIndex = 25;
// for (int i = 0; i < docsInIndex; i++) {
// addKnnDoc(INDEX_NAME, Integer.toString(i), FIELD_NAME, vector);
// }
// forceMergeKnnIndex(INDEX_NAME);
//
// // Execute a query that should fail
// float[] qvector = {1.9f, 2.4f};
// expectThrows(ResponseException.class, () ->
// searchKNNIndex(INDEX_NAME, new KNNQueryBuilder(FIELD_NAME, qvector, 10), 10));
//
// // Check that the graphQuery errors gets incremented
// response = getKnnStats(Collections.emptyList(), Collections.singletonList(graphQueryErrors));
// responseBody = EntityUtils.toString(response.getEntity());
// nodeStats = parseNodeStatsResponse(responseBody).get(0);
// assertTrue((int) nodeStats.get(graphQueryErrors) > beforeErrors);
// }

/**
* Test checks that handler correctly returns stats for a single node
*
Expand Down
Loading