Skip to content

Fix issue when calling cleanup while concurrently executing searches #483

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jun 28, 2025

Conversation

marianotepper
Copy link
Collaborator

A NullPointerException that was thrown when we were:

  1. deleting nodes
  2. running cleanup (that commits the deletions) while concurrently executing searches

This is because there is no good isolation between a FrozenView and the commit of the deletions. If a deleted node was encountered during a concurrent search, its neighborhood was empty and caused the NPE.

This PR fixes this problem by simply returning an empty neighborhood for those nodes.

class EmptyNodeIterator implements NodesIterator {
@Override
public int size() {
throw new UnsupportedOperationException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the size just be 0?

// This case may be encountered when concurrently searching while GraphIndexBuilder.cleanup is running.
// In such a case, the neighborhoods may get temporarily broken and we just assume that the node has
// no neighbors.
return new NodesIterator.EmptyNodeIterator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's minor, but I think we can make this a final static object in the NodesIterator to skip object allocation. Given the frequency we hit this code block, it probably doesn't matter much, but in general, we've avoided unnecessary allocation on the query path, so I want to mention it here.

int n = it.nextInt();
if (completions.completedAt(n) < timestamp) {
return n;
NodesIterator it = getNeighbors(level, node).iterator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will throw the NPE still, right? getNeighbors(level, node) returns null, and we need to check that output before calling the iterator() method on it.

@@ -366,47 +366,55 @@ public class ConcurrentGraphIndexView extends FrozenView {

@Override
public NodesIterator getNeighborsIterator(int level, int node) {
var it = getNeighbors(level, node).iterator();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you review all other usages of getNeighbors()? I see some usages that use the result without checking if it is null. Would it be valid to return an empty Neighbors object from the getNeighbors method and therefore change the contract to simplify the code that calls this method?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. I did this and I think that the code is actually more clear now.

@marianotepper
Copy link
Collaborator Author

Instead of shelving the PR, as stated in my previous comment. I turned this into a configurable option. This gives us more options moving forward.

For now, by default, we will have the previous behavior, where we do an additional pass over every node, trying to improve its connections. We can revisit the default with more empirical evidence.

@marianotepper marianotepper marked this pull request as ready for review June 27, 2025 20:31
@jkni
Copy link
Contributor

jkni commented Jun 27, 2025

Instead of shelving the PR, as stated in my previous comment. I turned this into a configurable option. This gives us more options moving forward.

For now, by default, we will have the previous behavior, where we do an additional pass over every node, trying to improve its connections. We can revisit the default with more empirical evidence.

This is intended for #477, yeah?

Copy link
Contributor

@jkni jkni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love for @michaeljmarshall to take another look as well, but this looks like it will resolve the underlying issue to me.


@Override
public int nextInt() {
return Integer.MIN_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contractually, I think it's more appropriate to throw NoSuchElementException here (https://docs.oracle.com/javase/8/docs/api/java/util/PrimitiveIterator.OfInt.html#nextInt--)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I fixed it there and in more more place, where an IndexOutOfBoundsException was thrown instead.

Copy link
Member

@michaeljmarshall michaeljmarshall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@marianotepper marianotepper merged commit 98730a1 into main Jun 28, 2025
8 checks passed
@marianotepper marianotepper deleted the concurrent-read-writes-deletes branch June 28, 2025 00:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants