-
Notifications
You must be signed in to change notification settings - Fork 134
Fix issue when calling cleanup while concurrently executing searches #483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jvector-base/src/main/java/io/github/jbellis/jvector/graph/OnHeapGraphIndex.java
Outdated
Show resolved
Hide resolved
class EmptyNodeIterator implements NodesIterator { | ||
@Override | ||
public int size() { | ||
throw new UnsupportedOperationException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't the size just be 0?
// This case may be encountered when concurrently searching while GraphIndexBuilder.cleanup is running. | ||
// In such a case, the neighborhoods may get temporarily broken and we just assume that the node has | ||
// no neighbors. | ||
return new NodesIterator.EmptyNodeIterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's minor, but I think we can make this a final static object in the NodesIterator
to skip object allocation. Given the frequency we hit this code block, it probably doesn't matter much, but in general, we've avoided unnecessary allocation on the query path, so I want to mention it here.
int n = it.nextInt(); | ||
if (completions.completedAt(n) < timestamp) { | ||
return n; | ||
NodesIterator it = getNeighbors(level, node).iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will throw the NPE still, right? getNeighbors(level, node)
returns null
, and we need to check that output before calling the iterator()
method on it.
jvector-tests/src/test/java/io/github/jbellis/jvector/graph/TestConcurrentReadWriteDeletes.java
Show resolved
Hide resolved
@@ -366,47 +366,55 @@ public class ConcurrentGraphIndexView extends FrozenView { | |||
|
|||
@Override | |||
public NodesIterator getNeighborsIterator(int level, int node) { | |||
var it = getNeighbors(level, node).iterator(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you review all other usages of getNeighbors()
? I see some usages that use the result without checking if it is null
. Would it be valid to return an empty Neighbors
object from the getNeighbors
method and therefore change the contract to simplify the code that calls this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really good point. I did this and I think that the code is actually more clear now.
…per random generation
Instead of shelving the PR, as stated in my previous comment. I turned this into a configurable option. This gives us more options moving forward. For now, by default, we will have the previous behavior, where we do an additional pass over every node, trying to improve its connections. We can revisit the default with more empirical evidence. |
…zedTest::getRandom() as the generator. Use 8 threads of possible.
This is intended for #477, yeah? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love for @michaeljmarshall to take another look as well, but this looks like it will resolve the underlying issue to me.
|
||
@Override | ||
public int nextInt() { | ||
return Integer.MIN_VALUE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contractually, I think it's more appropriate to throw NoSuchElementException here (https://docs.oracle.com/javase/8/docs/api/java/util/PrimitiveIterator.OfInt.html#nextInt--)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I fixed it there and in more more place, where an IndexOutOfBoundsException was thrown instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
A NullPointerException that was thrown when we were:
This is because there is no good isolation between a FrozenView and the commit of the deletions. If a deleted node was encountered during a concurrent search, its neighborhood was empty and caused the NPE.
This PR fixes this problem by simply returning an empty neighborhood for those nodes.