-
Notifications
You must be signed in to change notification settings - Fork 48
feat: bigframes.bigquery.vector_search
supports use_brute_force
and fraction_lists_to_search
parameters
#1158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…nd `fraction_lists_to_search` parameters
@@ -185,17 +185,18 @@ def vector_search( | |||
find nearest neighbors. The column must have a type of ``ARRAY<FLOAT64>``. All elements in | |||
the array must be non-NULL and all values in the column must have the same array dimensions | |||
as the values in the ``column_to_search`` column. Can only be set when query is a DataFrame. | |||
top_k (int, default 10): | |||
top_k (int): | |||
Sepecifies the number of nearest neighbors to return. Default to 10. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it is a breaking change but it is not after digging into the BigQuery SQL document to confirm this. To ensure a smooth user experience, can we keep the default explicit to make the behavior immediately apparent without relying on external documentation? I am open to learning more your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user_brute_force
is also applied for this comment. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I want to keep BigFrames pretty lightweight. If users don't specify a value, I want to make sure we use the server-side default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this approach. It's important that the docstring clearly states the default value of 10, so users can easily understand the parameter's behavior without needing to consult server-side documentation. Because of that, I'm fine with the current implementation.
@@ -204,37 +205,35 @@ def vector_search( | |||
""" | |||
import bigframes.series | |||
|
|||
if not fraction_lists_to_search and use_brute_force is True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep this ValueError
exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sudipto was encountering this error thinking it was a BigFrames bug. I'd rather remove the client-side check and let the server handle this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
e2e failure |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal issue 344019989 🦕