Skip to content

feat: bigframes.bigquery.vector_search supports use_brute_force and fraction_lists_to_search parameters #1158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Nov 21, 2024

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented Nov 19, 2024

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 344019989 🦕

@tswast tswast requested review from a team as code owners November 19, 2024 17:15
@tswast tswast requested a review from chelsea-lin November 19, 2024 17:15
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 19, 2024
@@ -185,17 +185,18 @@ def vector_search(
find nearest neighbors. The column must have a type of ``ARRAY<FLOAT64>``. All elements in
the array must be non-NULL and all values in the column must have the same array dimensions
as the values in the ``column_to_search`` column. Can only be set when query is a DataFrame.
top_k (int, default 10):
top_k (int):
Sepecifies the number of nearest neighbors to return. Default to 10.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it is a breaking change but it is not after digging into the BigQuery SQL document to confirm this. To ensure a smooth user experience, can we keep the default explicit to make the behavior immediately apparent without relying on external documentation? I am open to learning more your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

user_brute_force is also applied for this comment. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, I want to keep BigFrames pretty lightweight. If users don't specify a value, I want to make sure we use the server-side default.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this approach. It's important that the docstring clearly states the default value of 10, so users can easily understand the parameter's behavior without needing to consult server-side documentation. Because of that, I'm fine with the current implementation.

@@ -204,37 +205,35 @@ def vector_search(
"""
import bigframes.series

if not fraction_lists_to_search and use_brute_force is True:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this ValueError exception?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sudipto was encountering this error thinking it was a BigFrames bug. I'd rather remove the client-side check and let the server handle this case.

Copy link
Contributor

@chelsea-lin chelsea-lin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@tswast
Copy link
Collaborator Author

tswast commented Nov 21, 2024

e2e failure test_read_gbq_table_clustered_with_filter seems unrelated. Merging.

@tswast tswast merged commit 131edc3 into main Nov 21, 2024
22 of 23 checks passed
@tswast tswast deleted the fraction-to-search branch November 21, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants