Skip to content

Add support for similar docs query #674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 15, 2025

Conversation

JiaYingZhang
Copy link
Contributor

@JiaYingZhang JiaYingZhang commented Jun 16, 2025

Pull Request

Related issue

Fixes #646

What does this PR do?

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

Summary by CodeRabbit

  • New Features

    • Introduced similarity search capabilities to find documents similar to a specified one within an index.
    • Added configurable options for similarity queries, including pagination, filtering, attribute selection, ranking score visibility, and detailed ranking score information in results.
  • Documentation

    • Included example usage and detailed documentation for the similarity search functionality.
    • Updated filtering documentation links for improved guidance.

This comment was marked as spam.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/lib.rs (1)

264-266: Consider keeping module declarations alphabetically sorted for quicker discoverability

similar is appended after utils, which breaks the existing (roughly) alphabetical ordering of the public mod list (search, settings, snapshots, …). Re-ordering modules helps future contributors locate items faster.

-/// Module related to similar queries and results.
-pub mod similar;
+/// Module related to similar queries and results.
+pub mod similar;          // move this so the list stays alphabetically ordered

Purely cosmetic – adjust only if you value the convention.

src/search.rs (1)

110-118: Visibility bump looks good; double-check companion helpers

Exposing serialize_with_wildcard as pub(crate) is necessary for the new similar module — nice catch.
If similar ever needs the crop helper as well (serialize_attributes_to_crop_with_wildcard), remember to apply the same visibility change or move both helpers to a shared utils sub-module to avoid future leakage.

src/similar.rs (3)

38-39: Clarify documentation for the id field

The comment "Search Doc ID" is ambiguous. Consider clarifying that this is the ID of the reference document used for the similarity search.

-    /// Search Doc ID
+    /// ID of the reference document used for the similarity search
     pub id: String,

212-214: Remove unnecessary clone in build method

The build method clones the entire query object unnecessarily. Consider returning a reference or removing this method if not needed.

-    pub fn build(&mut self) -> SimilarQuery<'a, Http> {
-        self.clone()
-    }
+    pub fn build(&self) -> &SimilarQuery<'a, Http> {
+        self
+    }

Alternatively, if the build method is not essential (since execute can be called directly), consider removing it entirely to simplify the API.


223-225: Fix typo and track embedder configuration

There's a typo in the comment, and the TODO indicates missing functionality.

 // TODO: set UserProvided EembdderConfig
 // Embedder have not been implemented
-// But Now It does't work
+// But Now It doesn't work

Would you like me to create an issue to track the implementation of embedder configuration functionality?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90a153c and 8a2bded.

📒 Files selected for processing (4)
  • src/indexes.rs (2 hunks)
  • src/lib.rs (1 hunks)
  • src/search.rs (1 hunks)
  • src/similar.rs (1 hunks)
🔇 Additional comments (1)
src/indexes.rs (1)

1657-1669: LGTM!

The implementation correctly follows the established pattern for query methods and properly integrates with the similar module.

JiaYingZhang and others added 2 commits June 20, 2025 15:26
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@CommanderStorm
Copy link
Contributor

Is this a duplicate of #600 ?

@JiaYingZhang
Copy link
Contributor Author

Is this a duplicate of #600 ?

It is not a complete repetition, and the implementation methods are inconsistent. I prefer to handle "similar" as an independent query to avoid affecting the search query.

@Mubelotix Mubelotix mentioned this pull request Jul 10, 2025
3 tasks
@Mubelotix Mubelotix changed the base branch from main to dev July 11, 2025 21:47
@Mubelotix
Copy link
Collaborator

@coderabbitai review

Copy link
Contributor

coderabbitai bot commented Jul 15, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/similar.rs (5)

24-27: Move documentation comment before attributes.

The documentation comment should be placed before the #[derive] and #[serde] attributes to follow Rust conventions.

+/// A struct containing search results and other information about the search.
 #[derive(Deserialize, Debug, Clone)]
 #[serde(rename_all = "camelCase")]
-/// A struct containing search results and other information about the search.
 pub struct SimilarResults<T> {

38-39: Clarify the documentation for the id field.

The comment "Search Doc ID" seems inconsistent with the context of similarity queries. This should likely be "Similar Query ID" or just "Query ID".

-    /// Search Doc ID
+    /// Query ID
     pub id: String,

217-219: Simplify the build method implementation.

The current implementation unnecessarily requires a mutable reference. Since it just clones, it can be simplified.

-    pub fn build(&mut self) -> SimilarQuery<'a, Http> {
-        self.clone()
-    }
+    pub fn build(&self) -> SimilarQuery<'a, Http> {
+        self.clone()
+    }

375-376: Remove "show" from the test function name for consistency.

The test is for ranking_score_threshold, not show_ranking_score_threshold.

 #[meilisearch_test]
-async fn test_query_show_ranking_score_threshold(
+async fn test_query_ranking_score_threshold(

278-289: Enhance the offset test to verify actual skipping behavior.

The current test only checks the result count but doesn't verify that records were actually skipped. Consider comparing results with and without offset to ensure correctness.

 #[meilisearch_test]
 async fn test_query_offset(client: Client, index: Index) -> Result<(), Error> {
     setup_embedder(&client, &index).await?;
     setup_test_index(&client, &index).await?;
 
+    // Get results without offset
+    let query_no_offset = SimilarQuery::new(&index, "1", "default");
+    let results_no_offset: SimilarResults<Document> = query_no_offset.execute().await?;
+
+    // Get results with offset
     let mut query = SimilarQuery::new(&index, "1", "default");
     query.with_offset(6);
-
     let results: SimilarResults<Document> = query.execute().await?;
+    
+    // Verify that we got the expected number of results
     assert_eq!(results.hits.len(), 3);
+    
+    // Verify that the first result with offset matches the 7th result without offset
+    if results_no_offset.hits.len() > 6 && !results.hits.is_empty() {
+        assert_eq!(results.hits[0].result.id, results_no_offset.hits[6].result.id);
+    }
+    
     Ok(())
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2c440c and 7276bf8.

📒 Files selected for processing (5)
  • src/documents.rs (1 hunks)
  • src/indexes.rs (2 hunks)
  • src/lib.rs (1 hunks)
  • src/search.rs (10 hunks)
  • src/similar.rs (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/documents.rs
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/lib.rs
  • src/indexes.rs
  • src/search.rs
🧰 Additional context used
🧠 Learnings (1)
src/similar.rs (1)
Learnt from: LukasKalbertodt
PR: meilisearch/meilisearch-rust#625
File: src/search.rs:368-370
Timestamp: 2025-06-12T13:28:23.700Z
Learning: In the Meilisearch Rust client, `SearchQuery` serializes its per-query federation settings under the key `federationOptions`; only the top-level multi-search parameter is named `federation`.
🧬 Code Graph Analysis (1)
src/similar.rs (5)
src/indexes.rs (31)
  • search (279-281)
  • new (81-89)
  • new (1775-1781)
  • new (1974-1980)
  • index (2151-2151)
  • index (2168-2169)
  • index (2206-2206)
  • index (2232-2232)
  • index (2260-2260)
  • index (2286-2286)
  • index (2313-2313)
  • with_offset (2012-2015)
  • with_limit (2047-2050)
  • execute (1864-1876)
  • execute (2081-2083)
  • client (186-188)
  • client (232-234)
  • client (320-322)
  • client (376-378)
  • client (428-430)
  • client (473-475)
  • client (522-525)
  • client (556-558)
  • client (634-636)
  • client (700-702)
  • client (970-972)
  • client (1038-1040)
  • client (1089-1091)
  • client (1133-1135)
  • client (1186-1188)
  • client (1245-1247)
src/search.rs (27)
  • serialize_with_wildcard (126-135)
  • new (39-41)
  • new (419-451)
  • new (738-743)
  • new (990-1001)
  • index (718-718)
  • with_offset (458-461)
  • with_limit (463-466)
  • with_filter (535-538)
  • with_filter (1019-1022)
  • with_array_filter (540-546)
  • with_array_filter (1024-1030)
  • with_attributes_to_retrieve (578-584)
  • with_show_ranking_score (639-645)
  • with_show_ranking_score_details (647-653)
  • with_ranking_score_threshold (697-703)
  • with_retrieve_vectors (549-555)
  • build (710-712)
  • build (1056-1058)
  • execute (715-719)
  • execute (791-795)
  • execute (845-851)
  • execute (1060-1062)
  • setup_test_index (1154-1177)
  • query (1369-1369)
  • query (1385-1385)
  • results (2038-2042)
src/documents.rs (15)
  • new (89-94)
  • new (203-211)
  • new (335-340)
  • index (62-64)
  • index (161-161)
  • index (318-318)
  • index (455-455)
  • with_offset (228-231)
  • with_limit (250-253)
  • with_filter (280-283)
  • with_filter (342-348)
  • execute (157-162)
  • execute (315-319)
  • execute (350-352)
  • setup_test_index (391-419)
src/client.rs (1)
  • None (1270-1270)
meilisearch-test-macro/src/lib.rs (1)
  • meilisearch_test (14-201)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: integration-tests

@Mubelotix Mubelotix added the enhancement New feature or request label Jul 15, 2025
Copy link

codecov bot commented Jul 15, 2025

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@Mubelotix Mubelotix merged commit 972da7f into meilisearch:dev Jul 15, 2025
7 checks passed
@Mubelotix
Copy link
Collaborator

Thank you @JiaYingZhang!

@Mubelotix Mubelotix mentioned this pull request Jul 15, 2025
meili-bors bot added a commit that referenced this pull request Jul 15, 2025
682: Merge dev into main r=curquiza a=Mubelotix

Full changelog after this PR: (includes changes merged before this PR)

## What's Changed

### Breaking changes

* Fix deleted_tasks field type in TaskDeletion struct by `@coinmoles` in #666 (`TaskDeletion.deleted_tasks` and `TaskCancelation.canceled_tasks` have been made `Option`s)

### Enhancements

* Support all experimental features by `@ellnix` in #656
* Add `usedDatabaseSize` to stats by `@Alirexaa` in #653
* Add facet_search API functionality by `@hmacr` in #512
* Allow users to customize facet value sort behaviour by `@hmacr` in #514
* Support `embedders` setting and other vector/hybrid search related configuration by `@CommanderStorm` in #554
* Add documents and embbedings database metrics to stats response by `@Alirexaa` in #652
* Support facet search and prefix search settings by `@ellnix` in #634
* Serialize for MultiSearchResponse by `@milesgranger` in #668
* Implement federated multi search by `@LukasKalbertodt` in #625
* Add support for similar docs query by `@JiaYingZhang` in #674

### Maintainance

* Remove _md code samples for Mintlify migration by `@curquiza` in #661
* Add missing `exist` to `*_update` functions' docs by `@funlennysub` in #617
* Add Code Coverage GH action by `@Alirexaa` in #654








## New Contributors
* `@Alirexaa` made their first contribution in #653
* `@funlennysub` made their first contribution in #617
* `@milesgranger` made their first contribution in #668
* `@coinmoles` made their first contribution in #666
* `@JiaYingZhang` made their first contribution in #674

**Full Changelog**: v0.28.0...v0.29.0


Co-authored-by: hmacr <[email protected]>
Co-authored-by: Clémentine U. - curqui <[email protected]>
Co-authored-by: Frank Elsinga <[email protected]>
Co-authored-by: Clémentine <[email protected]>
Co-authored-by: Vladimir Donich <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[v1.13] Stabilize AI-powered search
3 participants