Skip to content

Create API.md #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 3, 2025
Merged

Conversation

owenhalpert
Copy link
Collaborator

Description

Add a readable API document to the root directory describing the contract between the server and client.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@owenhalpert owenhalpert added the documentation Improvements or additions to documentation label Feb 20, 2025
@owenhalpert owenhalpert marked this pull request as ready for review February 20, 2025 21:04
rchitale7
rchitale7 previously approved these changes Feb 20, 2025
Copy link
Member

@rchitale7 rchitale7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

jed326
jed326 previously approved these changes Feb 20, 2025
@owenhalpert owenhalpert force-pushed the owenhalpert-api branch 2 times, most recently from 936c4e0 to c40de7f Compare February 21, 2025 18:57
@jed326 jed326 dismissed stale reviews from rchitale7 and themself February 21, 2025 20:12

new changes

jed326
jed326 previously approved these changes Feb 21, 2025
jed326
jed326 previously approved these changes Feb 24, 2025
rchitale7
rchitale7 previously approved these changes Feb 24, 2025
"doc_count": 1000000,
"data_type": "fp32", // {fp16, fp32, byte, binary}
"engine": "faiss", // {faiss, nmslib, lucene}
"index_parameters": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this mimic the mapping? I guess one question would be do we want to also potentially allow the index build service the freedom to choose the parameters based on high level signals, like "BEST_RECALL", or "compression_level=32x".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this mimic the mapping?

Ideally it should not mimic the mappings. This interface should evolve of its own.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one question would be do we want to also potentially allow the index build service the freedom to choose the parameters based on high level signals, like "BEST_RECALL", or "compression_level=32x".

This is a good suggestion. the problem is from k-NN how we can get this info on what customer is looking for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From knn, we can take this information in via mapping parameters: mode, compression_level. For this, we can expand/generalize these parameters.

For this API, I would add a separate field like:

  • index_parameter_resolution
  • workload_hints
  • user_hints

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API.md Outdated
"data_type": "fp32", // {fp16, fp32, byte, binary}
"engine": "faiss", // {faiss, nmslib, lucene}
"index_parameters": {
"space_type": // {l2, cosinesimil, l1, linf, innerproduct, hamming}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we bring this outside of index-parameters? Its more of a data set set defining property as opposed to an index parameter.

Copy link
Collaborator Author

@owenhalpert owenhalpert Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea on the split was to have workload defining variables (doc count, data type, etc) immediately available at the first JSON level and keep the index creation parameters inside to be used once we start building the index. So space type was placed inside because it's used in index creation whereas data type for example is more for initially identifying the size of the file. How does that sound?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see - then does engine need to be top level?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the reason on keeping engine on top level is engine can be used by CP to determine what worker to hit. Since in a worker engine will be 1 only. Right now, if you see we only have faiss. Engine is not even required to be really honest.

@owenhalpert
Copy link
Collaborator Author

owenhalpert commented Feb 27, 2025

There are other successful HTTP codes outside of 200 (other 2xx codes). Can we establish expected responses? Or at least establish that client can expect status code in range 200 - 300 to be successful?

Signed-off-by: owenhalpert <[email protected]>
Add expected params for engine, space_type, algo. Change `error` to `error_message`. Split object path into doc ID and vector paths.

Signed-off-by: owenhalpert <[email protected]>
@owenhalpert owenhalpert dismissed stale reviews from rchitale7 and jed326 March 3, 2025 19:15

revision


Request Parameters
{
"repository_type": "s3", // Derived from repository settings
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to block this PR, but I think the "comments" should be moved to a table instead, similar to https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#method-definitions for example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@owenhalpert owenhalpert merged commit c989027 into opensearch-project:main Mar 3, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants