Commit de2c17f

owenhalpert, kolchfa-aws, and natebower authored
Add vectorsearch Remote Index Build docs (#9575)
* Initial Remote Index Build draft
  Signed-off-by: owenhalpert <[email protected]>
* Initial Remote Index Build draft
  Signed-off-by: owenhalpert <[email protected]>
* Doc review
  Signed-off-by: Fanit Kolchina <[email protected]>
* Slight rewording based on answers
  Signed-off-by: Fanit Kolchina <[email protected]>
* Don't refer to feature flag
  Signed-off-by: Fanit Kolchina <[email protected]>
* Update _vector-search/api/knn.md
  Signed-off-by: kolchfa-aws <[email protected]>
* Update _vector-search/remote-index-build.md
  Signed-off-by: kolchfa-aws <[email protected]>
* Update _vector-search/remote-index-build.md
  Signed-off-by: kolchfa-aws <[email protected]>
* Apply suggestions from code review
  Co-authored-by: kolchfa-aws <[email protected]>
  Signed-off-by: owenhalpert <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <[email protected]>
  Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: owenhalpert <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
1 parent 1fd4f71 commit de2c17f

3 files changed: +125 -0 lines changed

_vector-search/api/knn.md

+31
@@ -57,6 +57,37 @@ Field | Description
Some statistics contain *graph* in the name. In these cases, *graph* is synonymous with *native library index*. The term *graph* is reflective of when the plugin only supported the HNSW algorithm, which consists of hierarchical graphs.
{: .note}

#### Remote index build stats
Introduced 3.0
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/k-NN/issues/2391).
{: .warning}

If [remote index build]({{site.url}}{{site.baseurl}}/vector-search/remote-index-build/) is enabled, the following statistics are included.

| Field | Description |
|:---|:---|
| `repository_stats.read_success_count` | The number of successful read operations from the repository. |
| `repository_stats.read_failure_count` | The number of failed read operations from the repository. |
| `repository_stats.successful_read_time_in_millis` | The total time, in milliseconds, spent on successful read operations. |
| `repository_stats.write_success_count` | The number of successful write operations to the repository. |
| `repository_stats.write_failure_count` | The number of failed write operations to the repository. |
| `repository_stats.successful_write_time_in_millis` | The total time, in milliseconds, spent on successful write operations. |
| `client_stats.build_request_success_count` | The number of successful build request operations. |
| `client_stats.build_request_failure_count` | The number of failed build request operations. |
| `client_stats.status_request_failure_count` | The number of failed status request operations. |
| `client_stats.status_request_success_count` | The number of successful status request operations. |
| `client_stats.index_build_success_count` | The number of successful index build operations. |
| `client_stats.index_build_failure_count` | The number of failed index build operations. |
| `client_stats.waiting_time_in_ms` | The total time, in milliseconds, that the client has spent awaiting completion of remote builds. |
| `build_stats.remote_index_build_flush_time_in_millis` | The total time, in milliseconds, spent on remote flush operations. |
| `build_stats.remote_index_build_merge_time_in_millis` | The total time, in milliseconds, spent on remote merge operations. |
| `build_stats.remote_index_build_current_merge_operations` | The current number of remote merge operations in progress. |
| `build_stats.remote_index_build_current_flush_operations` | The current number of remote flush operations in progress. |
| `build_stats.remote_index_build_current_merge_size` | The current size of remote merge operations. |
| `build_stats.remote_index_build_current_flush_size` | The current size of remote flush operations. |

#### Example request

The following examples demonstrate how to retrieve statistics related to the k-NN plugin.

_vector-search/remote-index-build.md

+49
@@ -0,0 +1,49 @@
---
layout: default
title: Remote index build
nav_order: 72
has_children: false
---

# Building vector indexes remotely using GPUs
Introduced 3.0
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/k-NN/issues/2391).
{: .warning}

Starting with version 3.0, OpenSearch supports building vector indexes using a GPU-accelerated remote index build service. Using GPUs dramatically reduces index build times and decreases costs. For benchmarking results, see [this blog post](https://opensearch.org/blog/GPU-Accelerated-Vector-Search-OpenSearch-New-Frontier/).

## Prerequisites

Before configuring the remote index build settings, ensure that you fulfill the following prerequisites. For more information about updating dynamic settings, see [Dynamic settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#dynamic-settings).

### Step 1: Enable the remote index build service

Enable the remote index build service for both the cluster and the chosen index by configuring the following settings.

Setting | Static/Dynamic | Default | Description
:--- | :--- | :--- | :---
`knn.feature.remote_index_build.enabled` | Dynamic | `false` | Enables remote vector index building for the cluster.
`index.knn.remote_index_build.enabled` | Dynamic | `false` | Enables remote index building for the index. Currently, the remote index build service supports [Faiss]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#faiss-engine) indexes with the `hnsw` method and the default 32-bit floating-point (`FP32`) vectors.
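
As a rough sketch, both settings can be applied through the cluster settings and index settings APIs. The example assumes an unsecured cluster at `http://localhost:9200` and a hypothetical index named `my-vector-index`; the function name is illustrative only. Adjust the URL, authentication, and index name for your environment:

```bash
# Assumptions (not from the docs above): local unsecured cluster, hypothetical index name.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-my-vector-index}"

# Cluster-level switch, applied as a persistent cluster setting.
CLUSTER_BODY='{"persistent": {"knn.feature.remote_index_build.enabled": true}}'

# Index-level switch for the chosen index.
INDEX_BODY='{"index.knn.remote_index_build.enabled": true}'

# Illustrative helper: sends both requests when called.
enable_remote_index_build() {
  curl -sS -X PUT "$OPENSEARCH_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$CLUSTER_BODY"
  curl -sS -X PUT "$OPENSEARCH_URL/$INDEX_NAME/_settings" \
    -H 'Content-Type: application/json' -d "$INDEX_BODY"
}

# Uncomment to apply the settings to a running cluster:
# enable_remote_index_build
```
{% include copy.html %}

Because both settings are listed as dynamic, they can be changed on a running cluster.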

### Step 2: Create and register the remote vector repository

The remote vector repository acts as an intermediate object store between the OpenSearch cluster and the remote build service. The cluster uploads vectors and document IDs to the repository. The remote build service retrieves the data, builds the index externally, and uploads the completed result back to the repository.

To create and register the repository, follow the steps in [Register repository]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/snapshots/snapshot-restore/#register-repository). Then set the `knn.remote_index_build.vector_repo` dynamic setting to the name of the registered repository.

The remote build service currently supports only Amazon Simple Storage Service (Amazon S3) repositories.
{: .note}
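
The following sketch shows one way these two actions might look together: registering an Amazon S3 repository and then pointing `knn.remote_index_build.vector_repo` at it. The repository name, bucket, and base path are hypothetical, and the example assumes that the `repository-s3` plugin and AWS credentials are already configured on the cluster:

```bash
# Assumptions: local unsecured cluster; repository, bucket, and path names are hypothetical.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
REPO_NAME="${REPO_NAME:-vector-repo}"
S3_BUCKET="${S3_BUCKET:-my-vector-bucket}"
S3_BASE_PATH="${S3_BASE_PATH:-remote-index-build}"

# Body for the register repository request (S3 repository type).
REPO_BODY=$(cat <<EOF
{
  "type": "s3",
  "settings": {
    "bucket": "$S3_BUCKET",
    "base_path": "$S3_BASE_PATH"
  }
}
EOF
)

# Body that points the k-NN plugin at the registered repository.
VECTOR_REPO_BODY="{\"persistent\": {\"knn.remote_index_build.vector_repo\": \"$REPO_NAME\"}}"

# Illustrative helper: registers the repository, then updates the cluster setting.
register_vector_repo() {
  curl -sS -X PUT "$OPENSEARCH_URL/_snapshot/$REPO_NAME" \
    -H 'Content-Type: application/json' -d "$REPO_BODY"
  curl -sS -X PUT "$OPENSEARCH_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$VECTOR_REPO_BODY"
}

# Uncomment to run against a cluster:
# register_vector_repo
```
{% include copy.html %}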

### Step 3: Set up a remote vector index builder

Configure the remote endpoint in the k-NN settings by setting `knn.remote_index_build.client.endpoint` to a running [remote vector index builder](https://github.com/opensearch-project/remote-vector-index-builder) instance. For instructions on setting up the remote service, see [the user guide](https://github.com/opensearch-project/remote-vector-index-builder/blob/main/USER_GUIDE.md).
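
For illustration, the endpoint could be set with a cluster settings request such as the one sketched here. The builder URL `http://remote-builder.example.com:80` is a placeholder, not a real service, and the exact URL form expected by your builder deployment is an assumption to verify against the user guide:

```bash
# Assumptions: local unsecured cluster; the builder endpoint is a placeholder URL.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
BUILDER_ENDPOINT="${BUILDER_ENDPOINT:-http://remote-builder.example.com:80}"

# Body that tells the k-NN client where the remote build service is running.
ENDPOINT_BODY="{\"persistent\": {\"knn.remote_index_build.client.endpoint\": \"$BUILDER_ENDPOINT\"}}"

# Illustrative helper: applies the endpoint setting when called.
set_builder_endpoint() {
  curl -sS -X PUT "$OPENSEARCH_URL/_cluster/settings" \
    -H 'Content-Type: application/json' -d "$ENDPOINT_BODY"
}

# Uncomment to run against a cluster:
# set_builder_endpoint
```
{% include copy.html %}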

## Configuring remote index build settings

The remote index build service supports several additional, optional settings. For information about configuring any remaining remote index build settings, see [Remote index build settings]({{site.url}}{{site.baseurl}}/vector-search/settings/#remote-index-build-settings).

## Using the remote index build service

Once the remote index build service is configured, any index on which it is enabled uses the remote vector index builder for builds whose size meets or exceeds the configured `index.knn.remote_index_build.size_threshold`.
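
As an example of an index that fits the supported configuration (Faiss engine, `hnsw` method, default FP32 vectors), the following sketch creates an index with remote builds enabled at creation time. The index name, field name, and dimension are hypothetical, and setting `index.knn.remote_index_build.enabled` in the create request rather than afterward is an assumption:

```bash
# Assumptions: local unsecured cluster; index name, field name, and dimension are hypothetical.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
INDEX_NAME="${INDEX_NAME:-my-vector-index}"

# Create-index body: a k-NN index with one Faiss HNSW vector field.
INDEX_BODY=$(cat <<'EOF'
{
  "settings": {
    "index.knn": true,
    "index.knn.remote_index_build.enabled": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2"
        }
      }
    }
  }
}
EOF
)

# Illustrative helper: creates the index when called.
create_remote_build_index() {
  curl -sS -X PUT "$OPENSEARCH_URL/$INDEX_NAME" \
    -H 'Content-Type: application/json' -d "$INDEX_BODY"
}

# Uncomment to run against a cluster:
# create_remote_build_index
```
{% include copy.html %}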

_vector-search/settings.md

+45
@@ -45,3 +45,48 @@ Setting | Static/Dynamic | Default | Description
An index created in OpenSearch version 2.11 or earlier will still use the previous `ef_construction` and `ef_search` values (`512`).
{: .note}

## Remote index build settings
Introduced 3.0
{: .label .label-purple }

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/k-NN/issues/2391).
{: .warning}

The following settings control [remote vector index building]({{site.url}}{{site.baseurl}}/vector-search/remote-index-build/).

The `poll_interval`, `timeout`, and `size_threshold` settings are advanced settings. Their default values were chosen as a result of extensive benchmarking.
{: .important}

### Cluster settings

The following remote index build settings apply at the cluster level.

Setting | Static/Dynamic | Default | Description
:--- | :--- | :--- | :---
`knn.feature.remote_index_build.enabled` | Dynamic | `false` | Enables remote vector index building for the cluster.
`knn.remote_index_build.vector_repo` | Dynamic | None | The repository to which the remote index builder should write.
`knn.remote_index_build.client.endpoint` | Dynamic | None | The endpoint URL of the remote build service.
`knn.remote_index_build.client.poll_interval` | Dynamic | `5s` | How frequently the client should poll the remote build service for job status.
`knn.remote_index_build.client.timeout` | Dynamic | `60m` | The maximum amount of time to wait for remote build completion before falling back to a CPU-based build.
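
To check which values are currently in effect before changing any of them, one option is to read the cluster settings with defaults included and filter for the remote index build keys. This is a sketch that assumes an unsecured cluster at `http://localhost:9200`; settings that have no default may not appear until they are set:

```bash
# Assumption: local unsecured cluster.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Flat, pretty-printed output puts each setting on its own line, which makes grep usable.
SETTINGS_URL="$OPENSEARCH_URL/_cluster/settings?include_defaults=true&flat_settings=true&pretty"

# Illustrative helper: prints every cluster setting whose key mentions remote_index_build.
show_remote_build_settings() {
  curl -sS "$SETTINGS_URL" | grep 'remote_index_build'
}

# Uncomment to run against a cluster:
# show_remote_build_settings
```
{% include copy.html %}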

### Index settings

The following remote index build settings apply at the index level.

Setting | Static/Dynamic | Default | Description
:--- | :--- | :--- | :---
`index.knn.remote_index_build.enabled` | Dynamic | `false` | Enables remote index building for the index. Currently, the remote index build service supports [Faiss]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-methods-engines/#faiss-engine) indexes with the `hnsw` method and the default 32-bit floating-point (`FP32`) vectors.
`index.knn.remote_index_build.size_threshold` | Dynamic | `50mb` | The minimum size required to enable remote vector builds.

### Remote build authentication

The remote build service username and password are secure settings that must be set in the [OpenSearch keystore]({{site.url}}{{site.baseurl}}/security/configuration/opensearch-keystore/) as follows:

```bash
./bin/opensearch-keystore add knn.remote_index_build.client.username
./bin/opensearch-keystore add knn.remote_index_build.client.password
```
{% include copy.html %}

You can reload the secure settings without restarting the node by using the [Nodes Reload Secure]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-reload-secure/) API.
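
A reload request might look like the following sketch, which assumes an unsecured cluster at `http://localhost:9200` and a keystore that is not password protected:

```bash
# Assumptions: local unsecured cluster; keystore without a password.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"
RELOAD_URL="$OPENSEARCH_URL/_nodes/reload_secure_settings"

# Illustrative helper: asks all nodes to re-read their keystores.
reload_secure_settings() {
  curl -sS -X POST "$RELOAD_URL"
}

# Uncomment to run against a cluster after updating the keystore on each node:
# reload_secure_settings
```
{% include copy.html %}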
