
[RFC] gRPC-based API for Search  #15190

Closed
@amberzsy

Is your feature request related to a problem? Please describe

Inspiration

Following the effort in #6844 and its benchmarking result (#10684 (comment)) (~20%), we can go a step further and add support for a gRPC-based API that uses protobuf for serialization and deserialization. Protobuf should be more efficient and compact than JSON. To validate the assumed performance gain, we ran a PoC of client <> server protobuf on the Search API with specific query types, and we see promising results in opensearch-project/opensearch-clients#69.
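To make the "more compact than JSON" assumption concrete, here is a minimal sketch that hand-encodes a toy request in the protobuf wire format and compares its size with compact JSON. The request fields and the field numbers 1 to 3 are assumptions for illustration only and are not taken from opensearch-api-specification; the sketch says nothing about end-to-end latency.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Wire type 2 (length-delimited): tag, byte length, UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_int_field(field_number: int, value: int) -> bytes:
    """Wire type 0 (varint): tag, then the value itself."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


# Hypothetical toy request; the field names and numbers are illustrative.
request = {"index": "logs", "query": "error", "size": 10}

json_bytes = json.dumps(request, separators=(",", ":")).encode("utf-8")
proto_bytes = (
    encode_string_field(1, request["index"])
    + encode_string_field(2, request["query"])
    + encode_int_field(3, request["size"])
)

print(f"JSON: {len(json_bytes)} bytes, protobuf: {len(proto_bytes)} bytes")
```

The saving comes mostly from replacing repeated field names with one-byte tags, so the ratio depends heavily on the shape of the real payload.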

Proposal

The ongoing node-to-node communication effort focuses on the transport layer, implementing StreamInput and StreamOutput with protobuf serializers/deserializers. We can expand that effort and add client <> server protobuf support in parallel to achieve a more significant performance gain.

The proto definitions for the Search API, which partially overlap with the transport layer, should follow opensearch-api-specification, which is widely adopted by clients.

For the server-side change there are two options:

  1. Introduce a new content type and give end users the option to send and receive protobuf binary payloads.
    Pros: a faster development cycle to begin with, since this is potentially an extension of the existing
    searchRequest/Response and builder XContent.
    Cons: potentially requires significant code refactoring, which adds complexity during development.

  2. Implement a new streaming-style Search API (gRPC) using protobuf and expose a new gRPC endpoint for it.
    Pros:
    a) gRPC natively supports client-side, server-side, and bidirectional streaming, allowing for real-time
    communication. gRPC runs over HTTP/2, which is more efficient than the HTTP/1.1 typically used by REST.
    b) gRPC tooling generates client and server code in multiple programming languages from the proto files. This
    reduces boilerplate code and ensures consistency across different languages and platforms.
    c) less code refactoring
    Cons:
    a) the development cycle might not be as fast as with approach 1.
    b) Although bringing up a new gRPC service and hooking it into the internal transport layer might not be too
    complicated, there will be unknowns around the overall integration with the existing ecosystem, e.g. related
    plugins (security, knn, sql, monitoring, etc.).
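As a rough illustration of option 1, the sketch below selects a wire format from the request's Content-Type header. The media type `application/x-protobuf`, the `WireFormat` enum, and the fallback to JSON when the header is missing are all assumptions made for this example; the RFC does not fix any of them.

```python
from enum import Enum
from typing import Optional


class WireFormat(Enum):
    JSON = "application/json"
    PROTOBUF = "application/x-protobuf"  # hypothetical media type name


def select_wire_format(content_type: Optional[str]) -> WireFormat:
    """Pick the payload format from a Content-Type header value."""
    if not content_type:
        # Assumption: keep today's behavior for clients that send no header.
        return WireFormat.JSON
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for wire_format in WireFormat:
        if wire_format.value == media_type:
            return wire_format
    raise ValueError(f"unsupported content type: {media_type}")


print(select_wire_format("application/json; charset=UTF-8"))
print(select_wire_format("application/x-protobuf"))
```

The refactoring cost named under Cons sits behind this switch: every request and response type reachable from the search path needs a protobuf reader and writer next to its existing XContent one.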

For clients (Java, Go, Python, etc.), we would add support for optionally using the new protobuf-based server API with minimal changes (i.e. no need to rewrite an application that already uses the client).
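One way the "minimal changes" goal could look from the application side is a client whose serialization is injected at construction time, so that call sites stay the same. This is only a sketch: `SearchClient`, `Codec`, `JsonCodec`, and the transport callable are hypothetical names and do not mirror any existing OpenSearch client API.

```python
import json
from typing import Any, Callable, Dict, Optional, Protocol


class Codec(Protocol):
    """Hypothetical serialization interface a client could depend on."""

    content_type: str

    def encode(self, body: Dict[str, Any]) -> bytes: ...

    def decode(self, payload: bytes) -> Dict[str, Any]: ...


class JsonCodec:
    """Default codec, matching the JSON behavior clients have today."""

    content_type = "application/json"

    def encode(self, body: Dict[str, Any]) -> bytes:
        return json.dumps(body).encode("utf-8")

    def decode(self, payload: bytes) -> Dict[str, Any]:
        return json.loads(payload.decode("utf-8"))


# Hypothetical transport: (index, content_type, payload) -> response bytes.
Transport = Callable[[str, str, bytes], bytes]


class SearchClient:
    def __init__(self, transport: Transport, codec: Optional[Codec] = None):
        self._transport = transport
        self._codec: Codec = codec or JsonCodec()

    def search(self, index: str, body: Dict[str, Any]) -> Dict[str, Any]:
        # The call site is identical whichever codec was injected.
        payload = self._codec.encode(body)
        response = self._transport(index, self._codec.content_type, payload)
        return self._codec.decode(response)
```

An application would opt in by passing a protobuf-backed codec (built on generated code) when constructing the client, while existing `search(...)` calls remain untouched.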

Next Steps

  1. Generate proto definitions from opensearch-api-specification (see https://github.com/nytimes/openapi2proto)
  2. Bootstrap a gRPC SearchService (SearchGRPCService) and hook it into the internal layer (ClusterService, ActionListener, etc.)
  3. Add gRPC handlers for the search action: add grpc/action/search and register it in ActionModule
  4. There are 40+ queryBuilder types; we need to target the knn-related ones (? CorrelationQuery)
  5. ?? Integrate with the transport layer protobuf implementation (node-to-node)
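Because step 4 involves 40+ query types, support will probably have to land incrementally. The sketch below shows one possible shape for that: a registry of per-type converters that rejects anything not yet registered. The registry, the decorator, and the plain dict used as the converted form are illustrative stand-ins, not OpenSearch classes; only `match_all` is wired up, mirroring the E2E PoC scope.

```python
from typing import Any, Callable, Dict

# Hypothetical converter signature: decoded query parameters in, an
# internal representation out (a plain dict stands in for a QueryBuilder).
Converter = Callable[[Dict[str, Any]], Dict[str, Any]]

_CONVERTERS: Dict[str, Converter] = {}


def register(query_type: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for one query type."""
    def decorator(fn: Converter) -> Converter:
        _CONVERTERS[query_type] = fn
        return fn
    return decorator


@register("match_all")
def convert_match_all(params: Dict[str, Any]) -> Dict[str, Any]:
    # Assumption: boost defaults to 1.0 when the caller omits it.
    return {"type": "match_all", "boost": float(params.get("boost", 1.0))}


def convert_query(query: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a single-key query object to its registered converter."""
    if len(query) != 1:
        raise ValueError("expected exactly one query type at the top level")
    query_type, params = next(iter(query.items()))
    converter = _CONVERTERS.get(query_type)
    if converter is None:
        raise NotImplementedError(f"query type not yet supported: {query_type}")
    return converter(params)


print(convert_query({"match_all": {}}))
```

Failing loudly on unregistered types keeps an experimental endpoint honest about its coverage while further query types are added.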

Timeline

2.17 release: (09/03/2024 ~ 09/17/2024)
[Experimental Feature]

  1. Protobuf definitions.
  2. A simple matchAll query for an E2E PoC.
  3. The feature will be marked as experimental.

Related

Transport layer Protobuf support: #6844

Metadata

Labels: RFC, Roadmap:Search, Search, Search:Performance, enhancement, v2.19.0
