Skip to content

Fix Bulk API schemas #843

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 4, 2025
Merged

Fix Bulk API schemas #843

merged 5 commits into from
May 4, 2025

Conversation

karenyrx
Copy link
Contributor

@karenyrx karenyrx commented Mar 16, 2025

Description

While working on GRPC Bulk endpoint, discovered a few discrepancies in the Bulk protos with the source code

  1. WaitForActiveShardOptions does not support index-settings. It is all, null, or an integer greater than 0 according to ActiveShardCount#parseString.
  2. routing field is a String, not array of String
  3. CreateOperation does not support version_type, and versioning-related fields. makes sense as CreateOperation is for creating a document for the very first time.
  4. UpdateOperation does not support version_types and will always use VersionType.INTERNAL
  5. dynamic_templates is not supported for IndexOperation and CreateOperation
  6. Spec is missing ShardInfo type to be used in the BulkResponse. The existing ShardStatistics struct is only used for Search responses.
  7. Missing metaFields in InlineGetDictUserDefined field.
  8. ErrorCause should have an additional field called header, which is a Map<String, List<String> | String>
  9. ScriptLanguage should be a oneOf not anyOf.

Issues Resolved

General spec fixes, contributing to accuracy of the Bulk Protobufs:
opensearch-project/OpenSearch#16784
opensearch-project/opensearch-protobufs#28

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock
Copy link
Member

dblock commented Mar 20, 2025

Looks like a good start. Sign DCO (git commit -s --amend), get it to green. Thanks!

Copy link
Contributor

github-actions bot commented Mar 21, 2025

Changes Analysis

Commit SHA: fe07051
Comparing To SHA: f82f42d

API Changes

Summary

├─┬Paths
│ ├─┬/{index}/_bulk
│ │ ├─┬PUT
│ │ │ └─┬Parameters
│ │ │   └─┬Schema
│ │ │     └──[🔀] $ref (36493:13)❌ 
│ │ └─┬POST
│ │   └─┬Parameters
│ │     └─┬Schema
│ │       └──[🔀] $ref (36493:13)❌ 
│ └─┬/_bulk
│   ├─┬PUT
│   │ └─┬Parameters
│   │   └─┬Schema
│   │     └──[🔀] $ref (36493:13)❌ 
│   └─┬POST
│     └─┬Parameters
│       └─┬Schema
│         └──[🔀] $ref (36493:13)❌ 
└─┬Components
  ├──[➕] schemas (36895:7)
  ├─┬_core.bulk___WriteOperation
  │ └─┬ALLOF
  │   └──[➖] properties (46564:13)❌ 
  ├─┬_core.bulk___DeleteOperation
  │ └──[🔀] $ref (46469:7)❌ 
  ├─┬_core.bulk___UpdateOperation
  │ └─┬ALLOF
  │   ├──[➕] properties (46596:13)
  │   ├──[➕] properties (46599:13)
  │   ├─┬retry_on_conflict
  │   │ └──[🔀] $ref (36875:13)❌ 
  │   └─┬require_alias
  │     ├──[🔀] type (46597:21)❌ 
  │     ├──[➕] format (46598:23)❌ 
  │     └──[➖] description (46554:28)
  ├─┬_core.bulk___ResponseItem
  │ └─┬_shards
  │   └──[🔀] $ref (36895:13)❌ 
  ├─┬_common___InlineGetDictUserDefined
  │ └──[➕] additionalProperties (35620:9)❌ 
  ├─┬_common___ScriptLanguage
  │ ├──[➖] anyOf (34619:7)❌ 
  │ ├──[➖] anyOf (36542:11)❌ 
  │ ├──[➕] oneOf (34619:7)
  │ └──[➕] oneOf (36545:11)
  ├─┬_core.bulk___IndexOperation
  │ └──[🔀] $ref (46483:7)❌ 
  ├─┬_core.bulk___OperationBase
  │ ├──[➖] properties (46455:9)❌ 
  │ ├──[➖] properties (46458:9)❌ 
  │ ├──[➖] properties (46460:9)❌ 
  │ └──[➖] properties (46462:9)❌ 
  └─┬_common___WaitForActiveShardOptions
    ├──[➕] oneOf (37383:11)
    └─┬ONEOF
      ├──[➕] not (37380:13)❌ 
      ├──[🔀] description (37382:24)
      └──[➖] const (37357:18)❌ 

Document Element Total Changes Breaking Changes
paths 4 4
components 24 20
  • BREAKING Changes: 24 out of 28
  • Modifications: 10
  • Removals: 9
  • Additions: 9
  • Breaking Removals: 8
  • Breaking Modifications: 9
  • Breaking Additions: 3

Report

The full API changes report is available at: https://github.com/opensearch-project/opensearch-api-specification/actions/runs/14813287835/artifacts/3055933824

API Coverage

Before After Δ
Covered (%) 663 (64.94 %) 663 (64.94 %) 0 (0 %)
Uncovered (%) 358 (35.06 %) 358 (35.06 %) 0 (0 %)
Unknown 49 49 0

Copy link
Contributor

Spec Test Coverage Analysis

Total Tested
599 597 (99.67 %)

@karenyrx karenyrx marked this pull request as ready for review March 27, 2025 19:38
@karenyrx karenyrx changed the title Bulk spec fixes Fixed Bulk endpoint API schemas Mar 27, 2025
@karenyrx karenyrx changed the title Fixed Bulk endpoint API schemas Fixed Bulk API schemas Mar 27, 2025
@karenyrx karenyrx changed the title Fixed Bulk API schemas Fix Bulk API schemas Mar 28, 2025
@karenyrx karenyrx marked this pull request as draft April 1, 2025 19:36
@nhtruong nhtruong marked this pull request as ready for review April 14, 2025 16:43
Signed-off-by: Karen Xu <[email protected]>
@karenyrx
Copy link
Contributor Author

Rebased to address merge conflicts. @dblock @nhtruong could you help take another look? Thanks!

@dblock
Copy link
Member

dblock commented Apr 29, 2025

Looks fine, but I'd expect to see some tests that fail without these changes?

@karenyrx karenyrx requested a review from Xtansia May 2, 2025 22:00
@karenyrx
Copy link
Contributor Author

karenyrx commented May 3, 2025

Looks fine, but I'd expect to see some tests that fail without these changes?

For all the fields that were removed -- such as: index-settings from WaitForActiveShardOptions, version_type from CreateOperation, version_type for UpdateOperation, dynamic_templates for IndexOperation and CreateOperation, string array type in routing, and skipped in ShardStatistics -- I don't think there existed tests that used these properties in the first place, so removing them did not cause any tests to fail. For these, I am unsure if there is a way to retroactively add tests to check the response does not include certain fields, but let me know if that is untrue and I can try adding some tests.

For the fields that were newly added, such as metadata_fields for InlineGetDictUserDefined, and header in ErrorCause, there definitely could exist new tests for them, but since these are on the response-side, I have not had the bandwidth to find a way to trigger these responses from OpenSearch yet. If it is not a blocker for this PR, perhaps we can merge these spec fixes in first and I can create an issue to backfill those tests later.

For all the fields that were modified, such as ScriptLanguage, although tests existed, since I am just tightening the schema definition not changing it, these tests continued to pass as well.

@dblock Let me know what you think! Thanks

@dblock
Copy link
Member

dblock commented May 4, 2025

For the fields that were newly added, such as metadata_fields for InlineGetDictUserDefined, and header in ErrorCause, there definitely could exist new tests for them, but since these are on the response-side, I have not had the bandwidth to find a way to trigger these responses from OpenSearch yet. If it is not a blocker for this PR, perhaps we can merge these spec fixes in first and I can create an issue to backfill those tests later.

Not a blocker, works for me.

@dblock dblock merged commit eb0f803 into opensearch-project:main May 4, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants