[BUG] Unexpected Ranking Behavior in Hybrid Query with Min-Max Normalization and Arithmetic Mean Combination #910

Closed
@rohantilva

OpenSearch Version: 2.15
Environment: AWS OpenSearch

Issue Description

I am executing hybrid queries with three sub-queries on a large dataset containing tens to hundreds of thousands of documents. The sub-queries are weighted [0.9998, 0.0001, 0.0001], so the first sub-query carries almost all of the weight. However, I am seeing unexpected results: a document with a high score from the first sub-query is missing from the top results in the final ranking, while documents with lower scores from that same sub-query are included.

Example:

  • Documents: A, B, C, D
  • Query 1 Scores (when run independently):
    • Document A: 1200
    • Document B: 1000
    • Document C: 300
    • Document D: 100

However, in the hybrid query, Document B does not appear in the top results, but Document C does, despite the heavily skewed weighting toward the first query (0.9998).
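
To make the example concrete, here is a minimal sketch of textbook min-max normalization applied to the Query 1 scores above. It assumes the minimum and maximum are taken over exactly these four documents; on a real cluster the bounds come from whatever results the sub-query actually returns, and the plugin may handle edge cases (such as the lowest-scoring document) differently. The function name `min_max_normalize` is hypothetical and is not part of any OpenSearch API.

```python
def min_max_normalize(scores):
    """Scale raw scores into [0, 1] using (score - min) / (max - min)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: treat a degenerate range as a full score for every doc.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# Raw Query 1 scores from the example in this issue.
query1_scores = {"A": 1200, "B": 1000, "C": 300, "D": 100}

normalized = min_max_normalize(query1_scores)
for doc, score in normalized.items():
    print(f"Document {doc}: {score:.4f}")
```

Under these assumptions, Document B normalizes to roughly 0.82 and Document C to roughly 0.18 for the first sub-query.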

Pipeline Configuration:

{
  "phase_results_processors": [
    {
      "normalization-processor": {
        "combination": {
          "parameters": {
            "weights": [
              0.9998,
              0.0001,
              0.0001
            ]
          },
          "technique": "arithmetic_mean"
        },
        "normalization": {
          "technique": "min_max"
        }
      }
    }
  ]
}

Observations:

[Screenshot: 2024-09-12 at 5:31:17 PM]

Essentially, even if Document C receives the highest possible normalized scores from queries 2 and 3, its combined score cannot exceed Document B's. Given this, Document B should always appear in the final results, and Document C should never rank above it.
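
The bound argued here can be checked with a short calculation. The sketch below assumes min-max normalization over the four example documents and a weighted arithmetic mean computed as sum(weight * score) / sum(weights); the plugin's actual implementation may differ in details. It compares Document B's worst case (normalized score 0 from queries 2 and 3) against Document C's best case (normalized score 1 from both). The helper `weighted_mean` is hypothetical.

```python
# Weights from the pipeline configuration in this issue.
WEIGHTS = [0.9998, 0.0001, 0.0001]


def weighted_mean(scores, weights):
    """Weighted arithmetic mean of per-sub-query normalized scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Min-max normalized Query 1 scores, assuming min=100 (D) and max=1200 (A).
q1_b = (1000 - 100) / (1200 - 100)
q1_c = (300 - 100) / (1200 - 100)

# Worst case for B: lowest possible normalized scores from queries 2 and 3.
b_worst = weighted_mean([q1_b, 0.0, 0.0], WEIGHTS)
# Best case for C: highest possible normalized scores from queries 2 and 3.
c_best = weighted_mean([q1_c, 1.0, 1.0], WEIGHTS)

print(f"B worst case: {b_worst:.4f}")
print(f"C best case:  {c_best:.4f}")
print("B always outranks C:", b_worst > c_best)
```

With these numbers, B's worst case is about 0.818 and C's best case is about 0.182, so under the stated assumptions the gap cannot be closed by queries 2 and 3.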

Question:

How is it possible for Document B to be excluded from the top results while Document C is included, given the heavily skewed weights and expected normalization?

Related component

Search:Relevance

Expected behavior

I would expect Document B to appear in the hybrid query search results no matter what, given the weight we've assigned to the first query.
