Skip to content

Support for OpenSearch alias type #3246

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 23, 2025

Conversation

qianheng-aws
Copy link
Collaborator

@qianheng-aws qianheng-aws commented Jan 15, 2025

Description

Add Support for OpenSearch alias type, so PPL can query fields of alias type correctly.

Implementation:

  • Add a new ExprType named OpenSearchAliasType
  • Add special parsing logic for alias type in OpenSearchDataType::parseMapping
  • Use original path and type when constructing ReferenceExpression. For now, only alias type will have different original path and type.
  • When constructing _source in OpenSearchRequestBuilder::pushDownProjects, use path instead of attr since they are different for alias type. The former one is more accurate since OpenSearch don't accept fields of alias type in _source.

e.g. After this PR:

# Create Index with alias type
PUT /try
{
    "mappings":{
        "properties": {
            "time1": {
                "type": "date"
            },
            "time2": {
                "type": "alias",
                "path": "time1"
            }
        }
    }
}

# Add doc
POST try/_doc
{
  "time1": 1736911491131
}
# Query with field of alias type
POST /_plugins/_ppl/
{
   "query": "source=try |where `time2` > '2025-01-01' |  fields `time2`"
}

# Get result
{
  "schema": [
    {
      "name": "time2",
      "type": "timestamp"
    }
  ],
  "datarows": [
    [
      "2025-01-15 03:24:51.131"
    ]
  ],
  "total": 1,
  "size": 1
}

Related Issues

Resolves #3069

Options for this feature

Option1(The PR for now):

Add compounded datatype named OpenSearchAliasType in TypeEnvironment, and resolve ident of alias field inner ReferenceExpression. OpenSearchAliasType will record both path and original basic datatype inner it.

Option2:

Add a new hashmap in IndexMapping to record the alias field mapping to its referred path, it may has structure like:
Map<String, String> aliasMapping
and also add a new hashmap in SymbleTable to record the alias field, it may has structure like:
Map<Namespace, NavigableMap<String, String>> aliasByNamespace
When resolving ident, we need to lookup aliasByNamespace first to identify if this field is an alias, then lookup the original tableByNamespace to reslve type. It makes more sense to not maintaining a wired data type of alias, but will have drawback of making this analyzing process more complex than option1

Option3:

Parsing the data type of alias fields to be the data type of its referred field directly.
But in OpenSearchIndexScan, it should be aware of the mapping of opensearch index, and then populating ExprValue with alias fields filled with the value of their referred fields. It aims to resolve the alias fields from the source data.
That’s said, since we’ve resolved the alias fields from the soruce, they are no longer alias fields in the following processing. Assuming we have update on its referred fields after scanning, the alias field won’t be udpated as its path has no relation with the value on updated path. The functionality may not align with the original purpose of opensearch alias type in some cases.


Option for moving to Calcite in future

In Calcite, there is a special data type named SingleColumnAliasRelDataType, which may satisfy our requirements. This RelDataType maintains single column but can resolve 2 kinds of idents with name in original or alias. But I’m afraid it cannot handle the corner cases that more than one alias field refer to the same path, which is rare but actually allowed in OpenSearch. Unless we can extend that RelDataType to remove the limitation and handle multiple fields in alias.

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Heng Qian <[email protected]>
(cherry picked from commit 2f475af)
Signed-off-by: Heng Qian <[email protected]>
YANG-DB
YANG-DB previously approved these changes Jan 15, 2025
Copy link
Member

@YANG-DB YANG-DB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice - thanks !

@penghuo penghuo added backport 2.x enhancement New feature or request labels Jan 16, 2025
# Conflicts:
#	integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java
#	integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java
#	integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
Copy link
Collaborator

@dai-chen dai-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

Copy link
Collaborator

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx!

@penghuo penghuo merged commit 1284c12 into opensearch-project:main Jan 23, 2025
15 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.x
# Create a new branch
git switch --create backport/backport-3246-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 1284c121e0abeef514b35941e5722d906515ff2b
# Push it to GitHub
git push --set-upstream origin backport/backport-3246-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-3246-to-2.x.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] PPL does not support querying Alias fields
5 participants