[BUG] Remote Cluster State Diff Download Failures while performing IndicesAliases Action #18045

Closed
Pranshu-S opened this issue Apr 23, 2025 · 1 comment · Fixed by #18256
Labels
bug Something isn't working Cluster Manager

Comments

@Pranshu-S
Contributor

Describe the bug

Issue Overview

When Remote State and Publication is enabled in an OpenSearch cluster, executing a POST _aliases request (IndicesAliases action) that removes an index and assigns its name as an alias to another index in the same request causes a cluster state update failure, specifically a diff download failure. As a result, the cluster manager steps down and a new node steps up as cluster manager. All nodes then perform a full cluster state download as a consequence of the node-joins.

java.lang.IllegalStateException: index, alias, and data stream names need to be unique, but the following duplicates were found [index-2 (alias of [index-1/cf4980a322bac991a0ee6]) conflicts with index]

The request goes through; however, it internally triggers a cluster manager election (the previous cluster manager steps down because the publication failed). This has the following consequences:

  1. High latency for the operation
  2. A full cluster state download due to node-joins, which can be very costly with a large cluster state and many nodes.

Related component

Cluster Manager

To Reproduce

  1. Bring up an OpenSearch domain with Remote Publication Enabled
  2. Create two indices (index-1 and index-2)
    curl -X PUT "localhost:9200/index-1" -H 'Content-Type: application/json'     
    curl -X PUT "localhost:9200/index-2" -H 'Content-Type: application/json'  
    
  3. Note the discovery stats for the nodes -
    curl -X GET "localhost:9200/_nodes/stats"
    
    .
    .
    .
          "remote_full_download" : {
            "success_count" : 2,
            "failed_count" : 0,
            "total_time_in_millis" : 210,
            "incoming_publication_failed_count" : 0,
            "checksum_validation_failed_count" : 0
          },
          "remote_diff_download" : {
            "success_count" : 46,
            "failed_count" : 1,
            "total_time_in_millis" : 1129,
            "incoming_publication_failed_count" : 1,
            "checksum_validation_failed_count" : 0
          }
    
  4. Run the IndicesAliases action below
    curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
    {
        "actions": [
            {
                "remove_index": {
                    "index": "index-1"
                }
            },
            {
                "add": {
                    "index": "index-2",
                    "alias": "index-1"
                }
            }
        ]
    }'
    
  5. Run the Stats call again -
    curl -X GET "localhost:9200/_nodes/stats"
    
    .
    .
    .
          "remote_full_download" : {
            "success_count" : 3,
            "failed_count" : 0,
            "total_time_in_millis" : 272,
            "incoming_publication_failed_count" : 0,
            "checksum_validation_failed_count" : 0
          },
          "remote_diff_download" : {
            "success_count" : 47,
            "failed_count" : 2,
            "total_time_in_millis" : 1129,
            "incoming_publication_failed_count" : 2,
            "checksum_validation_failed_count" : 0
          }
    

Expected behavior

The _aliases request should go through without any download failures (publication should succeed).

Additional Details

Plugins
NA, Core issue

Screenshots
None

Host/Environment (please complete the following information):

  • Version - OS_2.17

Additional context
None

@Pranshu-S
Contributor Author

The issue appears to stem from the read path of the publication on the nodes where we create the new cluster state update from the incoming manifest. Reference -

public ClusterState getClusterStateUsingDiff(ClusterMetadataManifest manifest, ClusterState previousState, String localNodeId) {
    try {
        assert manifest.getDiffManifest() != null : "Diff manifest null which is required for downloading cluster state";
        final long startTimeNanos = relativeTimeNanosSupplier.getAsLong();
        ClusterStateDiffManifest diff = manifest.getDiffManifest();
        boolean includeEphemeral = true;
        List<UploadedIndexMetadata> updatedIndices = diff.getIndicesUpdated().stream().map(idx -> {
            Optional<UploadedIndexMetadata> uploadedIndexMetadataOptional = manifest.getIndices()
                .stream()
                .filter(idx2 -> idx2.getIndexName().equals(idx))
                .findFirst();
            assert uploadedIndexMetadataOptional.isPresent() == true;
            return uploadedIndexMetadataOptional.get();
        }).collect(Collectors.toList());
        Map<String, UploadedMetadataAttribute> updatedCustomMetadata = new HashMap<>();
        if (diff.getCustomMetadataUpdated() != null) {
            for (String customType : diff.getCustomMetadataUpdated()) {
                updatedCustomMetadata.put(customType, manifest.getCustomMetadataMap().get(customType));
            }
        }
        Map<String, UploadedMetadataAttribute> updatedClusterStateCustom = new HashMap<>();
        if (diff.getClusterStateCustomUpdated() != null) {
            for (String customType : diff.getClusterStateCustomUpdated()) {
                updatedClusterStateCustom.put(customType, manifest.getClusterStateCustomMap().get(customType));
            }
        }
        List<UploadedIndexMetadata> updatedIndexRouting = new ArrayList<>();
        if (manifest.getCodecVersion() == CODEC_V2 || manifest.getCodecVersion() == CODEC_V3) {
            updatedIndexRouting.addAll(
                remoteRoutingTableService.getUpdatedIndexRoutingTableMetadata(
                    diff.getIndicesRoutingUpdated(),
                    manifest.getIndicesRouting()
                )
            );
        }
        ClusterState updatedClusterState = readClusterStateInParallel(
            previousState,
            manifest,
            manifest.getClusterUUID(),
            localNodeId,
            updatedIndices,
            updatedCustomMetadata,
            diff.isCoordinationMetadataUpdated(),
            diff.isSettingsMetadataUpdated(),
            diff.isTransientSettingsMetadataUpdated(),
            diff.isTemplatesMetadataUpdated(),
            diff.isDiscoveryNodesUpdated(),
            diff.isClusterBlocksUpdated(),
            updatedIndexRouting,
            diff.isHashesOfConsistentSettingsUpdated(),
            updatedClusterStateCustom,
            manifest.getDiffManifest() != null
                && manifest.getDiffManifest().getIndicesRoutingDiffPath() != null
                && !manifest.getDiffManifest().getIndicesRoutingDiffPath().isEmpty(),
            includeEphemeral
        );
        ClusterState.Builder clusterStateBuilder = ClusterState.builder(updatedClusterState);
        Metadata.Builder metadataBuilder = Metadata.builder(updatedClusterState.metadata());
        // remove the deleted indices from the metadata
        for (String index : diff.getIndicesDeleted()) {
            metadataBuilder.remove(index);
        }
        // remove the deleted metadata customs from the metadata
        if (diff.getCustomMetadataDeleted() != null) {
            for (String customType : diff.getCustomMetadataDeleted()) {
                metadataBuilder.removeCustom(customType);
            }
        }
        // remove the deleted cluster state customs from the metadata
        if (diff.getClusterStateCustomDeleted() != null) {
            for (String customType : diff.getClusterStateCustomDeleted()) {
                clusterStateBuilder.removeCustom(customType);
            }
        }
        HashMap<String, IndexRoutingTable> indexRoutingTables = new HashMap<>(
            updatedClusterState.getRoutingTable().getIndicesRouting()
        );
        if (manifest.getCodecVersion() == CODEC_V2 || manifest.getCodecVersion() == CODEC_V3) {
            for (String indexName : diff.getIndicesRoutingDeleted()) {
                indexRoutingTables.remove(indexName);
            }
        }
        ClusterState clusterState = clusterStateBuilder.stateUUID(manifest.getStateUUID())
            .version(manifest.getStateVersion())
            .metadata(metadataBuilder)
            .routingTable(new RoutingTable(manifest.getRoutingTableVersion(), indexRoutingTables))
            .build();
        if (!remoteClusterStateValidationMode.equals(RemoteClusterStateValidationMode.NONE)
            && manifest.getClusterStateChecksum() != null) {
            validateClusterStateFromChecksum(manifest, clusterState, previousState.getClusterName().value(), localNodeId, false);
        }
        final long durationMillis = TimeValue.nsecToMSec(relativeTimeNanosSupplier.getAsLong() - startTimeNanos);
        remoteStateStats.stateDiffDownloadSucceeded();
        remoteStateStats.stateDiffDownloadTook(durationMillis);
        assert includeEphemeral == true;
        // newState includes all the fields of cluster-state (includeEphemeral=true always)
        remoteClusterStateCache.putState(clusterState);
        return clusterState;
    } catch (Exception e) {
        logger.error("Failure in downloading diff cluster state. ", e);
        remoteStateStats.stateDiffDownloadFailed();
        throw e;
    }
}

The manifest contains two pieces of information, both set by the cluster manager before it is sent to the nodes:

  1. manifest.indices → list of indices that are newly created or whose version was updated in the new cluster state update
  2. diffManifest → contains the overall diff between the new and old cluster state on the cluster manager node
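
To make the two pieces concrete, here is a minimal sketch of what they could hold for the alias swap from the reproduction. The class `ManifestSketch`, the type `UploadedIndex`, and the path value are hypothetical stand-ins and not the OpenSearch classes; only the lookup logic mirrors the stream in getClusterStateUsingDiff above.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified stand-ins for the manifest types; not the OpenSearch classes. */
final class ManifestSketch {

    /** One entry of manifest.indices: an index name plus where its metadata was uploaded. */
    static final class UploadedIndex {
        final String indexName;
        final String uploadedPath; // placeholder value, not a real remote path

        UploadedIndex(String indexName, String uploadedPath) {
            this.indexName = indexName;
            this.uploadedPath = uploadedPath;
        }
    }

    /** Mirrors the lookup in getClusterStateUsingDiff: map each updated name to its uploaded entry. */
    static List<UploadedIndex> resolveUpdated(List<String> indicesUpdated, List<UploadedIndex> manifestIndices) {
        return indicesUpdated.stream()
            .map(name -> manifestIndices.stream()
                .filter(idx -> idx.indexName.equals(name))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no uploaded metadata for " + name)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Alias swap from the reproduction: index-1 is deleted, index-2 gains the alias "index-1".
        List<UploadedIndex> manifestIndices = List.of(new UploadedIndex("index-2", "placeholder/index-2"));
        List<String> indicesUpdated = List.of("index-2");
        List<String> indicesDeleted = List.of("index-1");

        List<UploadedIndex> toDownload = resolveUpdated(indicesUpdated, manifestIndices);
        System.out.println("download: " + toDownload.get(0).indexName);
        System.out.println("delete:   " + indicesDeleted);
    }
}
```

The point of the sketch is that the deleted index only appears in the diff, never in the list of metadata to download, so the read path has to handle it separately.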

While reading the new cluster state, we do the following

  1. Get the indices that were updated (includes deleted) from clusterDiffManifest and filter them against uploadedMetadataResult.uploadedIndexMetadata. The result contains all the indices updated or created in the new cluster state update (Ref)
  2. Read the new and updated IndexMetadata from the list produced in Step 1 and pass it to readClusterStateInParallel (Ref)
  3. As part of the readClusterStateInParallel flow, we first fetch the new IndexMetadata from Remote, apply it on top of the old metadata, and try to build the new cluster state. Note that this is an update-only operation and does not delete any index from the previous cluster state

Metadata.Builder metadataBuilder = Metadata.builder(previousState.metadata());

metadataBuilder.indices(indexMetadataMap);
if (readDiscoveryNodes) {
    clusterStateBuilder.nodes(discoveryNodesBuilder.get().localNodeId(localNodeId));
}
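
The update-only merge from Step 3 can be modeled with plain maps. This is a sketch under loud assumptions: index metadata is reduced to a map from index name to its alias set, and `validateUnique` is a simplified stand-in for the name-uniqueness validation that produces the exception in the report, not the real Metadata build logic. All names here (`UpdateOnlySketch`, `updateOnly`, `validateUnique`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Simplified model of the update-only read path; not the OpenSearch implementation. */
final class UpdateOnlySketch {

    /** Stand-in uniqueness check: an alias may not share its name with an existing index. */
    static void validateUnique(Map<String, Set<String>> aliasesByIndex) {
        for (Map.Entry<String, Set<String>> entry : aliasesByIndex.entrySet()) {
            for (String alias : entry.getValue()) {
                if (aliasesByIndex.containsKey(alias)) {
                    throw new IllegalStateException(
                        "alias [" + alias + "] of [" + entry.getKey() + "] conflicts with an existing index"
                    );
                }
            }
        }
    }

    /** Overlays the downloaded indices on the previous ones and validates, deleting nothing first. */
    static Map<String, Set<String>> updateOnly(Map<String, Set<String>> previous, Map<String, Set<String>> downloaded) {
        Map<String, Set<String>> merged = new HashMap<>(previous);
        merged.putAll(downloaded);
        validateUnique(merged); // index-1 is still present here, so the alias "index-1" collides
        return merged;
    }

    /** Previous state: two plain indices without aliases. */
    static Map<String, Set<String>> previousState() {
        return Map.of("index-1", Set.of(), "index-2", Set.of());
    }

    /** Downloaded update: index-2 now carries the alias "index-1". */
    static Map<String, Set<String>> downloadedUpdate() {
        return Map.of("index-2", Set.of("index-1"));
    }

    public static void main(String[] args) {
        try {
            updateOnly(previousState(), downloadedUpdate());
            System.out.println("merge succeeded");
        } catch (IllegalStateException e) {
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```

In this model the merge fails before any deletion from the diff gets a chance to run, which matches the ordering in getClusterStateUsingDiff, where the deleted indices are removed only after readClusterStateInParallel returns.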

As part of the IndicesAliases action, we can remove an index and assign its name as an alias to another index in the same request. For example:

curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
    "actions": [
        {
            "remove_index": {
                "index": "index-1"
            }
        },
        {
            "add": {
                "index": "index-2",
                "alias": "index-1"
            }
        }
    ]
}'

In this case the new cluster state to be achieved would have the following diff -

  1. Removing an Index Metadata
  2. Updating the IndexMetadata with a new alias (Bumps up the IndexMetadata version as well)

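In this case, since we only update the metadata with the newest IndexMetadata fetched from Remote (as mentioned in Step 3 above), the intermediate Metadata still contains the removed index alongside the other index that now carries its name as an alias. The Metadata is therefore in an inconsistent state when the cluster state is built, before the deletions from the diff are applied, which leads to the issue.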

Q: Why do we only see it in Remote Publication enabled domains?

In the case of a cluster state update applied as a diff received over transport, we apply a map diff in which the index metadata to be removed is deleted first, before any update actions are performed.

This diff is maintained in the Metadata as a map of <String, IndexMetadata>:

private final Diff<Map<String, IndexMetadata>> indices;

We apply by deleting first -

@Override
public Map<K, T> apply(Map<K, T> map) {
    Map<K, T> builder = new HashMap<>(map);
    for (K part : deletes) {
        builder.remove(part);
    }
    for (Map.Entry<K, Diff<T>> diff : diffs.entrySet()) {
        builder.put(diff.getKey(), diff.getValue().apply(builder.get(diff.getKey())));
    }
    for (Map.Entry<K, T> upsert : upserts.entrySet()) {
        builder.put(upsert.getKey(), upsert.getValue());
    }
    return builder;
}
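
For contrast, the delete-first ordering can be sketched on the same alias swap. As before, this is a simplified model with hypothetical names (`DeleteFirstSketch`, `applyDeleteFirst`, `validateUnique`): index metadata is reduced to a map from index name to alias set, and the uniqueness check is a stand-in, not the real validation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of a delete-first map diff; not the OpenSearch implementation. */
final class DeleteFirstSketch {

    /** Stand-in uniqueness check: an alias may not share its name with an existing index. */
    static void validateUnique(Map<String, Set<String>> aliasesByIndex) {
        for (Map.Entry<String, Set<String>> entry : aliasesByIndex.entrySet()) {
            for (String alias : entry.getValue()) {
                if (aliasesByIndex.containsKey(alias)) {
                    throw new IllegalStateException(
                        "alias [" + alias + "] of [" + entry.getKey() + "] conflicts with an existing index"
                    );
                }
            }
        }
    }

    /** Applies deletes before upserts, then validates the resulting map. */
    static Map<String, Set<String>> applyDeleteFirst(
        Map<String, Set<String>> previous,
        List<String> deletes,
        Map<String, Set<String>> upserts
    ) {
        Map<String, Set<String>> builder = new HashMap<>(previous);
        for (String name : deletes) {
            builder.remove(name); // index-1 is gone before the alias "index-1" is introduced
        }
        builder.putAll(upserts);
        validateUnique(builder);
        return builder;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> previous = Map.of("index-1", Set.of(), "index-2", Set.of());
        Map<String, Set<String>> result = applyDeleteFirst(
            previous,
            List.of("index-1"),
            Map.of("index-2", Set.of("index-1"))
        );
        System.out.println("indices after apply: " + result.keySet());
    }
}
```

Because the name "index-1" is released before it is reused as an alias, the validation never sees both at once in this model, which is consistent with the transport path not hitting the failure.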
