
[Feature Request] Remote Translog Optimisations for faster and resilient recoveries #15277

Open
@Bukhtawar

Description


Is your feature request related to a problem? Please describe

  1. If a recovery has a large number of uncommitted operations, it times out and the translogs have to be downloaded afresh, causing a loop of failed recoveries that each time out again because retries do not download incrementally.
  2. Remote translog recovery acquires a shard lock while downloading translog files. If the recovery then fails, the shard cannot be closed until the translog download completes; this blocks the cluster applier thread, so the node lags behind the cluster state and can ultimately drop out of the cluster. The thread dumps below capture this, and a simplified sketch of the locking pattern follows them.
"opensearch[691aeed35826ecc93653e3011d18c9b1][clusterApplierService#updateTask][T#1]" #268 daemon prio=5 os_prio=0 cpu=69394.87ms elapsed=10325.08s tid=0x0000ffdde862cd40 nid=0x487c waiting for monitor entry  [0x0000ffdc2f4fd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:265)
        - waiting to lock <0x0000ffe0683bbfa8> (a org.opensearch.indices.cluster.IndicesClusterStateService)
        at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:608)
        at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:595)
        at org.opensearch.cluster.service.ClusterApplierService.applyChanges(ClusterApplierService.java:563)
        at org.opensearch.cluster.service.ClusterApplierService.runTask(ClusterApplierService.java:486)
        at org.opensearch.cluster.service.ClusterApplierService$UpdateTask.run(ClusterApplierService.java:188)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:283)
        at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:246)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:840)
        
   Locked ownable synchronizers:
        - <0x0000ffe06a6cba78> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"opensearch[691aeed35826ecc93653e3011d18c9b1][generic][T#26]" #290 daemon prio=5 os_prio=0 cpu=464.40ms elapsed=10325.07s tid=0x0000ffdc90029390 nid=0x4892 waiting for monitor entry  [0x0000ffdc2defd000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.opensearch.index.shard.IndexShard.close(IndexShard.java:2110)
        - waiting to lock <0x0000ffe1449c1d98> (a java.lang.Object)
        at org.opensearch.index.IndexService.closeShard(IndexService.java:644)
        at org.opensearch.index.IndexService.removeShard(IndexService.java:620)
        - locked <0x0000ffe0931fbd80> (a org.opensearch.index.IndexService)
        at org.opensearch.indices.cluster.IndicesClusterStateService.failAndRemoveShard(IndicesClusterStateService.java:817)
        at org.opensearch.indices.cluster.IndicesClusterStateService.handleRecoveryFailure(IndicesClusterStateService.java:797)
        - locked <0x0000ffe0683bbfa8> (a org.opensearch.indices.cluster.IndicesClusterStateService)
        at org.opensearch.indices.recovery.RecoveryListener.onFailure(RecoveryListener.java:55)
        at org.opensearch.indices.recovery.RecoveryTarget.notifyListener(RecoveryTarget.java:136)
        at org.opensearch.indices.replication.common.ReplicationTarget.fail(ReplicationTarget.java:180)
        at org.opensearch.indices.replication.common.ReplicationCollection.fail(ReplicationCollection.java:212)
        at org.opensearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.onException(PeerRecoveryTargetService.java:756)
        at org.opensearch.indices.recovery.PeerRecoveryTargetService$RecoveryResponseHandler.handleException(PeerRecoveryTargetService.java:682)
        at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:430)
        at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1515)
        at org.opensearch.transport.InboundHandler.lambda$handleException$5(InboundHandler.java:447)
        at org.opensearch.transport.InboundHandler$$Lambda$8371/0x000000a00227e220.run(Unknown Source)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at java.lang.Thread.run([email protected]/Thread.java:840)
        
   Locked ownable synchronizers:
        - <0x0000ffe06a605900> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"opensearch[691aeed35826ecc93653e3011d18c9b1][generic][T#19]" #283 daemon prio=5 os_prio=0 cpu=187528.43ms elapsed=10325.07s tid=0x0000ffdc90021ff0 nid=0x488b waiting on condition  [0x0000ffdc2e5fd000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x0000fffa222465d0> (a java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
        at java.util.concurrent.FutureTask.awaitDone([email protected]/FutureTask.java:447)
        at java.util.concurrent.FutureTask.get([email protected]/FutureTask.java:190)
        at org.opensearch.encryption.frame.CryptoInputStream.read(CryptoInputStream.java:193)
        at java.io.InputStream.transferTo([email protected]/InputStream.java:782)
        at java.nio.file.Files.copy([email protected]/Files.java:3171)
        at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadToFS(TranslogTransferManager.java:312)
        at org.opensearch.index.translog.transfer.TranslogTransferManager.downloadTranslog(TranslogTransferManager.java:258)
        at org.opensearch.index.translog.RemoteFsTranslog.downloadOnce(RemoteFsTranslog.java:246)
        at org.opensearch.index.translog.RemoteFsTranslog.download(RemoteFsTranslog.java:213)
        at org.opensearch.index.translog.RemoteFsTranslog.download(RemoteFsTranslog.java:196)
        at org.opensearch.index.shard.IndexShard.syncTranslogFilesFromRemoteTranslog(IndexShard.java:5000)
        at org.opensearch.index.shard.IndexShard.syncRemoteTranslogAndUpdateGlobalCheckpoint(IndexShard.java:4978)
        at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2584)
        - locked <0x0000ffe1449c1d98> (a java.lang.Object)
        at org.opensearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:2554)
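
The dumps above form a chain: the recovery thread holds the engine mutex for the entire remote translog download, IndexShard.close() blocks on that mutex, handleRecoveryFailure therefore keeps the IndicesClusterStateService monitor locked, and the cluster applier thread stalls behind it. A minimal sketch of that pattern, using hypothetical names rather than the actual IndexShard code:

    /** Illustrative only: a long, non-cancellable download under the engine mutex blocks shard close. */
    class ShardSketch {
        private final Object engineMutex = new Object();

        // Recovery path (generic thread): holds the mutex for the whole remote translog download.
        void openEngineAndTranslog() throws InterruptedException {
            synchronized (engineMutex) {
                downloadRemoteTranslog();
            }
        }

        // Close path, reached from the cluster applier when the recovery is failed:
        // it blocks here until the download finishes, keeping the applier thread stuck.
        void close() {
            synchronized (engineMutex) {
                // release engine resources
            }
        }

        private void downloadRemoteTranslog() throws InterruptedException {
            Thread.sleep(60_000L); // stand-in for a multi-minute remote translog download
        }
    }

If the download honoured a cancellation signal, close() could return promptly and the applier thread would not be held up.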

Describe the solution you'd like

  1. Make translog downloads on recovery incremental
  2. Make translog downloads on recovery cancellable (a sketch of items 1 and 2 follows this list)
  3. Parallelise translog downloads and translog replays
  4. Attempt to trigger a flush on recovery failures, so that fewer uncommitted operations remain for the next attempt
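
A minimal sketch of items 1 and 2, assuming an illustrative listing API and a plain cancellation flag (the names below are hypothetical, not the actual TranslogTransferManager API):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicBoolean;

    /** Illustrative only: incremental, cancellable translog download on recovery. */
    class IncrementalTranslogDownloader {
        /** Hypothetical handle to one remote translog/checkpoint file. */
        interface RemoteTranslogFile {
            String name();
            long length();
            void copyTo(Path dest) throws IOException;
        }

        private final AtomicBoolean cancelled = new AtomicBoolean(false);

        void cancel() {
            cancelled.set(true); // lets a failed/cancelled recovery release the shard quickly
        }

        void download(List<RemoteTranslogFile> remoteFiles, Path localTranslogDir) throws IOException {
            for (RemoteTranslogFile remote : remoteFiles) {
                if (cancelled.get()) {
                    throw new IOException("translog download cancelled");
                }
                Path local = localTranslogDir.resolve(remote.name());
                // Incremental: reuse files left behind by a previous (timed-out) attempt instead of re-fetching them.
                if (Files.exists(local) && Files.size(local) == remote.length()) {
                    continue;
                }
                remote.copyTo(local);
            }
        }
    }

Item 3 could then replay each downloaded generation while the next one is still being fetched, and item 4 would bound how much translog a subsequent attempt has to download at all.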

Related component

Storage:Remote

Describe alternatives you've considered

No response

Additional context

No response
