SDKv2 is unable to upload large files to an S3 bucket, even after adding multipart configuration #5966

Open
Nikkeii opened this issue Mar 18, 2025 · 2 comments
Assignees: bhoradc
Labels: bug, p2, response-requested

Nikkeii commented Mar 18, 2025

Describe the bug

The Scala application uploads large files (including directories with subdirectories) to Amazon S3 using the AWS SDK v2 S3AsyncClient and S3TransferManager. However, despite enabling multipart upload and configuring MultipartConfiguration, large-file uploads consistently fail: the upload process times out without completing.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

Files upload to S3 successfully.

Current Behavior

Observed Errors:

  1. Despite increasing apiCallTimeout, connectionTimeout, and writeTimeout to 60+ minutes, uploads of large files still time out.
  2. Multipart configuration ignored: the multipart upload configuration does not appear to be respected by S3AsyncClient; the threshold and part-size settings do not seem to trigger splitting large files into multiple parts (one way to verify this is shown in the sketch below).
  3. Request-time skew errors (RequestTimeTooSkewed): "The difference between the request time and the current time is too large."
  4. ERROR - Failed to upload directory [file] to bucket [bucket]. Error message: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: Acquire operation took longer than the configured maximum time. This indicates that a request cannot get a connection from the pool within the specified maximum time. This can be due to high request rate.
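
One way to verify whether the multipart settings are taking effect (error 2 above) is to raise the SDK's request logger to DEBUG and watch which S3 operations are actually issued. A minimal sketch, assuming logback-classic is on the classpath (it is in the build.sbt shown later in this thread):

import ch.qos.logback.classic.{Level, Logger => LogbackLogger}
import org.slf4j.LoggerFactory

// Raise the SDK request logger to DEBUG before starting the upload. With
// multipart active you should see CreateMultipartUpload, a stream of
// UploadPart calls, then CompleteMultipartUpload; a lone PutObject means
// the multipart configuration is not being applied.
LoggerFactory.getLogger("software.amazon.awssdk.request") match {
  case l: LogbackLogger => l.setLevel(Level.DEBUG)
  case _ => // different SLF4J backend; enable DEBUG for this logger there instead
}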

Reproduction Steps

Try to upload a 19-20 GB file with the following script:
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener

import scala.jdk.CollectionConverters.*
import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration

object Main {
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val bucketName = "bucket_name"
    val keyPrefix = "obj_key"
    val dirPath = "file"
    val includeSubDir = true
    val MB = 1024 * 1024

    logger.info(s"Starting S3 upload from directory: $dirPath")

    val httpClient = NettyNioAsyncHttpClient.builder()
      .maxConcurrency(20) // Increase max connections
      .connectionAcquisitionTimeout(Duration.ofMinutes(60))
      .readTimeout(Duration.ofMinutes(60)) // Increase read timeout
      .writeTimeout(Duration.ofMinutes(70)) // Increase write timeout
      .tcpKeepAlive(true)
      .connectionTimeout(Duration.ofMinutes(70)) // Increase connection timeout

    val overrideConfig = ClientOverrideConfiguration.builder()
      .retryStrategy(RetryMode.STANDARD) // Use the AWS SDK standard retry strategy
      .apiCallTimeout(Duration.ofMinutes(30)) // Timeout for the whole API call
      .apiCallAttemptTimeout(Duration.ofMinutes(30)) // Timeout per retry attempt
      .build()

    val s3AsyncClient = S3AsyncClient.builder()
      .region(Region.US_EAST_1) // Set your AWS region
      .credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
      .overrideConfiguration(overrideConfig)
      .httpClientBuilder(httpClient)
      .multipartEnabled(true)
      .multipartConfiguration(MultipartConfiguration.builder()
        .thresholdInBytes(50 * MB)
        .minimumPartSizeInBytes(50 * MB)
        .apiCallBufferSizeInBytes(50 * MB)
        .build())
      .build()

    val dir = new File(dirPath)
    if (!dir.exists() || !dir.isDirectory) {
      logger.error(s"Directory does not exist or is not a directory: $dirPath")
      return
    }

    val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()

    try {
      val filesToUpload: List[Path] = if (includeSubDir) {
        Files.walk(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      } else {
        Files.list(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      }

      if (filesToUpload.isEmpty) {
        logger.warn("No files found to upload.")
        return
      }

      filesToUpload.foreach { filePath =>
        val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
        val s3Key = keyPrefix + relativePath // Note: no "/" is inserted between prefix and path

        logger.info(s"Uploading file: ${filePath.toAbsolutePath} -> S3 ($bucketName/$s3Key)")

        val uploadFileRequest = UploadFileRequest.builder()
          .source(filePath)
          .addTransferListener(LoggingTransferListener.create()) // Log transfer progress
          .putObjectRequest(PutObjectRequest.builder()
            .bucket(bucketName)
            .key(s3Key)
            .serverSideEncryption(ServerSideEncryption.AES256)
            .build())
          .build()

        val upload = transferManager.uploadFile(uploadFileRequest)
        val result: CompletedFileUpload = upload.completionFuture().join()

        logger.info(s"Successfully uploaded: $s3Key")
      }

    } catch {
      case e: Exception =>
        logger.error("Error during upload", e)
    } finally {
      transferManager.close()
    }
  }
}

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.30.38

JDK version used

21

Operating System and version

Windows 11

@Nikkeii Nikkeii added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Mar 18, 2025

Nikkeii commented Mar 24, 2025

Any updates?


bhoradc commented Apr 4, 2025

Hello @Nikkeii,

Thank you for providing the detailed information about the timeout issue you're experiencing during large file uploads to Amazon S3.

After attempting to reproduce the issue locally using the same code and configuration, on both the reported 2.30.38 and the more recent 2.31.14 SDK versions, I was unable to reproduce the timeout problems you're facing. The application ran successfully.

main.scala
package org.example

import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}

import scala.jdk.CollectionConverters.*
import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration

object Main {
  private val logger: Logger = LoggerFactory.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    val bucketName = "scala**"
    val keyPrefix = "testfile"
    val dirPath = "/Users/***/Downloads/samplelarge/GBfolder"
    val includeSubDir = true
    val MB = 1024 * 1024

    val runtime = Runtime.getRuntime
    println(
      s"""Memory (MB):
         |Max: ${runtime.maxMemory() / 1024 / 1024}
         |Total: ${runtime.totalMemory() / 1024 / 1024}
         |Free: ${runtime.freeMemory() / 1024 / 1024}
         |Used: ${(runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024}
         |""".stripMargin)

    logger.info(s"Starting S3 upload from directory: $dirPath")

    val httpClient = NettyNioAsyncHttpClient.builder()
      .maxConcurrency(20) // Increase max connections
      .connectionAcquisitionTimeout(Duration.ofMinutes(60))
      .readTimeout(Duration.ofMinutes(60)) // Increase read timeout
      .writeTimeout(Duration.ofMinutes(70)) // Increase write timeout
      .tcpKeepAlive(true)
      .connectionTimeout(Duration.ofMinutes(70)) // Increase connection timeout

    val overrideConfig = ClientOverrideConfiguration.builder()
      .retryStrategy(RetryMode.STANDARD) //  Uses AWS SDK standard retry strategy
      .apiCallTimeout(Duration.ofMinutes(30)) // Timeout for API calls
      .apiCallAttemptTimeout(Duration.ofMinutes(30)) // Timeout per retry attempt
      .build()

    val s3AsyncClient = S3AsyncClient.builder()
      .region(Region.US_EAST_1) // Set your AWS region
      .credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
      .overrideConfiguration(overrideConfig)
      .httpClientBuilder(httpClient)
      .multipartEnabled(true)
      .multipartConfiguration(MultipartConfiguration.builder()
        .thresholdInBytes(50 * MB)
        .minimumPartSizeInBytes(50 * MB)
        .apiCallBufferSizeInBytes(50 * MB)
        .build())
      .build()

    val dir = new File(dirPath)
    if (!dir.exists() || !dir.isDirectory) {
      logger.error(s"Directory does not exist or is not a directory: $dirPath")
      return
    }

    val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()

    try {
      val filesToUpload: List[Path] = if (includeSubDir) {
        Files.walk(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      } else {
        Files.list(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
      }

      if (filesToUpload.isEmpty) {
        logger.warn("No files found to upload.")
        return
      }

      filesToUpload.foreach { filePath =>
        val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
        val s3Key = keyPrefix + relativePath

        logger.info(s"Uploading file: ${filePath.toAbsolutePath} -> S3 ($bucketName/$s3Key)")

        val uploadFileRequest = UploadFileRequest.builder()
          .source(filePath)
          .putObjectRequest(PutObjectRequest.builder()
            .bucket(bucketName)
            .key(s3Key)
            .serverSideEncryption(ServerSideEncryption.AES256)
            .build())
          .build()

        val upload = transferManager.uploadFile(uploadFileRequest)
        val result: CompletedFileUpload = upload.completionFuture().join()

        logger.info(s"Successfully uploaded: $s3Key")
      }

    } catch {
      case e: Exception =>
        logger.error("Error during upload", e)
    } finally {
      transferManager.close()
    }
  }
}
build.sbt
name := "Scala_TM_AcquireTimeout"
version := "1.0"
scalaVersion := "3.3.1"

libraryDependencies ++= Seq(
  "software.amazon.awssdk" % "bom" % "2.31.14" pomOnly(),
  "software.amazon.awssdk" % "s3" % "2.31.14",
  "software.amazon.awssdk" % "s3-transfer-manager" % "2.31.14",
  "org.slf4j" % "slf4j-api" % "2.0.16",
  "ch.qos.logback" % "logback-classic" % "1.5.11"
)

resolvers += "AWS Release Repository" at "https://maven.amazonaws.org"

javacOptions ++= Seq(
  "-source", "11",
  "-target", "11"
)

scalacOptions ++= Seq(
  "-deprecation",
  "-feature",
  "-unchecked"
)

fork := true

// Add JVM options if needed
//javaOptions ++= Seq(
//  "-Xms200m",
//  "-Xmx200m"
//)

We also have an AWS support ticket for this issue, where I mentioned that this appears to be related to your execution environment or network conditions, rather than a problem with the SDK itself.

The error message Acquire operation took longer than the configured maximum time suggests that the SDK is unable to acquire a connection from the connection pool within the specified timeout. It's worth noting that the execution environment, network conditions, and resource availability can vary significantly between different setups, which may explain why I was unable to replicate the issue locally.
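
If connection-pool pressure in the Netty client turns out to be the bottleneck, one alternative worth trying is the CRT-based S3 client, which manages its own connection pool and part-level parallelism. For a sense of scale, a ~20 GB file at a 50 MB part size means roughly 400 UploadPart calls competing for the 20 pooled Netty connections configured above. A minimal sketch, assuming the aws-crt dependency ("software.amazon.awssdk.crt" % "aws-crt") has been added to the build:

import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.transfer.s3.S3TransferManager

// CRT-based async client; it sizes its own connection pool from the
// throughput target rather than a fixed maxConcurrency.
val crtClient = S3AsyncClient.crtBuilder()
  .region(Region.US_EAST_1)
  .credentialsProvider(ProfileCredentialsProvider.create("default"))
  .targetThroughputInGbps(5.0)
  .minimumPartSizeInBytes(50L * 1024 * 1024) // 50 MB parts, matching the repro script
  .build()

val crtTransferManager = S3TransferManager.builder().s3Client(crtClient).build()
// crtTransferManager.uploadFile(...) as in the original script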

The time-synchronization error could likewise stem from network issues that interfere with the client's ability to synchronize its system clock with a reliable time source.
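
As a quick local check for the clock-skew error, you can compare the machine's clock against the Date header that S3 returns on any response. A minimal sketch (the bucket name is a placeholder):

import java.time.{Duration, Instant, ZonedDateTime}
import java.time.format.DateTimeFormatter
import software.amazon.awssdk.services.s3.S3Client
import software.amazon.awssdk.services.s3.model.HeadBucketRequest

// Issue a cheap request and diff the server's Date header against the local
// clock; S3 rejects requests signed more than ~15 minutes off server time.
val s3 = S3Client.create()
val response = s3.headBucket(HeadBucketRequest.builder().bucket("bucket_name").build())
response.sdkHttpResponse().firstMatchingHeader("Date").ifPresent { serverDate =>
  val serverTime = ZonedDateTime.parse(serverDate, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant
  val skew = Duration.between(serverTime, Instant.now())
  println(s"Local clock differs from S3's clock by ${skew.toSeconds} seconds")
}
s3.close()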

Therefore, I would recommend continuing to work closely with the AWS Support team, as they may have access to additional tools and resources that can help diagnose and resolve environment-specific network issues.

Regards,
Chaitanya

@bhoradc bhoradc added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Apr 4, 2025
@bhoradc bhoradc self-assigned this Apr 4, 2025