The issue where SDKv2 is not able to upload large files to an S3 bucket, even after adding configuration #5966
Any updates?
Hello @Nikkeii, thank you for providing the detailed information about the timeout issue you're experiencing during large file uploads to Amazon S3. I attempted to reproduce the issue locally using the same code and configuration you reported:

main.scala:

package org.example
import Main.getClass
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.RetryMode
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import scala.jdk.CollectionConverters.*
import java.io.File
import java.lang
import java.nio.file.{Files, Path}
import java.time.Duration
import scala.sys.props
object Main {
private val logger: Logger = LoggerFactory.getLogger(getClass)
def main(args: Array[String]): Unit = {
val bucketName = "scala**"
val keyPrefix = "testfile"
val dirPath = "/Users/***/Downloads/samplelarge/GBfolder"
val includeSubDir = true
val MB = 1024 * 1024
val runtime = Runtime.getRuntime
println(
s"""Memory (MB):
|Max: ${runtime.maxMemory() / 1024 / 1024}
|Total: ${runtime.totalMemory() / 1024 / 1024}
|Free: ${runtime.freeMemory() / 1024 / 1024}
|Used: ${(runtime.totalMemory() - runtime.freeMemory()) / 1024 / 1024}
|""".stripMargin)
logger.info(s"Starting S3 upload from directory: $dirPath")
val httpClient = NettyNioAsyncHttpClient.builder()
.maxConcurrency(20) // Increase max connections
.connectionAcquisitionTimeout(Duration.ofMinutes(60))
.readTimeout(Duration.ofMinutes(60)) // Increase read timeout
.writeTimeout(Duration.ofMinutes(70)) // Increase write timeout
.tcpKeepAlive(true)
.connectionTimeout(Duration.ofMinutes(70)) // Increase connection timeout
val overrideConfig = ClientOverrideConfiguration.builder()
.retryStrategy(RetryMode.STANDARD) // Uses AWS SDK standard retry strategy
.apiCallTimeout(Duration.ofMinutes(30)) // Timeout for API calls
.apiCallAttemptTimeout(Duration.ofMinutes(30)) // Timeout per retry attempt
.build()
val s3AsyncClient = S3AsyncClient.builder()
.region(Region.US_EAST_1) // Set your AWS region
.credentialsProvider(ProfileCredentialsProvider.create("default")) // Set your AWS profile
.overrideConfiguration(overrideConfig)
.httpClientBuilder(httpClient)
.multipartEnabled(true)
.multipartConfiguration(MultipartConfiguration.builder()
.thresholdInBytes(50 * MB).minimumPartSizeInBytes(50 * MB).apiCallBufferSizeInBytes(50 * MB).build())
.build()
val dir = new File(dirPath)
if (!dir.exists() || !dir.isDirectory) {
logger.error(s"Directory does not exist or is not a directory: $dirPath")
return
}
val transferManager = S3TransferManager.builder().s3Client(s3AsyncClient).build()
try {
val filesToUpload: List[Path] = if (includeSubDir) {
Files.walk(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
} else {
Files.list(dir.toPath).iterator().asScala.filter(Files.isRegularFile(_)).toList
}
if (filesToUpload.isEmpty) {
logger.warn("No files found to upload.")
return
}
filesToUpload.foreach { filePath =>
val relativePath = dir.toPath.relativize(filePath).toString.replace("\\", "/")
val s3Key = keyPrefix + relativePath
logger.info(s"Uploading file: ${filePath.toAbsolutePath} -> S3 ($bucketName/$s3Key)")
val uploadFileRequest = UploadFileRequest.builder()
.source(filePath)
.putObjectRequest(PutObjectRequest.builder()
.bucket(bucketName)
.key(s3Key)
.serverSideEncryption(ServerSideEncryption.AES256)
.build())
.build()
val upload = transferManager.uploadFile(uploadFileRequest)
val result: CompletedFileUpload = upload.completionFuture().join()
logger.info(s"Successfully uploaded: $s3Key")
}
} catch {
case e: Exception =>
logger.error("Error during upload", e)
} finally {
transferManager.close()
}
}
}

build.sbt:

name := "Scala_TM_AcquireTimeout"
version := "1.0"
scalaVersion := "3.3.1"
libraryDependencies ++= Seq(
"software.amazon.awssdk" % "bom" % "2.31.14" pomOnly(),
"software.amazon.awssdk" % "s3" % "2.31.14",
"software.amazon.awssdk" % "s3-transfer-manager" % "2.31.14",
"org.slf4j" % "slf4j-api" % "2.0.16",
"ch.qos.logback" % "logback-classic" % "1.5.11"
)
resolvers += "AWS Release Repository" at "https://maven.amazonaws.org"
javacOptions ++= Seq(
"-source", "11",
"-target", "11"
)
scalacOptions ++= Seq(
"-deprecation",
"-feature",
"-unchecked"
)
fork := true
// Add JVM options if needed
//javaOptions ++= Seq(
// "-Xms200m",
// "-Xmx200m"
//)

We also have an AWS support ticket for this issue, where I mentioned that the problem appears to be related to your execution environment or network conditions rather than to the SDK itself. The reported error message, as well as the time synchronization error, could stem from network issues that interfere with the client's ability to accurately synchronize its system clock with a reliable time source. I would therefore recommend continuing to work closely with the AWS Support team, as they may have access to additional tools and resources for diagnosing and resolving environment-specific network issues.

Regards,
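As a sanity check on the timeout settings discussed here: if multipart splitting does not trigger (as the report suggests), a single PutObject call carries the entire file and must complete within apiCallTimeout. Below is a rough, dependency-free sketch of the sustained throughput that would require; the 20 GB size and 30-minute timeout are taken from the report, and the helper name is my own.

```scala
// Minimum sustained throughput, in MiB/s, needed to move `sizeBytes`
// within `timeoutSeconds`. If the link is slower than this, the API call
// times out no matter how many retries are configured, since each retry
// restarts the upload from the beginning.
def minThroughputMiBps(sizeBytes: Long, timeoutSeconds: Long): Double =
  sizeBytes.toDouble / timeoutSeconds / (1024 * 1024)

@main def throughputCheck(): Unit =
  val twentyGiB    = 20L * 1024 * 1024 * 1024
  val thirtyMinutes = 30L * 60
  // A 20 GiB single-call upload within apiCallTimeout(30 min) needs ~11.4 MiB/s.
  println(f"${minThroughputMiBps(twentyGiB, thirtyMinutes)}%.1f MiB/s")
```

On a connection slower than that, raising the per-attempt timeouts (or confirming that multipart splitting actually engages, so each part gets its own timeout budget) is the practical fix.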
Describe the bug
The Scala application is designed to upload large files (including directories with subdirectories) to Amazon S3 using the AWS SDK v2 S3AsyncClient and S3TransferManager. However, despite enabling multipart upload and configuring the MultipartConfiguration correctly, the application fails to upload large files consistently. The upload process results in a timeout without completing the upload.
Regression Issue
Expected Behavior
Files upload to S3 successfully.
Current Behavior
Observed Errors:
Threshold and part size settings do not seem to trigger splitting large files into multiple parts.
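For reference, with thresholdInBytes and minimumPartSizeInBytes both set to 50 MB as in the reported configuration, a 20 GB file should split into roughly 410 parts, comfortably under S3's 10,000-part limit. A dependency-free sketch of the expected part count (the sizes are taken from the report; the helper name is my own):

```scala
// Ceiling division: the number of parts a file of `sizeBytes` splits into
// when every part except possibly the last is `partSizeBytes` long.
def expectedParts(sizeBytes: Long, partSizeBytes: Long): Long =
  (sizeBytes + partSizeBytes - 1) / partSizeBytes

@main def partCheck(): Unit =
  val MB = 1024L * 1024
  val GB = 1024 * MB
  // 20 GiB at 50 MiB parts -> 410 parts (S3 allows at most 10,000).
  println(expectedParts(20 * GB, 50 * MB)) // prints 410
```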
Reproduction Steps
Try to upload a 19-20 GB file with this script:
import org.slf4j.{Logger, LoggerFactory}
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration
import software.amazon.awssdk.core.retry.{RetryMode, RetryPolicy}
import software.amazon.awssdk.core.retry.backoff.FullJitterBackoffStrategy
import software.amazon.awssdk.http.nio.netty.NettyNioAsyncHttpClient
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.{S3AsyncClient, S3Client}
import software.amazon.awssdk.services.s3.crt.{S3CrtHttpConfiguration, S3CrtRetryConfiguration}
import software.amazon.awssdk.services.s3.model.{PutObjectRequest, ServerSideEncryption}
import software.amazon.awssdk.services.s3.multipart.MultipartConfiguration
import software.amazon.awssdk.transfer.s3.S3TransferManager
import software.amazon.awssdk.transfer.s3.model.{CompletedFileUpload, UploadFileRequest}
import software.amazon.awssdk.transfer.s3.progress.LoggingTransferListener
import scala.jdk.CollectionConverters.*
import java.io.File
import java.nio.file.{Files, Path}
import java.time.Duration
object Main {
private val logger: Logger = LoggerFactory.getLogger(getClass)
def main(args: Array[String]): Unit = {
val bucketName = "bucket_name"
val keyPrefix = "obj_key"
val dirPath = "file"
val includeSubDir = true
val MB = 1024 * 1024
}
}
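One way to check whether multipart splitting actually triggered is to inspect the uploaded object's ETag: S3 gives multipart objects an ETag of the form `<hex>-<partCount>`, while single-part uploads get a plain MD5 hex string. A small dependency-free helper for that check (the sample ETag below is made up for illustration):

```scala
// Returns the part count if the ETag has S3's multipart form "<hex>-<n>",
// or None for a single-part (plain MD5) ETag. The surrounding quotes that
// the API returns around ETags are stripped first.
def multipartParts(etag: String): Option[Int] =
  val trimmed = etag.stripPrefix("\"").stripSuffix("\"")
  trimmed.split('-') match
    case Array(_, n) if n.nonEmpty && n.forall(_.isDigit) => Some(n.toInt)
    case _                                                => None

@main def etagCheck(): Unit =
  println(multipartParts("\"9bb58f26192e4ba00f01e2e7b136bbd8-410\"")) // Some(410)
  println(multipartParts("\"9bb58f26192e4ba00f01e2e7b136bbd8\""))     // None
```

If the ETag of the 20 GB object has no `-<n>` suffix, the upload went through as a single PutObject, which would confirm that the multipart threshold is not being applied.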
Possible Solution
No response
Additional Information/Context
No response
AWS Java SDK version used
2.30.38
JDK version used
21
Operating System and version
Windows 11