S3 `Downloader` potentially downloads corrupted object when the it is split into multiple parts #4986

kevinjqiu · 2023-09-12T19:29:50Z

Describe the bug

We have a system that's constantly updating an object stored in an S3 bucket
The object is downloaded by multiple clients concurrently
In our particular case, the stored object is gzipped but the issue applies to all sorts of files
When the object grows to over 5MB (which causes the downloader to download the object in multiple parts), we encountered an issue where the second part is downloaded from a different version of the object, therefore corrupting the download

Expected Behavior

The latest version of the object is downloaded

Current Behavior

An error is encountered from time to time, e.g.,

2023/09/12 14:54:39
panic: flate: corrupt input before offset 1089400

goroutine 1 [running]:
main.fetch(0x14000146640)
        /Users/kevinqiu/src/tmp/s3bugrepro/main.go:36 +0x2a0
main.main()
        /Users/kevinqiu/src/tmp/s3bugrepro/main.go:57 +0xd0
exit status 2

or

panic: gzip: invalid checksum

With logging turned on, it's observed that when a later part is being downloaded and when the object is updated before that, the later chunk from a different version is downloaded and therefore corrupting the output.

Reproduction Steps

Minimally reproducible example:

Producer

The producer is simply a script that uploads a gzipped file (greater than 5MB) to a bucket constantly

#! /bin/bash
while true; do
    files="0 1 2 3"
    for i in $files; do
        aws s3 cp $i s3://$BUCKET/TEST
        sleep 1
    done
done

Consumer

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"time"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func fetch(downloader *s3manager.Downloader) {
	buff := &aws.WriteAtBuffer{}
	goi := &s3.GetObjectInput{
		Bucket: aws.String("BUCKET"),  // replace with the real bucket name
		Key:    aws.String("TEST"),
	}
	_, err := downloader.Download(buff, goi)
	if err != nil {
		panic(err)
	}

	b := buff.Bytes()
	br := bytes.NewReader(b)
	gzr, err := gzip.NewReader(br)
	if err != nil {
		panic(err)
	}

	uzb, err := io.ReadAll(gzr)
	if err != nil && err != io.EOF {
		panic(err)
	}

	fmt.Printf("Size=%v\n", len(uzb))
}

func main() {
	s, err := session.NewSession(&aws.Config{
		Region:   aws.String("us-east-1"),
		LogLevel: aws.LogLevel(aws.LogDebugWithHTTPBody),
	})

	if err != nil {
		panic(err)
	}

	downloader := s3manager.NewDownloader(s, func(downloader *s3manager.Downloader) {
		downloader.PartSize = 1024 * 512  // set to a small chunk size so the problem can be reproduced sooner
	})

	for {
		fetch(downloader)
		time.Sleep(2 * time.Second)
	}
}

Possible Solution

When GetObjectInput.versionId is not provided by the user (which means getting the latest object version), send a request to first figure out the latest version of the object, and then set the versionId in the subsequent downloadChunk method.

Additional Information/Context

No response

SDK version used

1.44

Environment details (Version of Go (`go version`)? OS name and version, etc.)

1.20

The text was updated successfully, but these errors were encountered:

RanVaknin · 2023-09-29T20:09:01Z

Hi @kevinjqiu,

Thanks for reaching out. What you are describing is not a bug with the SDK, but rather a data intensive application concept called "read skew" and one solution for that is exactly what you described here:

When GetObjectInput.versionId is not provided by the user (which means getting the latest object version), send a request to first figure out the latest version of the object, and then set the versionId in the subsequent downloadChunk method.

The SDK team cannot cover this use case in our current implementation mainly because adding an extra API call would come with an additional cost both in terms of $ value, and performance hit, and would not accomodate the majority of customers. Not to mention it might not be backwards compatible.

The solution to this is implementing this mechanism on the application level. You need to implement you own version checking ("optimistic locking") mechanism to make sure you have the appropriate version before you run GetObject. The GetObjectInput interface takes in a VersionId that makes this easier:

latestObjectVersion := getLatestVersion("BUCKET", "TEST") // you need to implement this function.

goi := &s3.GetObjectInput{
    Bucket:    aws.String("BUCKET"),
    Key:       aws.String("TEST"),
    VersionId: aws.String(latestObjectVersion),
}

I'm not sure what is the business case for your application, but if the above approach proves to be too difficult, you might want to look into event driven approach with SQS for example.
instead of consumers independently polling S3 for the latest version of the object, they would be notified when a new version is available and then proceed to download it.

I understand that this must be frustrating to read, but as it is, its not really actionable by the SDK team so Im inclined to close this.
If you have any additional questions, please consider opening a discussion in the discussion tab on our repo.

Thanks again,
Ran~

github-actions · 2023-09-29T20:09:23Z

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

kevinjqiu · 2023-09-29T20:21:00Z

@RanVaknin Thanks for the reply. I think at the very least, the risk of read skew should be called out in the documentation.

From the user's perspective, we call downloader.Download() to get the latest version of the object. We don't really know (or should we know) that the object is split and stored in multiple chunks and there's a possibility where different version of the chunks are downloaded if the underlying object has been updated in the process. The API should abstract this away but I understand the concerns you laid out, so I think at least the documentation should be updated to let the user know the risk exists and consider the mitigation on the client side. In our case, we found out the hard way.

RanVaknin · 2023-09-29T20:59:35Z

Hi @kevinjqiu

Thanks for the follow up.

We don't really know (or should we know) that the object is split and stored in multiple chunks

You are using the Downloader package which is the multipart download utility for the SDK.
If you do not need to download the object in parts you can use S3.GetObject directly.

In terms of the documenting this, this is not an edge case unique to the SDK. Read Skews can happen in many many frameworks, and since you are designing a data intensive application, you need to be aware of the various strategies to mitigate their inherent caveats and risks. The same way we don't document that you could run into issues with idempotent writes when interacting with an RDS table, we don't document general programming best practices. These are concepts that apply at an infrastructural level and are not unique to client-side tools like the SDK

I appreciate your input and understand the concerns you've raised. Our goal is to keep the SDK documentation focused on its core functionality while providing clear guidance. If there are specific gaps or ambiguities you've noticed in relation to the SDK itself, feel free to open a separate documentation issue with actionable feedback.

Thanks again,
Ran~

rittneje · 2023-10-07T15:57:02Z

@RanVaknin To be frank, I find it wholly unacceptable that AWS's stance here is to do absolutely nothing to help its customers. Unless you explicitly happen to test this race condition, and get the timing right, you won't notice until you already have an issue in production.

At the very least, the SDK MUST return an error if it detects that two chunks came from different versions during a multipart download.

github-actions · 2025-04-22T20:39:15Z

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

bkaws · 2025-04-24T18:49:46Z

Thank you for your candid feedback. We have re-evaluated our previous position on this issue and agree that the better customer experience is to address it. We implemented a change in the S3 transfer manager client for the Go v2 SDK that addresses this issue, which we backported to the Go v1 SDK.

Starting with version v1.55.7, Downloader will now perform an ETag check from the initial GET response to confirm object parts are downloaded from the same version. If the object is modified during the multi-part download and no VersionId is provided in the request, the download will fail. Alternatively, users can still multipart download a specific version of a large object by passing VersionId in the request input. Upgrading to the latest version will resolve your issue.

As a reminder, this SDK is in maintenance mode and we're only releasing critical changes, so we encourage you to migrate to v2 if you can, but we continue to monitor this repository for comments and issues.

kevinjqiu added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 12, 2023

kevinjqiu changed the title ~~S3 Downloader potentially downloads corrupted object when the object is downloaded with multiple parts~~ S3 Downloader potentially downloads corrupted object when the object is split into multiple parts Sep 12, 2023

kevinjqiu changed the title ~~S3 Downloader potentially downloads corrupted object when the object is split into multiple parts~~ S3 Downloader potentially downloads corrupted object when the it is split into multiple parts Sep 12, 2023

RanVaknin self-assigned this Sep 25, 2023

RanVaknin removed the needs-triage This issue or PR still needs to be triaged. label Sep 29, 2023

RanVaknin closed this as not planned Won't fix, can't repro, duplicate, stale Sep 29, 2023

alopezz mentioned this issue May 6, 2024

Catch cache cloning errors and continue DataDog/datadog-agent#25381

Merged

This was referenced Apr 3, 2025

Feat Getobject Version Control aws/aws-sdk-go-v2#3050

Merged

Feat GetObject Version Control #5345

Merged

wty-Bryant closed this as completed in #5345 Apr 22, 2025

lucix-aws unassigned RanVaknin Apr 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

S3 `Downloader` potentially downloads corrupted object when the it is split into multiple parts #4986

S3 `Downloader` potentially downloads corrupted object when the it is split into multiple parts #4986

kevinjqiu commented Sep 12, 2023

RanVaknin commented Sep 29, 2023

github-actions bot commented Sep 29, 2023

kevinjqiu commented Sep 29, 2023

RanVaknin commented Sep 29, 2023 •

edited

Loading

rittneje commented Oct 7, 2023

github-actions bot commented Apr 22, 2025

bkaws commented Apr 24, 2025

S3 Downloader potentially downloads corrupted object when the it is split into multiple parts #4986

S3 Downloader potentially downloads corrupted object when the it is split into multiple parts #4986

Comments

kevinjqiu commented Sep 12, 2023

Describe the bug

Expected Behavior

Current Behavior

Reproduction Steps

Producer

Consumer

Possible Solution

Additional Information/Context

SDK version used

Environment details (Version of Go (go version)? OS name and version, etc.)

RanVaknin commented Sep 29, 2023

github-actions bot commented Sep 29, 2023

⚠️COMMENT VISIBILITY WARNING⚠️

kevinjqiu commented Sep 29, 2023

RanVaknin commented Sep 29, 2023 • edited Loading

rittneje commented Oct 7, 2023

github-actions bot commented Apr 22, 2025

bkaws commented Apr 24, 2025

S3 `Downloader` potentially downloads corrupted object when the it is split into multiple parts #4986

S3 `Downloader` potentially downloads corrupted object when the it is split into multiple parts #4986

Environment details (Version of Go (`go version`)? OS name and version, etc.)

RanVaknin commented Sep 29, 2023 •

edited

Loading