v4 signing inconsistent with version v1.36.3 #3056

Closed
nathani-work opened this issue Apr 10, 2025 · 5 comments
Labels
bug This issue is a bug. potential-regression Marking this issue as a potential regression to be checked by team member

Comments

@nathani-work

Acknowledgements

Describe the bug

When signing requests with a v4 signature, we see inconsistent responses: around 50% of the requests I tested fail, while the rest succeed.
This issue happens in the latest version at the time of testing, v1.36.3, but everything works as expected in v1.36.0.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

I expect that v4 signed requests work consistently, and that for the same request signed and sent 10 times, I get the same response 10 times.

Current Behavior

With version v1.36.3, around 50% of the requests fail with this kind of response:

Status code 403
Response body:

{\"message\":\"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method. Consult the service documentation for details.\\n\\nThe Canonical String for this request should have been\\n'POST\\n/url/path\\n\\ncontent-length:62\\nhost:example.com\\nx-amz-date:20250401T130206Z\\nx-amz-security-token:***\\ [...] '\\n\\nThe String-to-Sign should have been\\n'[...]'\\n\"}%!(EXTRA string=[...])}

The other 50% of the requests respond as expected. In our case the API Gateway invokes a Lambda, and we see the correct response from the Lambda.

Reproduction Steps

We use the following code to create a v4 signed request. All imports are reasonably up to date, including the aws-sdk-go-v2 library at v1.36.3. Once we downgrade aws-sdk-go-v2 to v1.36.0 and leave all other imported libraries at their same versions, the issue can no longer be reproduced.

import (
	"bytes"
	"context"
	"crypto/sha256"
	"fmt"
	"net/http"
	"time"

	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
	"github.com/aws/aws-sdk-go-v2/config"
)

// getV4SignedRequest builds an *http.Request and signs it with SigV4 for the
// execute-api service in eu-west-1, using the default credential chain.
func getV4SignedRequest(ctx context.Context, method, url string, headers http.Header, body *string) (*http.Request, error) {
	signer := v4.NewSigner()

	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return nil, err
	}

	credentials, err := cfg.Credentials.Retrieve(ctx)
	if err != nil {
		return nil, err
	}

	// Build the request and compute the SHA-256 payload hash the signer needs;
	// an empty body hashes the empty string.
	var httpRequest *http.Request
	var bodyHash [32]byte
	if body == nil {
		httpRequest, err = http.NewRequestWithContext(ctx, method, url, nil)
		if err != nil {
			return nil, err
		}
		bodyHash = sha256.Sum256([]byte(""))
	} else {
		bodyBuffer := bytes.NewBufferString(*body)
		httpRequest, err = http.NewRequestWithContext(ctx, method, url, bodyBuffer)
		if err != nil {
			return nil, err
		}
		bodyHash = sha256.Sum256([]byte(*body))
	}

	// Apply the caller's headers before signing so they are included in the signature.
	httpRequest.Header = headers.Clone()

	err = signer.SignHTTP(ctx, credentials, httpRequest, fmt.Sprintf("%x", bodyHash), "execute-api", "eu-west-1", time.Now())
	if err != nil {
		return nil, err
	}

	return httpRequest, nil
}
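
For completeness, a minimal sketch of how such a signed request might then be sent (using the same imports as above; the URL, payload, and header values are placeholders, not our real ones):

// Minimal sketch of sending one signed request; URL, payload, and headers are placeholders.
func sendSignedRequest(ctx context.Context) error {
	payload := `{"example":"payload"}`
	headers := http.Header{}
	headers.Set("Content-Type", "application/json")

	req, err := getV4SignedRequest(ctx, http.MethodPost, "https://example.com/url/path", headers, &payload)
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	return nil
}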

Possible Solution

No response

Additional Information/Context

While testing to figure out what the cause of the inconsistency was, we tried adding delays in the form of sleeps before sending the requests; we were otherwise sending 10 or so requests one after another. Adding half a second did not seem to have an effect, but adding 1 second or more resulted in all requests succeeding. No idea why that would be.
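
For reference, the test loop had roughly the shape below; sendSignedRequest is the hypothetical helper from the usage sketch above, and the delay is the only value we varied (none, 500ms, 1s or more):

// Rough shape of the test batch; only the request count and the delay were varied.
func runBatch(ctx context.Context, n int, delay time.Duration) {
	for i := 0; i < n; i++ {
		if err := sendSignedRequest(ctx); err != nil {
			fmt.Printf("request %d failed: %v\n", i, err)
		}
		// No delay and 500ms both reproduced the failures; 1s or more did not.
		time.Sleep(delay)
	}
}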

AWS Go SDK V2 Module Versions Used

github.com/aws/aws-lambda-go v1.47.0
github.com/aws/aws-sdk-go v1.50.9 // indirect
github.com/aws/aws-sdk-go-v2 v1.36.0 // indirect
github.com/aws/aws-sdk-go-v2/config v1.29.5 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.17.58 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.16.27 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.3.31 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.6.31 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.8.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.12.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.12.12 // indirect
github.com/aws/aws-sdk-go-v2/service/kms v1.27.9 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.24.14 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.28.13 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.33.13 // indirect
github.com/aws/aws-xray-sdk-go v1.8.3 // indirect
github.com/aws/smithy-go v1.22.2 // indirect

Compiler and Version used

go 1.22.0

Operating System and version

Lambda running Amazon Linux 2

@nathani-work nathani-work added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Apr 10, 2025
@github-actions github-actions bot added the potential-regression Marking this issue as a potential regression to be checked by team member label Apr 10, 2025
@Madrigal
Contributor

This is weird. Signing is pretty stable and there are not a lot of changes that we introduce to the package, with the exception of manually blocking or allowing certain headers (the last example is #2991).

I see you added this comment

The String-to-Sign should have been\n'[...]'\n"}%!(EXTRA string=[...])}

Can you elaborate on what that extra string you can see is?

One thing I can imagine interfering (though I don't think it would be related to SDK versions) is headers that are modified on a hop-by-hop basis. There have been other cases where certain headers cause issues, such as #2594, so knowing which headers you use may be helpful as well.
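
One thing that may help narrow it down: the signer can be constructed with signing logs enabled, so that each SignHTTP call prints the canonical request and string-to-sign it computed, which you can then diff against the canonical string echoed back in the 403 body. Roughly, as a drop-in for the signer construction in your snippet:

import (
	"os"

	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
	"github.com/aws/smithy-go/logging"
)

// Log the canonical request / string-to-sign the SDK computes for each signature.
signer := v4.NewSigner(func(o *v4.SignerOptions) {
	o.LogSigning = true
	o.Logger = logging.NewStandardLogger(os.Stderr)
})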

@Madrigal Madrigal added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. and removed needs-triage This issue or PR still needs to be triaged. labels Apr 15, 2025
@nathani-work
Author

I've been doing more tests, and I don't think the version of the SDK is relevant anymore, since I've had the issue happen again with v1.36.0 too.

I see you added this comment

The String-to-Sign should have been\n'[...]'\n"}%!(EXTRA string=[...])}

Can you elaborate on what that extra string you can see is?

When the v4 signing fails, we see EXTRA string=. That value is always the random UUID we create to identify the request, and it is sent in a header that looks like x-[...]-correlation-id. It's basically the same as all the other headers we send, with the exception that it also gets put on the context. Do you read the values on the context?

Mostly what I see is that this behaviour comes and goes. I can send tens of requests and none of them will fail, but then a new batch will fail. It seems that when a request succeeds, all subsequent requests also succeed.

So I might run a test batch of 10 requests and they might all succeed, while in another batch of 10 requests the first 2 or 3 fail but all the subsequent ones succeed.

The IAM role we use for that test is also used for other v4 signing tests against a different endpoint that never fail, and the endpoint we have trouble with is used by a client who v4 signs their requests using the Python SDK and never has issues either.

It seems like the more I test, the less the issue happens, but it never fully goes away. I wrote that it used to fail 50% of the time, while now I have to run many tests before one fails, so it's also difficult to see whether a change has any impact.

@Madrigal
Contributor

When the v4 signing fails, we see EXTRA string=. That value is always the random UUID we create to identify the request, and it is sent in a header

So most of the time, when we see this kind of error, it's due to inconsistent headers: for example, retries that modify a header, proxies that modify a header, that sort of thing.

The general guidance we give is to sign the x-amz-* headers, the Content-Type and Host headers, and only add other headers if you need to. See the canonical headers section of the SigV4 guide.
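
Applied to your snippet, that would look roughly like this: pass only the minimal headers to the request before SignHTTP, and attach anything else (the correlation-id header, for instance) after signing so it is not covered by the signature. The header name and correlationID variable below are placeholders:

// Sketch: sign only the minimal headers, then attach unsigned extras afterwards.
httpRequest.Header = http.Header{}
httpRequest.Header.Set("Content-Type", "application/json")

err = signer.SignHTTP(ctx, credentials, httpRequest, fmt.Sprintf("%x", bodyHash), "execute-api", "eu-west-1", time.Now())
if err != nil {
	return nil, err
}

// Added after signing, so a proxy or retry layer rewriting this header
// cannot invalidate the signature.
httpRequest.Header.Set("x-example-correlation-id", correlationID) // placeholder name and value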

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Apr 17, 2025
@nathani-work
Author

For now we "fixed" the issue by retrying after 2 seconds if the v4 signing fails. The retried call is given the exact same headers, but the request is recreated and re-signed.
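
The workaround is roughly this shape (simplified; getV4SignedRequest is the function from the reproduction steps above):

// Simplified workaround: if the signed call comes back 403, wait 2 seconds,
// rebuild and re-sign the request with the same inputs, and try once more.
func callWithRetry(ctx context.Context, method, url string, headers http.Header, body *string) (*http.Response, error) {
	send := func() (*http.Response, error) {
		req, err := getV4SignedRequest(ctx, method, url, headers, body)
		if err != nil {
			return nil, err
		}
		return http.DefaultClient.Do(req)
	}

	resp, err := send()
	if err == nil && resp.StatusCode != http.StatusForbidden {
		return resp, nil
	}
	if resp != nil {
		resp.Body.Close()
	}

	time.Sleep(2 * time.Second)
	return send()
}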

We'll probably try to hunt down the issue further, maybe try signing the request ourselves as per the guide you sent, or add some debugging to our current signing code.

I'll close this issue, as the more I look into it the more it seems the problem is some inconsistency somewhere in our code or infrastructure rather than in the codebase here. Thanks for the answers!


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
