Mounted cloud credentials should not be world-readable #8919


Merged
1 commit merged into vmware-tanzu:main on Jul 18, 2025

Conversation

sseago
Collaborator

@sseago sseago commented May 8, 2025

Thank you for contributing to Velero!

Please add a summary of your change

Cloud credentials (both default and BSL-named) should not be world-readable. This PR makes the file access permissions more restrictive.

sh-5.1$ ls -l /tmp/credentials/openshift-adp 
total 4
-rw-r-----. 1 1000740000 root 112 May  8 16:36 cloud-credentials-cloud
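
For illustration, a minimal sketch of the kind of change involved (the package and helper names here are hypothetical, not the actual Velero code): BSL-named credential files written by the server drop the world-read bit.

  // Hypothetical sketch: write a BSL-named credentials file without the
  // world-read bit. The helper name and layout are illustrative only.
  package credentials

  import (
      "os"
      "path/filepath"
  )

  // writeCredentialsFile writes the secret data to <dir>/<name> with mode 0640
  // (owner read/write, group read, no access for others).
  func writeCredentialsFile(dir, name string, data []byte) (string, error) {
      path := filepath.Join(dir, name)
      if err := os.WriteFile(path, data, 0o640); err != nil {
          return "", err
      }
      return path, nil
  }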

Does your change fix a particular issue?

Fixes #8906

Please indicate you've done the following:

Collaborator

@kaovilai kaovilai left a comment


  Error: pkg/install/daemonset.go:183:1: File is not properly formatted (gofmt)
  						SecretName: "cloud-credentials",
  ^

Please fix linters.


codecov bot commented May 8, 2025

Codecov Report

Attention: Patch coverage is 36.36364% with 28 lines in your changes missing coverage. Please review.

Project coverage is 60.02%. Comparing base (5b29a87) to head (80b829b).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...g/controller/backup_storage_location_controller.go 22.22% 25 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8919      +/-   ##
==========================================
- Coverage   60.05%   60.02%   -0.04%     
==========================================
  Files         378      378              
  Lines       42883    42940      +57     
==========================================
+ Hits        25754    25775      +21     
- Misses      15582    15615      +33     
- Partials     1547     1550       +3     


@sseago sseago force-pushed the secret-perms branch 2 times, most recently from b372610 to de248e8 on July 8, 2025 13:45
@reasonerjt
Contributor

The changes LGTM, could you please check the failures in GitHub Actions?

@kaovilai
Collaborator

Velero test on include resources from the cluster restore [It] Run E2E test case [ResourceFiltering, IncludeResources, Restore]
/home/runner/work/velero/velero/test/e2e/test/test.go:94

  [FAILED] Failed to backup resources
  Expected success, but got an error:
      <*errors.fundamental | 0xc000a5acc0>: 
      Unexpected backup phase got FailedValidation, expecting Completed
      {
          msg: "Unexpected backup phase got FailedValidation, expecting Completed",
          stack: [0x201471d, 0x20167c6, 0x26b7a19, 0x8e626c, 0x8f6e76, 0x26b7965, 0x26b8c70, 0x26b760c, 0x8da973, 0x8ee97b, 0x47f021],
      }
  In [It] at: /home/runner/work/velero/velero/test/e2e/test/test.go:120 @ 07/10/25 02:37:42.696

@sseago
Collaborator Author

sseago commented Jul 11, 2025

@kaovilai Hmm. That doesn't seem like it should be related to my change. I guess I'll update the branch (it's out of date anyway) and see what happens on retest. If it's still failing, I'll dig into it in more detail on Monday.

@sseago
Collaborator Author

sseago commented Jul 14, 2025

e2e tests are all failing with BSL validation failures.

"BackupStorageLocation \"default\" is unavailable: rpc error: code = Unknown desc = failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, failed to get API token, operation error ec2imds: getToken, http response error StatusCode: 400, request to EC2 IMDS failed",

What I don't quite understand is that the only change made by this PR is to update file permissions for the secret to remove the world-read permission. If this caused errors in the e2e Kind cluster, I would expect to see a "permission denied" error originating in velero code, not in the AWS APIs. If we're able to make the AWS call at all, that suggests that velero was able to read the file. If velero was able to read the file, then the AWS API call should be identical with and without this change.

@blackpiglet
Contributor

I think it's still related to the credentials permissions.
The error could be caused by the AWS SDK automatically falling back to EC2 IMDS when it fails to use the provided credentials.

@sseago
Collaborator Author

sseago commented Jul 15, 2025

@blackpiglet I need to do some extra logging and see what's going on with the actual velero UID/GID and the file permissions. It could be that the kind clusters don't work properly without everything being world-readable here, or it could be that the e2e tests are doing something weird with the uid/gid for velero causing problems. In theory the velero server should have no problems accessing files with read permission for the owner and group, since the owner and group for the file should be the same as for the velero server process.

@sseago sseago marked this pull request as draft July 15, 2025 16:31
@sseago sseago force-pushed the secret-perms branch 3 times, most recently from 4a68116 to 20b4deb on July 15, 2025 16:59
@sseago
Collaborator Author

sseago commented Jul 15, 2025

After adding some logging, it looks like the default credentials, mounted from the secret at /credentials/cloud, are showing up in the velero container as owned by uid/gid 0/0.
From my extra debug logging:
velero uid/gid 1002/1000, config map[region:minio s3ForcePathStyle:true s3Url:http://10.1.0.214:9000], perms /credentials/cloud: 0440; uid/gid: 0/0:
What's not yet clear to me is whether this is a problem with the kind cluster configuration or whether we need some additional specification in the velero deployment pod template spec to force the secret credentials mount to have the same owner as the container's runAsUser.
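
As a sketch of the kind of debug logging described above (assumed, not the actual patch), one could stat the mounted file and print the process and file ownership, which would produce output like the line quoted here:

  // Minimal sketch: report the velero process uid/gid plus the mode and
  // ownership of the mounted credentials file (Linux-specific via Stat_t).
  package main

  import (
      "fmt"
      "os"
      "syscall"
  )

  func main() {
      const credsPath = "/credentials/cloud" // default cloud-credentials mount
      fmt.Printf("velero uid/gid %d/%d\n", os.Getuid(), os.Getgid())

      info, err := os.Stat(credsPath)
      if err != nil {
          fmt.Println("stat error:", err)
          return
      }
      if st, ok := info.Sys().(*syscall.Stat_t); ok {
          fmt.Printf("perms %s: %#o; uid/gid: %d/%d\n", credsPath, info.Mode().Perm(), st.Uid, st.Gid)
      }
  }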

@blackpiglet
Contributor


Thanks for the detailed information.
I will do some tests in my environment.
By the way, how does OADP install Velero: via the Velero CLI, or by using client-go to talk to the kube-apiserver?

@blackpiglet
Contributor

blackpiglet commented Jul 16, 2025

@sseago
The reason is that the Velero CLI installation doesn't set the securityContext for the Velero Deployment and DaemonSet.
By default, the mounted secret's ownership is set to root.

As a result, adding the securityContext section to the Velero deployment and the DaemonSet should fix the issue.

The Velero server image, Ubuntu Jammy, uses the user cnb (Cloud Native Buildpacks). Its UID and GID are both 1000.

I will create a PR to add the SecurityContext for the Velero install CLI and the E2E tests.
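
A rough sketch of what such an install-time change might look like (the function name is assumed, not the actual follow-up PR): set fsGroup on the pod-level securityContext so that the mounted secret volume is group-owned by the container user's group.

  // Illustrative only: add a pod-level securityContext with fsGroup to the
  // Velero Deployment so mounted secret volumes are group-owned by GID 1000
  // (the cnb user's group in the Ubuntu Jammy image) instead of root.
  package install

  import (
      appsv1 "k8s.io/api/apps/v1"
      corev1 "k8s.io/api/core/v1"
  )

  func withFSGroup(d *appsv1.Deployment, fsGroup int64) {
      podSpec := &d.Spec.Template.Spec
      if podSpec.SecurityContext == nil {
          podSpec.SecurityContext = &corev1.PodSecurityContext{}
      }
      // Kubernetes chowns files in secret volumes to fsGroup, so a 0440
      // mount stays readable by the velero server process.
      podSpec.SecurityContext.FSGroup = &fsGroup
  }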

@sseago
Collaborator Author

sseago commented Jul 16, 2025

@blackpiglet ok, so I figured out what was different between the OpenShift cluster and the Kind e2e cluster.

So for mounted secrets, there is no way to change the user ownership. It will always be root/0. However, if the pod sets fsGroup in the security context, then the mounted secret's group ownership will be modified to match. OpenShift sets fsGroup automatically when launching pods if it's not set explicitly. So it was actually an OpenShift quirk that allowed this to work by just setting defaultMode. I don't think we have an issue with the restricted permissions for named non-default credentials, since those files are created by the velero server process, so the uid/gids on those files should always match whatever user is running the pod.

One possible fix is this:

  1. Add an optional --fsgroup flag to the velero installer
  2. If --fsgroup is specified, add fsGroup to the securityContext of the velero deployment pod template spec and set the mounted credentials secret mode to 0440
  3. If --fsgroup is not specified, just set the mode to 0444 (read-only, but world-readable)
  4. Leave the named-credentials change in this PR as-is, always making that file 0640 (read-write, but not world-readable)

Another option would be to skip the --fsgroup addition and just always set the mounted default credentials to mode 0444.

@blackpiglet
Contributor

@sseago
Thanks for investigating this issue.

Are there any security hardening rules required to make the secret non-world-readable?

If we go the first way, we might need to modify the fsGroup value whenever the Velero server image changes; the same would apply if we chose a default value.
The stricter restriction also increases what users need to know.

If there is no clear requirement for that, I prefer to go the second way.

@sseago
Collaborator Author

sseago commented Jul 17, 2025

@blackpiglet I wasn't suggesting that we hard-code any fsGroup values but making it an optional parameter. If the user doesn't specify fsGroup, then we don't add the security context, and we can just mount it read-only as 444. If the user supplies fsGroup at install, then we set it in the security context and then the velero server will run with that group set.

But we could go with the simpler second approach. @weshayutin do you know whether the OADP request for modifying these permissions was related to any specific security hardening rule, or was it just a general sense that we shouldn't have world-readable secrets? I think the practical impact here is relatively small, since in general, users who aren't authorized to access the credentials shouldn't be able to run processes in the velero pod anyway.

@blackpiglet
Contributor


That should work.
If there is a need from the OADP side, we can go that way.

@sseago sseago marked this pull request as ready for review July 17, 2025 16:28
@github-actions github-actions bot requested review from kaovilai and ywk253100 July 17, 2025 16:28
@sseago
Collaborator Author

sseago commented Jul 17, 2025

@blackpiglet Updated with option 2: default creds mounted 444 (read-only), non-default cred files created 640 (no world-read).
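
A simplified sketch of what that could look like in the install code (the mode values are from this thread; the helper itself is illustrative, not the merged change): the default cloud-credentials secret is mounted read-only with defaultMode 0444, so no fsGroup or securityContext is required.

  // Illustrative sketch: mount the default cloud-credentials secret read-only
  // but world-readable (0444), so no fsGroup/securityContext is needed;
  // per-BSL credential files written by the server use 0640 (see above).
  package install

  import corev1 "k8s.io/api/core/v1"

  func int32Ptr(i int32) *int32 { return &i }

  func cloudCredentialsVolume(secretName string) corev1.Volume {
      return corev1.Volume{
          Name: "cloud-credentials",
          VolumeSource: corev1.VolumeSource{
              Secret: &corev1.SecretVolumeSource{
                  SecretName:  secretName,
                  DefaultMode: int32Ptr(0o444),
              },
          },
      }
  }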

@blackpiglet blackpiglet merged commit 29a8bc4 into vmware-tanzu:main Jul 18, 2025
47 checks passed
sseago added a commit to sseago/velero that referenced this pull request Jul 18, 2025

Successfully merging this pull request may close these issues.

Mounted cloud credentials secret files should have more restrictive file access permissions
5 participants